Downloading a file in Python using Pandas/Urllib

from urllib.request import Request, urlopen

url = 'https://www.nseindia.com/content/fo/fo_mktlots.csv'
url_request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urlopen(url_request).read()

1 Answer

User

#1 · Posted 2024-10-01 22:43:03

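The website is trying to prevent content scraping.

The problem is not that you did something wrong. It comes down to how the web server is configured and how it behaves in different situations.

To get past the scraping protection, send well-formed HTTP request headers. The best approach is to send the same set of HTTP headers that a real web browser sends.

Here it works with a minimal set: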

>>> myHeaders = {
...     'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
...     'Referer': 'https://www.nseindia.com',
...     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
... }
>>> url_request = Request(url, headers=myHeaders)
>>> html = urlopen(url_request).read()
>>> len(html)
42864
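If you want to reuse this, the same idea can be wrapped in a small helper. This is only a sketch: the names `BROWSER_HEADERS`, `build_request` and `fetch` are made up for illustration, the header values are the ones from the session above, and the error handling is an assumption about what you would want to see when the server still refuses the request.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Browser-like headers, copied from the interactive session above.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
    ),
    "Referer": "https://www.nseindia.com",
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        "image/webp,*/*;q=0.8"
    ),
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request that carries the browser-like headers (hypothetical helper)."""
    return Request(url, headers=dict(headers))


def fetch(url, timeout=30):
    """Download url and return the body as bytes, or None if the request fails."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print(f"server refused the request: HTTP {err.code}")
    except URLError as err:
        print(f"could not reach the server: {err.reason}")
    return None
```

Calling `fetch(url)` with the `url` from the question would then stand in for the `Request`/`urlopen` pair. Whether the server accepts the request still depends on how it is configured at the time.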

You can also pass the urllib response to pandas.
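The snippet that belonged here did not survive in this copy of the answer, so what follows is a sketch of one way to do it, not necessarily the original code. It assumes pandas is installed. The helper names `csv_bytes_to_frame` and `download_csv` are hypothetical. `pandas.read_csv` accepts file-like objects, so the downloaded bytes are wrapped in `io.BytesIO` before parsing.

```python
import io
from urllib.request import Request, urlopen

import pandas as pd


def csv_bytes_to_frame(raw):
    """Parse CSV content given as bytes into a DataFrame (hypothetical helper)."""
    return pd.read_csv(io.BytesIO(raw))


def download_csv(url, headers):
    """Fetch url with the given request headers and parse the body as CSV."""
    request = Request(url, headers=headers)
    with urlopen(request) as response:
        return csv_bytes_to_frame(response.read())
```

With the names from the session above, `df = download_csv(url, myHeaders)` would give you a DataFrame, provided the server returns the CSV and not an error page.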
