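I have been trying to access a .txt file on a website using the requests module. When I log in manually with my username and password, I can see the actual data in the browser: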
Point Code Issue Date Trade Date Region Pricing Point Low High Average Volume Deals Delivery Start Date Delivery End Date
RMTNWW 2018-10-09 2018-10-08 Rocky Mountains Northwest Wyoming Pool 2.910 2.955 2.935 323 44 2018-10-09 2018-10-09
RMTOPAL 2018-10-09 2018-10-08 Rocky Mountains Opal 2.925 3.050 2.965 209 40 2018-10-09 2018-10-09
But when I try to access the same page from a script and run
print(page.content)
the output is the HTML source:
b'<!DOCTYPE html>\n<html>\n<head>\n\n<meta name="csrf-param" content="authenticity_token"/>\n<meta name="csrf-token" content="s35g4TAUN6+5V8Xi8x7u6f2FwziX3gbW9iY9D45HnEw="/>\n<meta http-equiv="content-type" content="text/html;charset=utf-8">
\n<meta name="description" content="Natural Gas Intelligence">\n<meta name="keywords" content="gas, natural gas, natural gas prices, enery prices, NYMEX, nymex settlement, aga, storage, natural gas data, henry hub, ferc, power, electricity, electric, megawatt, methane, reliability, inside, ngi">\n\n\n\n<meta content="false" name="has-log-view" />\n<!--<meta content="IE=EmulateIE7" http-equiv="X-UA-Compatible"/>
.
.
.
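This HTML contains none of the column labels shown above (Point Code, Issue Date, etc.), so I suspect this is a login problem. The login URL is https://www.naturalgasintel.com/user/login and the file is located at https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt.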
My script is:
import requests

with requests.Session() as c:
    data_url = 'https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/'
    username = ''
    password = ''
    login_data = dict(username=username, password=password)
    c.post(data_url, data=login_data, headers={'Referer':'https://www.naturalgasintel.com/'})
    page = c.get('https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt', stream=True)
    print(page.content)
I want to save the actual .txt content of the page, rather than the HTML source, using the open function and then the write method, like this:
localfile = 'output_{}.csv'
with open(localfile, "w", encoding="utf-8") as datafile:
    # write() expects a str, so pass the decoded body, not the Response object
    datafile.write(page.text)
How can I get this content instead of the HTML source?
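One possible direction, offered as a sketch rather than a confirmed fix: the script above posts the credentials to the data URL instead of the login URL, and the returned HTML carries a csrf-token meta tag, which suggests the login form may also expect that token. The code below assumes the form fields are named username, password and authenticity_token (the last taken from the csrf-param meta tag in the output above; the first two are the names already used in the script and are not verified against the site). The helper names extract_csrf_token, looks_like_html and fetch_data_file are hypothetical.

```python
from html.parser import HTMLParser


class _CsrfMetaParser(HTMLParser):
    """Records the content attribute of <meta name="csrf-token" ...>."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") == "csrf-token":
            self.token = attr_map.get("content")


def extract_csrf_token(html):
    """Return the csrf-token value found in the page, or None if absent."""
    parser = _CsrfMetaParser()
    parser.feed(html)
    return parser.token


def looks_like_html(text):
    """Rough check for an HTML page returned in place of the data file."""
    return text.lstrip().lower().startswith(("<!doctype html", "<html"))


def fetch_data_file(session, login_url, file_url, username, password):
    """Log in through `session` (e.g. a requests.Session), then fetch the file.

    The form field names below are assumptions; compare them with the
    actual login form in the browser's developer tools.
    """
    login_page = session.get(login_url)
    login_page.raise_for_status()

    form = {
        "username": username,  # assumed field name
        "password": password,  # assumed field name
        "authenticity_token": extract_csrf_token(login_page.text),
    }
    # Post to the login URL, not to the data URL.
    reply = session.post(login_url, data=form, headers={"Referer": login_url})
    reply.raise_for_status()

    data = session.get(file_url)
    data.raise_for_status()
    if looks_like_html(data.text):
        raise RuntimeError("received an HTML page; the login probably failed")
    return data.text
```

To try it, create a requests.Session(), pass it to fetch_data_file together with the login URL and the .txt URL from the question, and write the returned string to a file opened with open(localfile, "w", encoding="utf-8"). If the RuntimeError is raised, the field names or the login flow likely differ from what is assumed here.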