Saving page content with requests in Python

Posted 2024-09-26 22:09:59


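I have been trying to use the requests module to access a .txt file on a website. When I log in manually in the browser with my username and password, I can see the actual data: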

Point Code  Issue Date  Trade Date  Region  Pricing Point   Low High    Average Volume  Deals   Delivery Start Date Delivery End Date
RMTNWW  2018-10-09  2018-10-08  Rocky Mountains Northwest Wyoming Pool  2.910   2.955   2.935   323 44  2018-10-09  2018-10-09
RMTOPAL 2018-10-09  2018-10-08  Rocky Mountains Opal    2.925   3.050   2.965   209 40  2018-10-09  2018-10-09
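Since the goal is an output_….csv file, it may help to see how rows like these could be turned into CSV once the real file is downloaded. This is a minimal sketch that assumes the file is tab-delimited; the columns above look whitespace-aligned, and tabs are only a guess about the original file. The function name tsv_to_csv is hypothetical.

```python
import csv
import io


def tsv_to_csv(text: str) -> str:
    """Convert tab-delimited text (ASSUMED format of the .txt feed) to CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out)
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row)
    return out.getvalue()


# Header and one row copied from the sample above, with tabs assumed between columns.
sample = (
    "Point Code\tIssue Date\tTrade Date\tRegion\tPricing Point\tLow\tHigh\t"
    "Average\tVolume\tDeals\tDelivery Start Date\tDelivery End Date\n"
    "RMTOPAL\t2018-10-09\t2018-10-08\tRocky Mountains\tOpal\t2.925\t3.050\t"
    "2.965\t209\t40\t2018-10-09\t2018-10-09\n"
)
print(tsv_to_csv(sample))
```

The csv module quotes any field that contains a comma, so multi-word values such as "Rocky Mountains" stay in one column. If the real file turns out to be space-aligned rather than tab-delimited, the delimiter would need to change.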

But when I access the same page from my script and print it with

print(page.content)

the output is the HTML source:

   b'<!DOCTYPE html>\n<html>\n<head>\n\n<meta name="csrf-param" content="authenticity_token"/>\n<meta name="csrf-token" content="s35g4TAUN6+5V8Xi8x7u6f2FwziX3gbW9iY9D45HnEw="/>\n<meta http-equiv="content-type" content="text/html;charset=utf-8">
\n<meta name="description" content="Natural Gas Intelligence">\n<meta name="keywords" content="gas, natural gas, natural gas prices, enery prices, NYMEX, nymex settlement, aga, storage, natural gas data, henry hub, ferc, power, electricity, electric, megawatt, methane, reliability, inside, ngi">\n\n\n\n<meta content="false" name="has-log-view" />\n<!--<meta content="IE=EmulateIE7" http-equiv="X-UA-Compatible"/>
    .
    .
    .

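None of the column labels shown above (Point Code, Issue Date, etc.) appear anywhere in this HTML, so I suspect a login problem. The login URL is https://www.naturalgasintel.com/user/login, and the file is at https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt.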
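The HTML that came back contains meta tags named csrf-param (content "authenticity_token") and csrf-token. That pattern suggests, though it is only an assumption, a Rails-style login form that expects the token to be posted back along with the credentials. A minimal sketch for pulling both values out of such a page with the standard library's html.parser; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CsrfMetaParser(HTMLParser):
    """Collects the csrf-param / csrf-token <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.param = None
        self.token = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "csrf-param":
            self.param = attributes.get("content")
        elif attributes.get("name") == "csrf-token":
            self.token = attributes.get("content")


def extract_csrf(html: str):
    """Return (param_name, token); either may be None if the tag is missing."""
    parser = CsrfMetaParser()
    parser.feed(html)
    return parser.param, parser.token
```

Fed the head of the page printed above, this returns ("authenticity_token", "s35g4TAUN6+5V8Xi8x7u6f2FwziX3gbW9iY9D45HnEw="). The token is generated per session, so it has to be read from a page fetched with the same Session that will later send the login POST.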

My script is:

import requests
with requests.Session() as c:
    data_url = 'https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/'
    username = ''
    password = ''
    login_data = dict(username=username, password=password)
    c.post(data_url, data=login_data, headers={'Referer':'https://www.naturalgasintel.com/'})
    page = c.get('https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt', stream=True)
    print(page.content)
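Note that the script above posts the credentials to data_url (the data folder), not to the login URL given in the question, and it sends no CSRF token. A minimal sketch of a login-then-download flow, assuming the login form accepts a POST to /user/login with the token included. The form field names below ("username", "password") are HYPOTHETICAL defaults that must be replaced with the name attributes of the real form inputs. The function takes any requests.Session-like object, which keeps it testable without network access.

```python
import re

# URLs taken from the question.
LOGIN_URL = "https://www.naturalgasintel.com/user/login"
FILE_URL = ("https://www.naturalgasintel.com/ext/resources/Data-Feed/"
            "Daily-GPI/2018/10/20181009td.txt")


def find_csrf(html):
    """Return (param_name, token) from Rails-style <meta> tags, or (None, None)."""
    param = re.search(r'<meta name="csrf-param" content="([^"]*)"', html)
    token = re.search(r'<meta name="csrf-token" content="([^"]*)"', html)
    return (param.group(1) if param else None,
            token.group(1) if token else None)


def fetch_after_login(session, username, password,
                      user_field="username", pass_field="password"):
    """Log in with `session`, then download FILE_URL and return its raw bytes.

    `session` is expected to behave like requests.Session (get/post).
    user_field / pass_field are HYPOTHETICAL; check the real login form.
    """
    # 1. Load the login page with the same session to get cookies and the token.
    login_page = session.get(LOGIN_URL)
    param, token = find_csrf(login_page.text)

    # 2. Post credentials to the login URL, echoing the CSRF token back if present.
    form = {user_field: username, pass_field: password}
    if param and token:
        form[param] = token
    resp = session.post(LOGIN_URL, data=form, headers={"Referer": LOGIN_URL})
    resp.raise_for_status()

    # 3. The session now carries the login cookies; fetch the data file.
    page = session.get(FILE_URL)
    page.raise_for_status()
    return page.content
```

With the real library this would be called inside "with requests.Session() as s:" as data = fetch_after_login(s, username, password). A 200 status from the POST does not by itself prove the login worked; if the final response still starts with <!DOCTYPE html>, the form fields or login URL are probably not what the site expects.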

I want to save the actual .txt content of the page, not the HTML source, using open(), writing the content to a file with something like:

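# the date in the name is taken from the file name 20181009td.txt
localfile = 'output_{}.csv'.format('20181009')
with open(localfile, "w", encoding="utf-8") as datafile:
    datafile.write(page.text)  # page.text is the decoded body; the Response object itself can't be written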
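Writing the raw bytes instead of decoded text avoids guessing the encoding, and a quick check can catch the case where the server returned an HTML page (for example, a login page) instead of the data. A minimal sketch; the function name save_text_file and the HTML check are illustrative choices, not part of requests.

```python
from pathlib import Path


def save_text_file(content: bytes, path) -> Path:
    """Write downloaded bytes unchanged, refusing content that looks like HTML."""
    head = content.lstrip()[:15].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        # Most likely the login did not succeed and the site served a web page.
        raise ValueError("received an HTML page instead of the data file")
    path = Path(path)
    path.write_bytes(content)  # binary write: no decoding or re-encoding
    return path
```

Used as save_text_file(page.content, 'output_20181009.txt'), this either stores the file byte for byte or raises, so an HTML login page is not silently saved as data.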

How can I get that content instead of the HTML source?


