I am scraping a website using the following URL and headers:
URL: 'https://tennistonic.com/tennis-news/'
Headers:
{
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
"Cache-Control": "no-cache",
"content-length": "0",
"content-type": "text/plain",
"cookie": "IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc",
"origin": "https://tennistonic.com",
"Connection": "keep-alive",
"Pragma": "no-cache",
"Referer": "https://tennistonic.com/",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "cross-site",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36",
"x-client-data": "CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB"}
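In the developer tools there is a decoded section after x-client-data, which I have omitted from the headers above but also tried. The full request as shown in the developer tools is: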
:authority: stats.g.doubleclick.net
:method: POST
:path: /j/collect?t=dc&aip=1&_r=3&v=1&_v=j87&tid=UA-13059318-2&cid=1499412700.1601628730&jid=598376897&gjid=243704922&_gid=1691643639.1604317227&_u=QACAAEAAAAAAAC~&z=1736278164
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8
cache-control: no-cache
content-length: 0
content-type: text/plain
cookie: IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc
origin: https://tennistonic.com
pragma: no-cache
referer: https://tennistonic.com/
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36
x-client-data: CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB
Decoded:
message ClientVariations {
// Active client experiment variation IDs.
repeated int32 variation_id = [3300109, 3300134, 3300161, 3313321, 3315223, 3318700, 3318773, 3318887, 3318889, 3319522, 3320540, 3320769, 3329021, 3329167];
// Active client experiment variation IDs that trigger server-side behavior.
repeated int32 trigger_variation_id = [3317898];
}
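import requests
from bs4 import BeautifulSoup as soup
r = requests.get(url2, headers=headers2)  # url2 / headers2: presumably the URL and headers listed above
soup_cont = soup(r.content, 'html.parser')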
The content of my soup from the response is as follows:
Is this website protected, or am I sending the wrong request?
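One thing worth checking first: the request captured from the developer tools has `:authority: stats.g.doubleclick.net` and `:method: POST`, so its headers (including the `cookie` and `content-length: 0`) appear to belong to an analytics call made by the page, not to the news page itself. The sketch below is a minimal check under that reading, not a confirmed fix. The helper names `same_host`, `minimal_headers`, and `fetch_page` are hypothetical; the header values are copied from the question, and `Accept-Encoding` is left to the `requests` default.

```python
from urllib.parse import urlsplit


def same_host(page_url: str, request_authority: str) -> bool:
    """Return True when a captured request's :authority matches the page's host."""
    return urlsplit(page_url).hostname == request_authority.lower()


def minimal_headers() -> dict:
    """A small browser-like header set, using values copied from the question."""
    return {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/86.0.4240.80 Safari/537.36"
        ),
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    }


def fetch_page(url: str, timeout: float = 10.0):
    """GET the page and print basic diagnostics before any parsing is attempted."""
    # Imported here so the helpers above can be used without requests installed.
    import requests

    response = requests.get(url, headers=minimal_headers(), timeout=timeout)
    # Status code, content type, and body size help tell a block page
    # apart from an empty or JavaScript-rendered page.
    print(response.status_code,
          response.headers.get("Content-Type"),
          len(response.content))
    return response
```

With the values from the question, `same_host('https://tennistonic.com/tennis-news/', 'stats.g.doubleclick.net')` is false, which suggests the copied headers were not the ones the browser sent for the news page. If `fetch_page` still returns an empty or blocked response, the content may be rendered by JavaScript, or the site may be filtering non-browser clients.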
Try using selenium.
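A minimal sketch of what that could look like, assuming Selenium 4.6 or later (which can locate a matching driver on its own) and a local Chrome installation. The function names `chrome_arguments` and `fetch_rendered_html`, the window size, and the timeout are illustrative choices, not anything the site or the answer specifies.

```python
from typing import List


def chrome_arguments(headless: bool = True) -> List[str]:
    """Command-line flags passed to Chrome; the values are illustrative."""
    args = ["--window-size=1280,1024"]
    if headless:
        args.append("--headless")
    return args


def fetch_rendered_html(url: str, headless: bool = True,
                        timeout: float = 15.0) -> str:
    """Load the page in a real browser and return the HTML after rendering."""
    # Imported here so chrome_arguments can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the <body> element exists before reading the page source.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if loading or waiting fails.
        driver.quit()
```

The returned string can be parsed the same way as in the question, for example `soup(fetch_rendered_html('https://tennistonic.com/tennis-news/'), 'html.parser')`. Waiting on `body` is only a baseline; if the articles load later, waiting on a selector specific to them would be more reliable.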