Web scraping: Access denied | Cloudflare restricts access

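```python
import requests
from requests_html import HTML

# Fetch the category page with a plain HTTP request (no browser, no cookies)
source = requests.get('https://www.cclonline.com/category/409/PC-Components/Graphics-Cards/')

# Parse the returned body with requests_html
html = HTML(html=source.text)

print(source.status_code)  # prints 403 here
print(html.text)           # the visible text of Cloudflare's block page
```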

```
403
Access denied | www.cclonline.com used Cloudflare to restrict access
Please enable cookies.
Error 1020
Ray ID: 64c0c2f1ccb5d781 • 2021-05-08 06:51:46 UTC
Access denied
What happened?
This website is using a security service to protect itself from online attacks.
```

1 answer

User

Answer #1 · Posted 2024-09-30 06:24:56

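The site's robots.txt does not explicitly disallow bots. However, you need to make your requests look like they come from a real browser. Now, on to the immediate problem: the response says you need to enable cookies. You can get around this with a headless browser such as Selenium. Selenium gives you everything a browser does (it essentially drives Google Chrome, or another browser of your choice, through a driver). The server will then treat the request as coming from a real browser and return the page.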
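You can check the robots.txt point yourself with Python's standard-library `urllib.robotparser`. This is a minimal sketch: the `allowed` helper name and the sample rules in the usage note are hypothetical and are not the site's actual robots.txt. Against the live site you would call `parser.set_url("https://www.cclonline.com/robots.txt")` and `parser.read()` instead of `parse(...)`. Keep in mind that robots.txt is advisory; a permissive robots.txt does not stop a Cloudflare firewall rule from returning a 1020 block.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it parses robots.txt content you already have as a
    string, so it can be tried without any network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)
```

For example, with made-up rules `"User-agent: *\nDisallow: /checkout/\n"`, the graphics-card category URL is allowed while `/checkout/basket` is not.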

Learn more about how to use Selenium for scraping here.

Also remember to adjust your crawl rate accordingly: pause after each request, and rotate user agents frequently.
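The pacing advice can be sketched with the standard library alone. This is a minimal sketch, assuming you pass in your own `fetch(url, headers)` function (for example one wrapping `requests.get(url, headers=headers, timeout=10)`). The User-Agent strings, the `polite_schedule` and `crawl` names, and the 2 to 5 second delay range are illustrative assumptions, not values from the answer.

```python
import itertools
import random
import time

# Hypothetical User-Agent pool; replace with current, real browser strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def polite_schedule(urls, min_delay=2.0, max_delay=5.0, rng=None):
    """Yield (url, headers, delay): a rotated User-Agent and a random pause per URL."""
    rng = rng or random.Random()
    agents = itertools.cycle(USER_AGENTS)  # rotate through the pool in order
    for url in urls:
        headers = {"User-Agent": next(agents)}
        delay = rng.uniform(min_delay, max_delay)  # assumed delay range, in seconds
        yield url, headers, delay


def crawl(urls, fetch, sleep=time.sleep):
    """Call fetch(url, headers) for each URL, pausing after every request."""
    results = []
    for url, headers, delay in polite_schedule(urls):
        results.append(fetch(url, headers))
        sleep(delay)
    return results
```

Injecting `fetch` and `sleep` keeps the pacing logic separate from the HTTP client, so the same loop works with `requests` or anything else. Note that rotating headers alone will not get past a JavaScript or cookie challenge; that still needs a real browser as described above.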
