Can't get past "403 Forbidden" when web scraping with Python, even after changing the headers

import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://duckduckgo.com/',
    'TE': 'trailers',
    'Upgrade-Insecure-Requests': '1',
    'USER-AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0',
    'X-Real-Ip': '[insert IP]',
    'X-Http-Proto': 'HTTP/1.1',
    'Host': 'curseforge.com',
}

url = 'https://www.curseforge.com/minecraft/modpacks?page=2'
req = requests.get(url, headers=headers)
print(req.status_code)

1 answer

User

Answer 1 · Posted 2024-07-03 07:47:13

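Sorry, you're out of luck here. https://www.curseforge.com is protected by Cloudflare. I tried running your script both with and without the cloudflare-scrape library, and both times I got this message: "Completing the CAPTCHA proves you are a human and gives you temporary access to the web property."

Cloudflare and reCAPTCHA do a very good job of stopping DDoS attacks and scrapers like yours, so they are hard to bypass. I did come up with a few ways to get around them, but be aware that none of them is perfect: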

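You can try to break Google's reCAPTCHA itself; there is more material on this from the Black Hat conference 2016.
You can extract the temporary cookie you receive after passing the reCAPTCHA and inject it into every request. Be careful with this approach, though: too many requests for the same page will make the host suspicious, and it may revoke your cookie, in which case you have to obtain a fresh one.
Finally, you can take a different route and use selenium to open a browser driver so that the reCAPTCHA can be solved by hand.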
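
The second and third options can be combined. The following is only a rough sketch under several assumptions, not a guaranteed bypass: it assumes Firefox and geckodriver are installed for selenium, that the clearance cookie is named cf_clearance, and that Cloudflare expects the same User-Agent that solved the challenge. The helper names (solve_captcha_manually, session_from_browser_cookies, fetch) are hypothetical and made up for this illustration.

```python
import requests


def solve_captcha_manually(url):
    """Open a real browser, let a human pass the challenge, and
    return the browser's cookies together with its User-Agent."""
    # Imported lazily so the rest of the module works without selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox + geckodriver are available
    try:
        driver.get(url)
        input("Solve the CAPTCHA in the browser window, then press Enter here... ")
        user_agent = driver.execute_script("return navigator.userAgent")
        return driver.get_cookies(), user_agent
    finally:
        driver.quit()


def session_from_browser_cookies(browser_cookies, user_agent):
    """Build a requests.Session that carries cookies exported from a browser.

    browser_cookies is a list of dicts in the shape returned by selenium's
    driver.get_cookies(): "name", "value", and optionally "domain" and "path".
    """
    session = requests.Session()
    # Assumption: the clearance is tied to the User-Agent that earned it.
    session.headers.update({"User-Agent": user_agent})
    for cookie in browser_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def fetch(session, url):
    """Fetch a page; a 403 suggests the cookie was revoked or has expired."""
    response = session.get(url, timeout=30)
    if response.status_code == 403:
        raise PermissionError("Got 403: the cookie may need to be refreshed")
    return response.text
```

To use it, call solve_captcha_manually(url) once, pass the result to session_from_browser_cookies, and reuse that session for later fetch calls, keeping the request rate low. When fetch raises PermissionError, repeat the manual step to get a new cookie. The site may still reject the session for other reasons, so treat this as a starting point.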
