Beautiful Soup does not pull all of the web page's HTML

<div class="dataBild"> <img src="https://tmssl.akamaized.net//images/portrait/header/195652-1456301478.jpg?lm=1456301501" title="Jordon Ibe" alt="Jordon Ibe" class=""> <div class="bildquelle"><span title="imago">imago</span></div> </div>

# Import the libraries that I need
import urllib3
import certifi
from bs4 import BeautifulSoup

# Specify the URL
url = 'https://www.transfermarkt.com/jordon-ibe/profil/spieler/195652'
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
response = http.request('GET', url)

# Parse the HTML using Beautiful Soup and store it in the variable 'soup'
soup = BeautifulSoup(response.data, "html.parser")
print(soup)

1 answer

User

#1 · Posted on 2024-09-24 00:22:43

The site appears to check whether the request carries a valid User-Agent header.

So you need to add the header like this:

import urllib3
import certifi

url = 'https://www.transfermarkt.com/jordon-ibe/profil/spieler/195652'
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
response = http.request('GET', url, headers={'User-Agent': 'Mozilla/5.0'})
print(response.status)

This prints 200. If you remove the header, you get a 404 instead.

Any non-empty User-Agent value (after trimming whitespace) seems to work.
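As a rough illustration of that last point, here is a minimal sketch that builds a request with a User-Agent header and rejects values that are empty once whitespace is stripped. It uses the standard library's urllib.request rather than urllib3, purely so it runs without extra packages. The helper name build_request and its default value are hypothetical, and the "non-empty after trimming" rule is only the behaviour observed above, not something the site documents.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a User-Agent header.

    Hypothetical helper: it mirrors the observation that the site
    seems to accept any value that is non-empty after trimming.
    """
    cleaned = user_agent.strip()
    if not cleaned:
        # A blank header would presumably be treated like a missing one.
        raise ValueError("User-Agent must not be empty or whitespace-only")
    # Building the Request does not contact the server.
    return urllib.request.Request(url, headers={"User-Agent": cleaned})
```

Passing the returned object to urllib.request.urlopen sends the request. With urllib3, the same cleaned value can go into the headers argument of http.request, as in the answer's code.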
