Web scraping Transfermarkt's most valuable players

Posted 2024-09-28 01:31:58


I am new to web scraping.

I can't find the mistake in this code:

import requests
import csv
from bs4 import BeautifulSoup
url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
response=requests.get(url)
html_icerigi=response.content
soup=BeautifulSoup(html_icerigi,"html.parser")
footballer = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]
for footballer in footballer_list:
    footballer=footballer.text
    footballer=footballer.strip()
    footballer=footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer_list)

3 answers

Install Selenium and load the page this way. Otherwise, your code seems to work.

import bs4 
from selenium import webdriver 

browser = webdriver.Chrome()
browser.get('https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop')

html_icerigi = browser.page_source

soup = bs4.BeautifulSoup(html_icerigi,"html.parser")

footballers = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]

# Loop over the elements that were found, not over the still-empty result list
for footballer in footballers:
    name=footballer.text.strip().replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(name)])
print(footballer_list)

browser.close()  


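This can be done with BeautifulSoup alone. Here are the problems:

  1. You need to set a User-Agent header; without one, the site's anti-scraping protection may reject the request.

  2. The tooltipstered class is appended dynamically, so you can drop it from the search.

  3. Use response.text instead of response.content, which holds the raw bytes rather than a decoded string.

  4. The loop runs over footballer_list, which is empty at that point, instead of over the list of elements that was found: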

    footballer_list=[]
    for footballer in footballer_list:
    
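  5. Reassigning the variable over several lines is unnecessary, and the nested list structure is probably not what you want. Did you mean to append a dict instead of this?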

    [['Futbolcu:Kylian Mbappé'], ......, ['Futbolcu:Marlon Freitas']]
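
Points 4 and 5 can be reproduced without any network access. The following is a minimal sketch, assuming a hand-written list called names_found (a hypothetical stand-in for the text of the elements that find_all would return, not real scraped data). It contrasts the loop from the question with a loop over the found items that appends one dict per player.

```python
# Hypothetical stand-in for the text of the <a> elements returned by find_all.
names_found = ["  Kylian Mbappé\n", "Marlon Freitas  "]

# Pattern from the question: the loop runs over the list that was just
# created empty, so its body never executes.
buggy_list = []
for name in buggy_list:
    buggy_list.append(["Futbolcu:{}".format(name.strip())])

# Corrected pattern: loop over the found items and append one dict each.
fixed_list = []
for name in names_found:
    fixed_list.append({"Futbolcu": name.strip()})

print(buggy_list)   # stays empty
print(fixed_list)   # one dict per player
```

With dicts, a single name can be read back as fixed_list[0]["Futbolcu"] instead of having to strip a "Futbolcu:" prefix from a string inside a nested list.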
    

Fixed code:

import requests
import csv
from bs4 import BeautifulSoup

url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
heads = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}
response = requests.get(url, headers=heads)
html_icerigi = response.text
soup = BeautifulSoup(html_icerigi, "html.parser")
footballers = soup.find_all("a",{"class":"spielprofil_tooltip"})
footballer_list = []
for footballer in footballers:
    footballer_list.append({"Futbolcu" : footballer.text.strip()})

print(footballer_list)
print(footballer_list[5]["Futbolcu"])

Result:

[
 {'Futbolcu': 'Kylian Mbappé'}, 
 ......., 
 {'Futbolcu': 'Marlon Freitas'}
]
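
Point 2 is the reason the fixed code searches for spielprofil_tooltip alone. As a rough offline illustration, the sketch below uses the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, chosen only so that it runs without third-party packages). SAMPLE_HTML and LinkTextCollector are hypothetical names made up for this example, and the real page markup may differ. It shows that matching on a single class token finds a link whether or not the extra class has been appended.

```python
from html.parser import HTMLParser

# Hypothetical fragment: one link as served, one as it might look after
# JavaScript has appended the extra class, and one unrelated link.
SAMPLE_HTML = (
    '<a class="spielprofil_tooltip" href="#">Kylian Mbappé</a>'
    '<a class="spielprofil_tooltip tooltipstered" href="#">Marlon Freitas</a>'
    '<a class="other" href="#">Not a player</a>'
)


class LinkTextCollector(HTMLParser):
    """Collect the text of <a> tags whose class list contains wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.names = []
        self._inside_match = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Split the class attribute into tokens so extra classes do not matter.
            classes = (dict(attrs).get("class") or "").split()
            self._inside_match = self.wanted_class in classes

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_match = False

    def handle_data(self, data):
        if self._inside_match and data.strip():
            self.names.append(data.strip())


collector = LinkTextCollector("spielprofil_tooltip")
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.names)
```

BeautifulSoup treats a single class name in a similar way, comparing it against each class token of an element, whereas a search string containing two class names is compared against the exact value of the class attribute.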

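Besides Selenium, you can also use requests_html to render the page. As for why you get nothing back: your for loop is wrong. It means you end up with an empty footballer_list even after the JavaScript has run and you have the full HTML.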

import requests_html
from bs4 import BeautifulSoup

url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
with requests_html.HTMLSession() as s:
    resp = s.get(url)
    resp.html.render()
    page = resp.html.raw_html


soup = BeautifulSoup(page,"html.parser")
footballer_all = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})

footballer_list = []

for footballer in footballer_all:
    footballer = footballer.text
    footballer = footballer.strip()
    footballer = footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])

print(footballer_list)
