Understanding an "invalid literal" error in web scraping

years = range(1992,2015)
yearstext = dict()
for year in years:
    t_1992=requests.get('http://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_%(year)s' % {"year":year})
    soup = BeautifulSoup(t_1992.text, "html.parser")
    yearstext[year]=soup

def parse_year(year, ytextdixt):
    rows = soup.find("table", attrs={"class": "wikitable"}).find_all("tr")[1:]
    cleaner = lambda r: [r[0].get_text(), int(r[1].get_text()), r[2].get_text(), r[2].find("a").get("href"), r[3].get_text(),r[3].find("a").get("href")]
    fields = ["band_singer", "ranking", "song", "songurl","titletext","url"]
    songs = [dict(zip(fields, cleaner(row.find_all("td")))) for row in rows]

ValueError: invalid literal for int() with base 10: 'Pharrell Williams'

2 Answers

User

Answer 1 · edited 2024-10-03 00:28:44

I ran a small experiment and found the following:

from bs4 import BeautifulSoup
import requests

year = 1992
t_1992=requests.get('http://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_%(year)s' % {"year":year})
soup = BeautifulSoup(t_1992.content, "html.parser")  # "lxml.parser" is not a valid parser name
rows = soup.find("table", attrs={"class": "wikitable"}).find_all("tr")[1:]
rows[0].get_text()

This gives:

u'\n1\n"End of the Road"\nBoyz II Men\n'

So using:

rows[0].get_text().strip().split('\n')

gives:

[u'1', u'"End of the Road"', u'Boyz II Men']

This should put you on the right track.
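A minimal sketch of that idea, assuming each row's text has the rank, song, and artist on separate lines, as in the output above. The function name parse_row_text and the sample string are illustrative and not from the original post; the field names are borrowed from the question.

```python
def parse_row_text(row_text):
    """Split the text of one table row into rank, song, and artist."""
    # Drop empty pieces so leading/trailing newlines do not create blank fields.
    parts = [p.strip() for p in row_text.strip().split("\n") if p.strip()]
    if len(parts) != 3:
        raise ValueError("expected 3 fields, got %d: %r" % (len(parts), parts))
    rank, song, artist = parts
    return {
        "ranking": int(rank),          # the rank is the first field here
        "song": song.strip('"'),       # remove the surrounding double quotes
        "band_singer": artist,
    }

# Sample text copied from the rows[0].get_text() output shown above.
sample = '\n1\n"End of the Road"\nBoyz II Men\n'
row = parse_row_text(sample)
print(row)
```

Working from the row text avoids depending on which cells are td elements, but it assumes no cell contains a newline of its own.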

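User

Answer 2 · edited 2024-10-03 00:28:44

In some cases, 'r[1].get_text()' returns 'Pharrell Williams'.

'int(r[1].get_text())' then raises this exception, because that string is not a number.

So re-check the details you are getting back from the URL.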
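One way to do that recheck is to convert the rank defensively and list the rows that fail, rather than letting int() abort the whole parse. The sketch below works on plain lists of cell strings so it runs without network access. The names to_rank and find_bad_rows are hypothetical, and the sample rows are made up to mimic a row whose cell at index 1 holds an artist name.

```python
def to_rank(text):
    """Return the cell text as an int, or None if it is not a valid integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None

def find_bad_rows(rows, rank_index=1):
    """Return (row_number, cells) pairs whose rank cell is missing or not numeric."""
    bad = []
    for i, cells in enumerate(rows):
        if rank_index >= len(cells) or to_rank(cells[rank_index]) is None:
            bad.append((i, cells))
    return bad

# Made-up cell texts, standing in for
# [td.get_text() for td in row.find_all("td")] on each table row.
rows = [
    ["Boyz II Men", "1", "End of the Road"],
    ["Happy", "Pharrell Williams"],
]
bad = find_bad_rows(rows)
print(bad)
```

Note also that parse_year in the question reads the global soup rather than ytextdixt[year], so it parses whichever page the loop downloaded last (2014, the final value in range(1992,2015)). Whether that page lays out its cells differently from the 1992 page is an assumption worth checking against the row contents.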
