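Edit: I've added the full code, because my attempt to simplify it didn't work. If I comment out the offending line, the rest runs, so I believe everything before it is fine.
I'm building a web scraper and I'm stuck on an AttributeError raised by this line:
historical_seasons = soup_loop.find('select', {'name': 'comp_id'}).contents
I can't work out what's wrong.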
# https://statbunker.com/robots.txt
import requests
from bs4 import BeautifulSoup
# navigate to https://rugby.statbunker.com/
url = 'https://rugby.statbunker.com/'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')
# create lists of all leagues
league_links = []
leagues = soup.find_all('a', {'class': 'pointer'})
for link in leagues:
    if link.has_attr('href'):
        league_links.append(link['href'])
print(league_links)
# create list of all seasons for that league
season_links = []
for link in league_links:
    response_loop = requests.get(link)
    html_loop = response_loop.content
    soup_loop = BeautifulSoup(html_loop, 'html.parser')
    league_results = soup_loop.find_all('img', alt='Latest Results')
    for results in league_results:
        parent = results.parent
        if parent.has_attr('href'):
            season_links.append(parent['href'])
    # historical seasons
    historical_seasons = soup_loop.find('select', {'name': 'comp_id'}).contents
    for season in historical_seasons[3:-1]:
        season_id = season['value']
        base_url = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id='
        new_url = base_url + season_id
        season_links.append(new_url)
print(len(season_links))
What confuses me most is that the same code seems to work when I run it in isolation:
import requests
from bs4 import BeautifulSoup
# navigate to https://rugby.statbunker.com/
url = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id=637'
response = requests.get(url)
html_loop = response.content
soup_loop = BeautifulSoup(html_loop, 'html.parser')
test_list = []
historical_seasons = soup_loop.find('select', {'name': 'comp_id'}).contents
for season in historical_seasons[3:-1]:
    season_id = season['value']
    base_url = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id='
    new_url = base_url + season_id
    test_list.append(new_url)
print(test_list)
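Running the full code eventually reproduced the error for me. Look at the output of print(league_links): the fourth entry is 'https://rugby.statbunker.com/competitions/LeagueTable?comp_id=', with no comp_id value. That URL loads a page without a table, so soup_loop.find('select', {'name': 'comp_id'}) finds nothing and returns None. None has no attribute contents, hence the AttributeError. The isolated test works because it requests a URL with a valid comp_id (637), where the select element exists.
I'm not sure why the comp_id= link comes back in the first place, so you'll have to debug that part yourself.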
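One way to keep the loop from crashing on such pages is to check the result of find() before touching .contents, and optionally to skip links whose comp_id is empty. The following is only a sketch under some assumptions: the helper names has_comp_id and historical_season_links are made up for illustration, the sample HTML is invented rather than copied from the site, and iterating over option tags is a substitute for the original historical_seasons[3:-1] slice, so it may include entries the original skipped.

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

BASE_URL = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id='


def has_comp_id(link):
    """Return True when the link's query string carries a non-empty comp_id."""
    # parse_qs drops blank values by default, so 'comp_id=' yields no entry
    query = parse_qs(urlparse(link).query)
    return bool(query.get('comp_id'))


def historical_season_links(soup_loop):
    """Build season URLs from the comp_id <select>, or return [] if it is absent."""
    select = soup_loop.find('select', {'name': 'comp_id'})
    if select is None:
        # find() returns None when nothing matches, so there is no .contents to read
        return []
    links = []
    for option in select.find_all('option'):
        season_id = option.get('value')
        if season_id:  # skip placeholder options without a value
            links.append(BASE_URL + season_id)
    return links


# Invented sample pages: one without the <select>, one with it.
empty_page = BeautifulSoup('<html><body><p>No table</p></body></html>', 'html.parser')
season_page = BeautifulSoup(
    '<select name="comp_id"><option value="">Select</option>'
    '<option value="637">Season A</option>'
    '<option value="601">Season B</option></select>',
    'html.parser',
)
print(historical_season_links(empty_page))
print(historical_season_links(season_page))
```

In the original loop, this would mean skipping any link for which has_comp_id(link) is false and extending season_links with historical_season_links(soup_loop) instead of reading .contents directly.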