BeautifulSoup: AttributeError: 'NoneType' object has no attribute 'contents'

Posted 2024-10-04 07:35:05


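Edit: I've added the full code, because my attempt to simplify it did not reproduce the problem. When I comment out the offending line, the rest runs, so I believe everything before that point is fine.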

I'm building a web scraper and I'm stuck on an AttributeError raised by this line: historical_seasons = soup_loop.find('select', {'name': 'comp_id'}).contents. I can't work out what is wrong.

# https://statbunker.com/robots.txt

import requests
from bs4 import BeautifulSoup

# navigate to https://rugby.statbunker.com/
url = 'https://rugby.statbunker.com/'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, 'html.parser')

# create lists of all leagues
league_links = []
leagues = soup.find_all('a', {'class': 'pointer'})

for link in leagues:
    if link.has_attr('href'):
        league_links.append(link['href'])
print(league_links)

# create list of all seasons for that league
season_links = []
for link in league_links:
    response_loop = requests.get(link)
    html_loop = response_loop.content
    soup_loop = BeautifulSoup(html_loop, 'html.parser')
    
    league_results = soup_loop.find_all('img', alt='Latest Results')

    for results in league_results:
        parent = results.parent
        if parent.has_attr('href'):
            season_links.append(parent['href'])

    # historical seasons
    historical_seasons = soup_loop.find('select', {'name': 'comp_id'}).contents

    for season in historical_seasons[3:-1]:
        season_id = season['value']
        base_url = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id='
        new_url = base_url + season_id
        season_links.append(new_url)
print(len(season_links))

What confuses me most is that the same line seems to work when I run it in isolation:

import requests
from bs4 import BeautifulSoup

# navigate to https://rugby.statbunker.com/

url = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id=637'
response = requests.get(url)
html_loop = response.content
soup_loop = BeautifulSoup(html_loop, 'html.parser')

test_list = []
historical_seasons = soup_loop.find('select', {'name': 'comp_id'}).contents

for season in historical_seasons[3:-1]:
    season_id = season['value']
    base_url = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id='
    new_url = base_url + season_id
    test_list.append(new_url)
print(test_list)

1 answer
User
#1 · Posted 2024-10-04 07:35:05

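The full code did reproduce the error for me. Looking at the output of print(league_links), the fourth entry is 'https://rugby.statbunker.com/competitions/LeagueTable?comp_id=', which has no comp_id value. That URL loads a page without a table, so soup_loop.find('select', {'name': 'comp_id'}) finds nothing and returns None. None has no attribute contents, hence the error.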
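The failure described above can be reproduced offline and guarded against. The following is a minimal sketch, not the asker's code: the helper name season_ids and the two inline HTML snippets are hypothetical stand-ins for the real pages. It checks the result of find() for None before using it, and reads the option tags with find_all instead of slicing .contents.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: one page with the dropdown, one without it.
PAGE_WITH_SELECT = """
<select name="comp_id">
  <option value="637">2019/20</option>
  <option value="600">2018/19</option>
</select>
"""
PAGE_WITHOUT_SELECT = "<p>No competition selected</p>"


def season_ids(html):
    """Return the option values of the comp_id dropdown, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", {"name": "comp_id"})
    if select is None:
        # find() returns None when nothing matches, so stop here
        return []
    # find_all('option') skips the whitespace text nodes that .contents includes
    return [opt["value"] for opt in select.find_all("option") if opt.has_attr("value")]


print(season_ids(PAGE_WITH_SELECT))
print(season_ids(PAGE_WITHOUT_SELECT))
```

The [3:-1] slice from the question is deliberately not reproduced here; whether some options should be skipped depends on the real page's markup.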

I'm not sure why the comp_id= link ends up in the list, so you will have to debug that part yourself.
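As a starting point for that debugging, one option is to flag links whose comp_id is empty before requesting them. This is a minimal sketch using only the standard library; has_comp_id is a hypothetical helper name, and the sample list holds just two URLs quoted in this thread rather than the real output of print(league_links).

```python
from urllib.parse import urlparse, parse_qs


def has_comp_id(url):
    """Return True if the URL's query string carries a non-empty comp_id."""
    # parse_qs drops blank values by default, so 'comp_id=' yields no entry
    query = parse_qs(urlparse(url).query)
    return bool(query.get("comp_id"))


# First URL: the one reported in the answer. Second: taken from the question.
links = [
    "https://rugby.statbunker.com/competitions/LeagueTable?comp_id=",
    "https://rugby.statbunker.com/competitions/LastMatches?comp_id=637",
]
usable = [link for link in links if has_comp_id(link)]
print(usable)
```

Printing the links that fail the check, rather than silently dropping them, would show which anchor on the home page produces the empty comp_id.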
