Python Beautiful Soup web scraping: getting a specific number

Published 2024-10-01 04:59:11


On this page, each team's final score (a number) has the same class name, class="finalScore".

When I fetch the away team's final score (at the top), the code retrieves the number without any problem. That is the case when favLastGM = 'A'.

When I try to fetch the home team's final score (at the bottom), the code gives me an error. That is the case when favLastGM = 'H'.

Here is my code:

import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen

#Last Two Game info Home [H] or Away [A]
favLastGM = 'A' #Higher week number 2

#Game Info (Favorite) Last Game Played - CBS Sports (Change Every Week)
favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
favPrevGMInfoHtml = urlopen(favPrevGMInfoUrl).read()
favPrevGMInfoSoup = BeautifulSoup(favPrevGMInfoHtml)
if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]
else:
    print("***************************************************")
    print("NOT A VALID ENTRY - favLastGM  !")
    print("***************************************************")


print ("Enter: Total Points Allowed from Favored Team Defense for last game played: "),
print favScore[0].text

This is the error I get when favLastGM = 'H':

Traceback (most recent call last):
  File "C:/Users/jcmcdonald/Desktop/FinalScoreTest.py", line 26, in <module>
    print favScore[0].text
  File "C:\Python27\lib\site-packages\bs4\element.py", line 905, in __getitem__
    return self.attrs[key]
KeyError: 0


3 Answers

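There are only two elements with class="finalScore". For this game (NE at MIN), the first is the away team's score and the second is the home team's score: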

>>> from urllib import urlopen
>>> from bs4 import BeautifulSoup
>>> 
>>> favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
>>> 
>>> favPrevGMInfoSoup = BeautifulSoup(urlopen(favPrevGMInfoUrl))
>>> score = [item.get_text() for item in favPrevGMInfoSoup.find_all("td", {"class": "finalScore"})]
>>> score
[u'30', u'7']

FYI, you can use a CSS selector, .select("td.finalScore"), instead of .find_all("td", {"class": "finalScore"}).
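To illustrate that the two lookups return the same cells, here is a minimal offline sketch. It assumes Python 3 and a hypothetical, heavily simplified snippet of markup (SAMPLE_HTML) standing in for the CBS Sports page, so it does not depend on the live site:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is much larger.
SAMPLE_HTML = """
<table>
  <tr><td class="teamName">NE</td><td class="finalScore">30</td></tr>
  <tr><td class="teamName">MIN</td><td class="finalScore">7</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute-dictionary lookup, as in the answer above.
by_find_all = [td.get_text() for td in soup.find_all("td", {"class": "finalScore"})]

# Equivalent CSS selector lookup.
by_select = [td.get_text() for td in soup.select("td.finalScore")]

print(by_find_all)
print(by_select)
```

Both lists contain the score cells in document order, so which form to use is mostly a matter of taste.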

In your code, you are assigning different types of objects to favScore. In the first case, you have:

if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })

and you end up with a list:

favScore = [<td class="finalScore">30</td>, <td class="finalScore">7</td>]

Whereas in the second case, you have:

elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]

and you end up with a single Beautiful Soup element:

favScore = <td class="finalScore">7</td>
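To see why the second case raises KeyError: 0, it may help to compare indexing on the two object types. The sketch below assumes Python 3 and a hypothetical inline snippet of markup (SAMPLE_HTML) rather than the live page. On the list returned by find_all, [0] selects an item; on a single element, square brackets look up an HTML attribute, and there is no attribute named 0:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
SAMPLE_HTML = (
    '<table>'
    '<tr><td class="finalScore">30</td></tr>'
    '<tr><td class="finalScore">7</td></tr>'
    '</table>'
)
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

cells = soup.find_all("td", {"class": "finalScore"})  # list-like result
second = cells[1]                                      # a single element

# Integer index on the list: picks the first matching cell.
first_text = cells[0].get_text()

# Integer index on a single element: treated as an attribute lookup.
try:
    second[0]
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

# The same bracket syntax with a real attribute name works.
attr_value = second["class"]

print(first_text, error_name, attr_value)
```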

You can fix this by doing the following (note the [0]):

if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[0]
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]

And finally:

print favScore.text
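Put together, the fix could be wrapped in a small helper. This is only a sketch under stated assumptions: it targets Python 3, final_score is a hypothetical name, SAMPLE_HTML is a simplified stand-in for the real page, and it relies on the away score appearing first, as in the question. It also raises an error for an invalid entry instead of printing a message and then failing later on an undefined variable:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: away team first, home team second.
SAMPLE_HTML = """
<table>
  <tr><td class="finalScore">30</td></tr>
  <tr><td class="finalScore">7</td></tr>
</table>
"""


def final_score(soup, home_or_away):
    """Return the final score text for 'A' (first cell) or 'H' (second cell)."""
    index_by_side = {"A": 0, "H": 1}
    if home_or_away not in index_by_side:
        raise ValueError("home_or_away must be 'A' or 'H', got %r" % (home_or_away,))
    cells = soup.find_all("td", {"class": "finalScore"})
    # Always index the list, so the result is a single element in both cases.
    return cells[index_by_side[home_or_away]].get_text()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(final_score(soup, "A"))
print(final_score(soup, "H"))
```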

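I expanded slightly on @alecxe's answer to explicitly select the home and away teams (rather than relying on the implicit ordering of the array):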

from urllib import urlopen
from bs4 import BeautifulSoup

favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'

favPrevGMInfoSoup = BeautifulSoup(urlopen(favPrevGMInfoUrl))

home_score = favPrevGMInfoSoup.find("tr", {"class": "teamInfo homeTeam"}).find("td", {"class": "finalScore"}).get_text()
away_score = favPrevGMInfoSoup.find("tr", {"class": "teamInfo awayTeam"}).find("td", {"class": "finalScore"}).get_text()

print home_score, away_score
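The same idea can be sketched with CSS selectors. One reason to consider this variant: per the Beautiful Soup documentation, a multi-class string such as "teamInfo homeTeam" is matched against the exact attribute value, so it would stop matching if the page listed the classes in a different order, whereas a selector like tr.homeTeam does not depend on class order. The code assumes Python 3; score_for is a hypothetical helper, and SAMPLE_HTML is made-up markup that reuses the class names from the answer above (with the class order swapped on one row on purpose):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names taken from the answer above.
SAMPLE_HTML = """
<table>
  <tr class="teamInfo awayTeam"><td class="teamName">NE</td><td class="finalScore">30</td></tr>
  <tr class="homeTeam teamInfo"><td class="teamName">MIN</td><td class="finalScore">7</td></tr>
</table>
"""


def score_for(soup, team_class):
    """Return the final score text from the row carrying team_class."""
    cell = soup.select_one("tr.%s td.finalScore" % team_class)
    if cell is None:
        # Fail loudly if the page layout is not what we assumed.
        raise LookupError("no finalScore cell found for %r" % (team_class,))
    return cell.get_text()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
home_score = score_for(soup, "homeTeam")
away_score = score_for(soup, "awayTeam")
print(home_score, away_score)
```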
