Why can't I parse this HTML page with Python?

1 Pedro Martinez 2 Jeff Fassero* 3 Ramon Martinez 4 Pete Schourek* 5 Rolando Arrojo 6 Tomo Ohka 7 Derek Lowe 8 Tim Wakefield 9 Rich Garces 10 Rheal Cormier* 11 Hipolito Pichardo 12 Brian Rose 13 Bryce Florie 14 John Wasdin 15 Pedro Martinez 16 Jeff Fassero* 17 Ramon Martinez 18 Pete Schourek* 19 Rolando Arrojo 20 Tomo Ohka 21 Derek Lowe 22 Tim Wakefield 23 Rich Garces 24 Rheal Cormier* 25 Hipolito Pichardo 26 Brian Rose 27 Bryce Florie 28 John Wasdin

1 Answer

User

#1 · Posted 2024-06-01 09:01:59

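You are trying to iterate over all the rows in the table without first getting the table tags. So get all the table tags first, then iterate over all the tr tags inside each table tag. Also, year and {} are undefined, so I assumed the year is y and made the table variable {}. Finally, you don't have to download the HTML, save it, and then open it again to parse it: you can read the text of the connection and parse that directly.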
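
To show the table-first lookup in isolation, here is a minimal sketch on a made-up HTML snippet. The `SAMPLE_HTML` string and the `player_names` helper are hypothetical and not part of the original code, and the `csk` values are assumed to look like "Last,First", which is only inferred from the comma check in the answer's code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the structure matters here.
SAMPLE_HTML = """
<table id="other"><tr><td>ignore me</td></tr></table>
<table id="team_pitching">
  <tr><th>Name</th></tr>
  <tr><td csk="Martinez,Pedro">Pedro Martinez</td><td>38</td></tr>
  <tr><td csk="Fassero,Jeff">Jeff Fassero*</td><td>37</td></tr>
</table>
"""


def player_names(html, table_id="team_pitching"):
    """Return the text of every name cell inside the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # Step 1: select the table(s) first.
    for table in soup.find_all("table", attrs={"id": table_id}):
        # Step 2: only then walk the rows of that table.
        for row in table.find_all("tr"):
            for cell in row.find_all("td"):
                # Assumed: name cells carry a csk sort key containing a comma.
                if "," in cell.get("csk", ""):
                    names.append(cell.get_text(strip=True))
    return names


print(player_names(SAMPLE_HTML))
```

Using `cell.get("csk", "")` avoids an exception for cells that have no `csk` attribute, so no try/except is needed in this sketch.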

import urllib2
from bs4 import BeautifulSoup

# Download the 2010 web page

y = 2010
url = 'http://www.baseball-reference.com/teams/BOS/'+ str(y) +'-pitching.shtml'
print 'Download from :', url

# download
filehandle = urllib2.urlopen(url)


fileout = 'YEARS'+str(y)+'.html'
print 'Save to : ', fileout, '\n'

#save file to disk
f = open(fileout,'w')
f.write(filehandle.read())
f.close()


# Read and parse the html file

# Parse information about the players in 2010

y = 2010

filein = 'YEARS' + str(y) + '.html'
print(filein)
soup = BeautifulSoup(open(filein))


table = soup.find_all('table', attrs={'id': 'team_pitching'}) #' non_qual' ''


for t in table:

    i = 1
    entries = t.find_all('tr', attrs={'class' : ''}) #' non_qual' ''
    print(len(entries))
    for entry in entries:
        columns = entry.find_all('td')
        printString = str(i) + ' '
        for col in columns:
            try:
                if ((',' in col['csk']) and (col['csk'] != '')):
                    printString = printString + col.text
                    i = i + 1
                    print printString
            except KeyError:  # this cell has no csk attribute
                pass
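
The answer points out that the page does not need to be saved to disk, although the listing above still writes a file. A minimal sketch of parsing the response directly might look like the following. It is written for Python 3 (`urllib.request` rather than `urllib2`), and the helper names `pitching_url`, `parse_html`, and `fetch_soup` are hypothetical. Whether the site still serves the same markup at this URL is not something the sketch can guarantee.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def pitching_url(year):
    """Build the team pitching URL used in the answer for a given year."""
    return ("http://www.baseball-reference.com/teams/BOS/"
            + str(year) + "-pitching.shtml")


def parse_html(markup):
    """Parse markup (str or bytes) held in memory; no temporary file is used."""
    return BeautifulSoup(markup, "html.parser")


def fetch_soup(year):
    """Download the page and hand the response body straight to the parser."""
    with urlopen(pitching_url(year)) as response:
        return parse_html(response.read())
```

With network access, `soup = fetch_soup(2010)` would replace both the download-and-save step and the `BeautifulSoup(open(filein))` call, after which `soup.find_all('table', attrs={'id': 'team_pitching'})` can be used as before.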
