Why doesn't Beautiful Soup show all the <td> data in the table?

from BeautifulSoup import BeautifulSoup
import urllib
import sys
from urllib import FancyURLopener

class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.65 Safari/534.24'

def printList(rowList):
    for row in rowList:
        print row
        print '\n'
    return

url = "http://en.wikipedia.org/wiki/Supernatural_(season_6)"
#f = urllib.urlopen(url)
#content = f.read()
#f.close
myopener = MyOpener()
page = myopener.open(url)
content = page.read()
page.close()

soup = BeautifulSoup(''.join(content))
soup.prettify()

movieList = []
rowListTitle = soup.findAll('tr', 'vevent')
print len(rowListTitle)
#printList(rowListTitle)
for row in rowListTitle:
    col = row.next # explain this?
    if col != 'None':
        col = col.findNext("b")
        movieTitle = col.string
        movieTuple = (movieTitle,'')
        movieList.append(movieTuple)
#printList(movieList)
for row in movieList:
    print row[0]

rowListDescription = soup.findAll('td' , 'description')
print len(rowListDescription)
index = 1;
while ( index < len(rowListDescription) ):
    description = rowListDescription[index]
    print description
    print description.string
    str = description
    print '####################################'
    movieList[index - 1] = (movieList[index - 1][0],description)
    index = index + 1

1 answer

User

#1 · Posted 2024-10-03 09:20:33

Are all of the description strings empty? According to the documentation:

For your convenience, if a tag has only one child node, and that child node is a string, the child node is made available as tag.string, as well as tag.contents[0].

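In this case, the description usually has child nodes that are not strings, namely <a> links to other Wikipedia articles. A link counts as a non-string child, so the description node no longer has a single string child and its string attribute is set to None.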
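The behavior can be reproduced on a small scale. The sketch below assumes Beautiful Soup 4 on Python 3 (`from bs4 import BeautifulSoup`) rather than the Beautiful Soup 3 import used in the question, and the HTML snippet is made up for illustration; it is not taken from the Wikipedia page. It shows `.string` returning None for a cell that mixes text with an <a> link, and `get_text()` as one possible way to collect all the text anyway.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one cell mixes text with a link, the other is plain text.
html = (
    '<table>'
    '<tr><td class="description">Sam investigates a case in '
    '<a href="/wiki/Example">Example Town</a>.</td></tr>'
    '<tr><td class="description">Plain text only</td></tr>'
    '</table>'
)

soup = BeautifulSoup(html, "html.parser")
mixed, plain = soup.find_all("td", class_="description")

# Three children (string, <a>, string), so there is no single string child.
print(mixed.string)      # None

# Exactly one child and it is a string, so .string works.
print(plain.string)      # Plain text only

# get_text() concatenates every string inside the tag, including link text.
print(mixed.get_text())  # Sam investigates a case in Example Town.
```

With Beautiful Soup 3, as in the question, joining the text nodes by hand should give a similar result: `''.join(description.findAll(text=True))`.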
