Why does bs4 return tags and then an empty list for the find_all() method?

import urllib2
from bs4 import BeautifulSoup

url = 'http://quickfacts.census.gov/qfd/states/48/48507.html'  # last county in TX; for some reason qfd numbers counties with odd numbers only
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)

c_black_alone = soup.find_all("td", attrs={'headers': 'rp9'})[0]  # c = county %
s_black_alone = soup.find_all("td", attrs={'headers': 'rp9'})[1]  # s = state %

1 answer

User

Answer #1 · Posted 2024-09-30 16:21:38

To get the text out of these matches, use .text, which returns all of the text contained in the element:

>>> soup.find_all("td", attrs={'headers':'rp9'})[0].text
u'96.9%'
>>> soup.find_all("td", attrs={'headers':'rp9'})[1].text
u'80.3%'
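A self-contained sketch of the same idea that runs without network access. It assumes Beautiful Soup 4 with Python's built-in "html.parser"; the markup is a hypothetical stand-in for the census table, and only the rp9 header value and the two percentages come from the question and answer:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the census table; the real page's
# structure is assumed, not copied.
html = """
<table>
  <tr><td headers="rp9">96.9%</td><td headers="rp9">80.3%</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td", attrs={"headers": "rp9"})

# find_all() returns Tag objects; .text (an alias for get_text())
# returns the concatenated text inside each tag.
county_pct = cells[0].text
state_pct = cells[1].text
print(county_pct, state_pct)
```

Indexing with [0] and [1] relies on the county cell appearing before the state cell in the page, which is what the comments in the question assume.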

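The text search matches nothing, for two reasons:

1. A plain string only matches the element's entire text, never part of it. It only works for elements such as <td>Black</td>, where that string is the sole content.
2. The match is made against the .string property, and that property is only set when the text is the only child of the element. If anything else is present (another element, or a comment), the search fails outright.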
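A minimal sketch of both failure modes, under the same assumptions (Beautiful Soup 4, "html.parser", made-up cells). It uses string=, the name Beautiful Soup 4.4 and later give to the older text= argument. A compiled regular expression removes the whole-text restriction, but it still misses the cell that also contains a comment, because that cell's .string is None:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical cells: one whose only content is "Black", one where "Black"
# is part of a longer string, and one that also contains an HTML comment.
html = """
<table><tr>
  <td>Black</td>
  <td>Black or African American alone</td>
  <td>Black-owned firms <!-- SBO315207 --> </td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Reason 1: a plain string must equal the whole text, so only the first cell matches.
exact = soup.find_all("td", string="Black")

# Reason 2: a regex allows partial matches, but it is still tested against
# .string, which is None when a tag has several children (text + comment).
partial = soup.find_all("td", string=re.compile("Black"))

print(len(exact), len(partial))  # 1 2
for td in soup.find_all("td"):
    print(repr(td.string))
```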

The way around this is to use a lambda: find_all() passes it each whole element, so you can test every element yourself:

soup.find_all(lambda e: e.name == 'td' and 'Black' in e.text)

Demo:

>>> soup.find_all(lambda e: e.name == 'td' and 'Black' in e.text)
[<td id="rp10" valign="top">Black or African American alone, percent, 2013 (a)  <!-- RHI225213 --> </td>, <td id="re6" valign="top">Black-owned firms, percent, 2007  <!-- SBO315207 --> </td>]
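The same lambda can be wrapped in a small factory so that the search word becomes a parameter. In this sketch td_containing is a hypothetical helper name, and the markup is invented to resemble the demo output (the "White alone" row and its comment are placeholders); it again assumes Beautiful Soup 4 with "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the demo output: each <td> holds text
# followed by an HTML comment. The "White alone" row is a placeholder.
html = """
<table><tr>
  <td id="rp10">Black or African American alone, percent, 2013 (a)  <!-- RHI225213 --> </td>
  <td>White alone, percent, 2013 (a)  <!-- placeholder --> </td>
  <td id="re6">Black-owned firms, percent, 2007  <!-- SBO315207 --> </td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")


def td_containing(word):
    """Hypothetical helper: build a find_all() filter for <td> tags whose text contains `word`."""
    return lambda tag: tag.name == "td" and word in tag.get_text()


matches = soup.find_all(td_containing("Black"))
print([td.get("id") for td in matches])  # ['rp10', 're6']
```

The tag.name == "td" check matters: without it the enclosing <tr> and <table> would match as well, since their text also contains "Black".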

Both of these matches contain a comment inside the <td> element, which is why searching with a text match cannot find them.
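To see the comment directly, you can inspect a cell's children. This sketch makes the same assumptions (Beautiful Soup 4, "html.parser") and uses a hypothetical cell modelled on the re6 cell from the demo; it shows the Comment child that leaves .string as None, and that get_text() skips the comment:

```python
from bs4 import BeautifulSoup

# Hypothetical cell modelled on the re6 cell in the demo output.
html = '<td id="re6">Black-owned firms, percent, 2007  <!-- SBO315207 --> </td>'
td = BeautifulSoup(html, "html.parser").td

# The comment is a separate child, so the tag has three children ...
kinds = [type(child).__name__ for child in td.contents]
print(kinds)  # ['NavigableString', 'Comment', 'NavigableString']

# ... which is why .string is None and string=/text= searches skip the cell,
print(td.string)  # None

# while get_text() joins only the text nodes and leaves the comment out.
print(td.get_text().strip())  # Black-owned firms, percent, 2007
```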
