BeautifulSoup: find an element by text regardless of whether it contains child elements

Posted 2024-09-25 00:32:24


For example:

import re
from bs4 import BeautifulSoup

bs = BeautifulSoup("<html><a>sometext</a></html>")
print(bs.find_all("a", text=re.compile(r"some")))

returns [<a>sometext</a>], but when the element being searched has a child element, e.g. img:

bs = BeautifulSoup("<html><a>sometext<img /></a></html>")
print(bs.find_all("a", text=re.compile(r"some")))

it returns [].

Is there a way to make find_all match the second example?


1 answer
User
#1 · Posted 2024-09-25 00:32:24

You will need a hybrid approach, because text= fails when an element has both text and child elements.

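bs = BeautifulSoup("<html><a>sometext<img /></a></html>")
reg = re.compile(r'some')
# e.text joins all descendant strings, so it works even when <a> has child tags.
# search() matches anywhere in the text; match() would only match at the start.
elements = [e for e in bs.find_all('a') if reg.search(e.text)]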
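
If you would rather stay inside find_all, one possible variant is to pass it a function instead of a tag name. The sketch below assumes bs4 on Python 3 and the built-in "html.parser"; the helper name a_with_matching_text and the extra <a>other</a> element are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

html = "<html><a>sometext<img /></a><a>other</a></html>"
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"some")

def a_with_matching_text(tag):
    # Hypothetical helper: accept <a> tags whose combined text matches,
    # regardless of how many child tags they contain.
    return tag.name == "a" and pattern.search(tag.get_text()) is not None

# find_all calls the function on every tag and keeps those returning True.
matches = soup.find_all(a_with_matching_text)
print(matches)
```

This does the same filtering as the list comprehension above, only expressed as a find_all filter.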

Background

When BeautifulSoup is searching for an element and text is a callable, it eventually calls

self._matches(found.string, self.text)

In the two examples you gave, the .string property returns different things:

>>> bs1 = BeautifulSoup("<html><a>sometext</a></html>")
>>> bs1.find('a').string
u'sometext'
>>> bs2 = BeautifulSoup("<html><a>sometext<img /></a></html>")
>>> bs2.find('a').string
>>> print bs2.find('a').string
None

The .string property looks like this:

@property
def string(self):
    """Convenience property to get the single string within this tag.

    :Return: If this tag has a single string child, return value
     is that string. If this tag has no children, or more than one
     child, return value is None. If this tag has one child tag,
     return value is the 'string' attribute of the child tag,
     recursively.
    """
    if len(self.contents) != 1:
        return None
    child = self.contents[0]
    if isinstance(child, NavigableString):
        return child
    return child.string
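
To see the rule in isolation, here is a minimal sketch that copies the same logic onto a toy class. Node is a made-up stand-in, not bs4's Tag, and plain str plays the role of NavigableString; it needs only the standard library.

```python
class Node:
    """Toy stand-in for a bs4 Tag (illustration only, not the real class)."""

    def __init__(self, name, *contents):
        self.name = name
        self.contents = list(contents)  # children are str or Node

    @property
    def string(self):
        # Same rule as above: exactly one child, otherwise None.
        if len(self.contents) != 1:
            return None
        child = self.contents[0]
        if isinstance(child, str):
            return child
        return child.string  # single child tag: recurse into it

plain = Node("a", "sometext")                   # one string child
with_img = Node("a", "sometext", Node("img"))   # string plus a child tag
nested = Node("a", Node("b", "sometext"))       # single child tag wrapping a string

print(plain.string, with_img.string, nested.string)  # sometext None sometext
```

The second case is the one from the question: two children, so .string is None and there is nothing for text= to match against.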

If we print the contents, we can see why None is returned:

>>> print bs1.find('a').contents
[u'sometext']
>>> print bs2.find('a').contents
[u'sometext', <img/>]
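
Since the text is still present as its own entry in .contents, another option is to search for the string node itself and step up to its parent. This is only a sketch: it assumes bs4 4.4.0 or later (where the argument is spelled string= rather than text=), Python 3, and the built-in "html.parser".

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><a>sometext<img /></a></html>", "html.parser")

# Without a tag name, find_all(string=...) returns the matching text nodes.
text_nodes = soup.find_all(string=re.compile(r"some"))

# Each text node knows its enclosing tag; keep only those inside <a>.
parents = [node.parent for node in text_nodes if node.parent.name == "a"]
print(parents)
```

Note that this matches each text node separately, so a pattern spanning text on both sides of a child tag would not be found this way.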
