Answer to: Python BeautifulSoup changes behavior when a string is provided in find_all()


1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java

While an answer has already been given that lets you move forward, I believe the question of why this happens is still open. It can actually be answered from the BeautifulSoup documentation, in particular this section: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-string-argument

That section explains that when you pass `string="some text"` to `find`/`find_all` together with tag arguments, Beautiful Soup finds the tags whose `.string` attribute matches. The `.string` attribute is described here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string. In essence, `.string` only returns something when the tag's only child is text; if a tag contains more than one child, its `.string` is `None`.

So the reason it doesn't work for every `code` tag is that some of your `code` tags contain more than just text: in your case, `br` tags. Supplying your own filter function gives you what you want:

```python
import re

from bs4 import BeautifulSoup

text = """<!-- Data starts here -->
<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code>
<code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020<br/>
BECMG 2816/2818 34015KT<br/>
TEMPO 2909/2912 34015G25KT</code>
<hr width="65%"/>
<!-- Data ends here -->"""

my_pattern = re.compile('LGEL')


def my_filter(tag):
    """Match <code> tags whose full text (including text after a <br/>) contains the pattern."""
    return tag.name == 'code' and my_pattern.search(tag.get_text()) is not None


soup = BeautifulSoup(text, 'html.parser')
value = soup.find_all(my_filter)
# Unlike find_all('code', string=my_pattern), this also finds the second <code> tag
print(value)
```

Output:

```
[<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code>, <code>TAF LGEL 281100Z 2812/2912 34018G28KT 9999 FEW020<br/>
BECMG 2816/2818 34015KT<br/>
TEMPO 2909/2912 34015G25KT</code>]
```

I believe this answers the why, as well as showing how to solve the problem.
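The `.string` behavior behind this can be checked directly. The following is a minimal sketch, assuming bs4 with Python's built-in `html.parser`; the markup is a shortened, hypothetical version of the question's data, not the original page. It shows `.string` being `None` for a `code` tag that contains a `<br/>`, and compares a `string=` search with a function filter:

```python
import re

from bs4 import BeautifulSoup

# Shortened, hypothetical sample: the first <code> holds only text,
# the second holds text, a <br/> tag, and more text.
markup = (
    "<code>LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013</code>"
    "<code>TAF LGEL 281100Z FEW020<br/>BECMG 2816/2818 34015KT</code>"
)
soup = BeautifulSoup(markup, "html.parser")
first, second = soup.find_all("code")

# The first tag has a single text child, so .string is that text.
print(first.string)          # LGEL 281220Z 33010G20KT CAVOK 32/11 Q1013
# The second tag has three children (text, <br/>, text), so .string is None.
print(second.string)         # None
print(len(second.contents))  # 3

pattern = re.compile("LGEL")
# string= is matched against each tag's .string, so only the first tag is found.
by_string = soup.find_all("code", string=pattern)
# A function filter can inspect the full text via get_text(), so both tags are found.
by_filter = soup.find_all(
    lambda tag: tag.name == "code" and pattern.search(tag.get_text()) is not None
)
print(len(by_string), len(by_filter))  # 1 2
```

With the `string=` search only the first tag comes back; the function filter sees the text on both sides of the `<br/>` through `get_text()`, so it returns both.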
