<p>When a condition gets this complex, it is best to wrap it in a separate function. BeautifulSoup lets you use a <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/#a-function" rel="nofollow noreferrer">function as a filter</a>.</p>
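<p>Here is a minimal illustration of that API on its own. The markup and the function name <code>has_id_but_no_class</code> are made up for this sketch. <code>find_all()</code> calls the function once for each tag and keeps every tag for which it returns <code>True</code>.</p>
<pre><code>from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="a"></p><p id="b" class="x"></p><b></b>', 'html.parser')

def has_id_but_no_class(tag):
    # find_all() passes each Tag in turn; returning True keeps it
    return tag.has_attr('id') and not tag.has_attr('class')

print(soup.find_all(has_id_but_no_class))  # [<p id="a"></p>]
</code></pre>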
<pre><code>from bs4 import BeautifulSoup, Tag
html = """
<div class="c1"></div>
<div class="c1" id="myid">
<div class="c1"></div>
</div>
<div class="c2"></div>
<div class="c3" id="myid"></div>
<div class="c4"></div>
<div></div>
<div id="myid"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
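# Use decompose() to handle cases where
# unwanted classes come inside wanted classes.
# A function passed as class_ is called with each of a tag's class names.
for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()

def my_filter(ele):
    # Only consider <div> tags
    if not (isinstance(ele, Tag) and ele.name == 'div'):
        return False
    # Keep divs that still have a class (the ignored ones were decomposed above),
    # plus divs that have id="myid" and no class
    return bool(ele.get('class')) or ele.get('id') == 'myid'

print(soup.find_all(my_filter))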
</code></pre>
<p>Output:</p>
<pre><code>[<div class="c3" id="myid"></div>, <div class="c4"></div>, <div id="myid"></div>]
</code></pre>
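<p>If you would rather not modify the tree with <code>decompose()</code>, the filter function itself can reject any div that has an ignored class or sits inside one, by walking <code>tag.parents</code>. The sketch below takes that approach. The helper names <code>is_ignored</code> and <code>wanted_div</code> and the constant <code>CLASSES_TO_IGNORE</code> are illustrative and are not part of BeautifulSoup.</p>
<pre><code>from bs4 import BeautifulSoup

html = """
<div class="c1"></div>
<div class="c1" id="myid">
<div class="c1"></div>
</div>
<div class="c2"></div>
<div class="c3" id="myid"></div>
<div class="c4"></div>
<div></div>
<div id="myid"></div>
"""

CLASSES_TO_IGNORE = {"c1", "c2"}  # hypothetical name for the ignore list

def is_ignored(tag):
    # True if this tag or any of its ancestors carries an ignored class;
    # .get('class') returns a list of class names, or None if absent
    for node in [tag, *tag.parents]:
        if CLASSES_TO_IGNORE.intersection(node.get('class') or []):
            return True
    return False

def wanted_div(tag):
    # Same rule as my_filter, but without relying on decompose()
    if tag.name != 'div' or is_ignored(tag):
        return False
    return bool(tag.get('class')) or tag.get('id') == 'myid'

soup = BeautifulSoup(html, 'html.parser')
matches = soup.find_all(wanted_div)
print(matches)
</code></pre>
<p>With the sample HTML this should print the same three tags as above, and the soup still contains all eight divs afterwards. One trade-off is that nothing is removed. A matched div that contains an ignored div will still print with that child inside it, and that is exactly the case the <code>decompose()</code> approach handles.</p>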