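<p>You can combine the <code>:nth-of-type</code> and <code>:not</code> pseudo-classes with the general sibling combinator <code>~</code>. As far as I can tell, in the HTML shown the <code>a</code> tags are all siblings, so I use <code>:nth-of-type</code> on the <code>b</code> tags to split the <code>a</code> tags into blocks. I then use <code>:not</code> to exclude from the current block the <code>a</code> siblings that come after the next <code>b</code> tag.</p>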
<pre><code>from bs4 import BeautifulSoup as bs

html = '''
<B>Heading Title 1:</B>&nbsp;<a href="link1">Title1</a>&nbsp;
<a href="link2">Title2</a>&nbsp;
&nbsp;
<B>Heading Title 2:</B>&nbsp;<a href="link3">Title3</a>&nbsp;
<a href="link4">Title4</a>&nbsp;
<a href="link5">Title5</a>&nbsp;
'''
soup = bs(html, 'lxml')

# Headings (<b>) that are followed by at least one sibling <a>
items = soup.select('b:has(~a)')
length = len(items)

if length == 1:
    # Only one heading: every <a> sibling after it belongs to it
    row = [item.text for item in soup.select('b ~ a')]
    print(row)
elif length > 1:
    for i in range(1, length + 1):
        # <a> siblings after the i-th <b>, minus those that also follow the (i+1)-th <b>
        selector = f'b:nth-of-type({i}) ~ a:not(b:nth-of-type({i + 1}) ~ a)'
        row = [item.text for item in soup.select(selector)]
        print(row)
</code></pre>
<p>Output:</p>
<p><a href="https://i.stack.imgur.com/YxAfv.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/YxAfv.png" alt="Output of the script"/></a></p>
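<p>If you need the blocks more than once, the same selector idea can be wrapped in a function that maps each heading to its links. This is a sketch under a few assumptions: all the <code>b</code> and <code>a</code> tags share a single parent (here a wrapping <code>div</code> added for the example), the built-in <code>html.parser</code> is used instead of <code>lxml</code>, and the function names <code>group_links_by_heading</code> and <code>group_links_by_walking</code> are hypothetical. The second function is a different technique, a plain walk over <code>next_siblings</code>, included only for comparison.</p>

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def group_links_by_heading(soup):
    """Selector approach (hypothetical helper): for the i-th <b>, keep the <a>
    siblings that follow it but do not also follow the (i+1)-th <b>.
    Assumes all <b>/<a> tags share one parent, since :nth-of-type counts per parent."""
    groups = {}
    # Headings that have at least one following <a> sibling, in document order
    for i, heading in enumerate(soup.select('b:has(~ a)'), start=1):
        selector = f'b:nth-of-type({i}) ~ a:not(b:nth-of-type({i + 1}) ~ a)'
        groups[heading.get_text(strip=True)] = [a.get_text() for a in soup.select(selector)]
    return groups


def group_links_by_walking(soup):
    """Alternative (hypothetical helper, not the selector technique): walk each
    <b>'s following siblings and stop at the next <b>."""
    groups = {}
    for heading in soup.find_all('b'):
        links = []
        for sib in heading.next_siblings:
            if not isinstance(sib, Tag):
                continue  # skip text nodes such as the &nbsp; separators
            if sib.name == 'b':
                break  # the next heading starts a new block
            if sib.name == 'a':
                links.append(sib.get_text())
        groups[heading.get_text(strip=True)] = links
    return groups


# Sample markup from the question, wrapped in a <div> so every tag has one parent
html = '''<div>
<B>Heading Title 1:</B>&nbsp;<a href="link1">Title1</a>&nbsp;
<a href="link2">Title2</a>&nbsp;
&nbsp;
<B>Heading Title 2:</B>&nbsp;<a href="link3">Title3</a>&nbsp;
<a href="link4">Title4</a>&nbsp;
<a href="link5">Title5</a>&nbsp;
</div>'''
soup = BeautifulSoup(html, 'html.parser')
print(group_links_by_heading(soup))
print(group_links_by_walking(soup))
```

<p>For this sample both functions should produce the same mapping. The selector version relies on soupsieve (which BeautifulSoup 4.7+ uses for <code>select()</code>) accepting a complex selector inside <code>:not()</code>; the sibling walk needs no CSS selectors at all, and it also lists a heading with no links as an empty block.</p>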