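How do I use BeautifulSoup in Python to get all <p> tags that come before a certain text in a web page? - Q&A

2 answers

User

Reply #1 · edited 2024-09-27 18:04:09

You can pass a function to find_all to select the "p" tags, on the condition that none of their previous siblings contains the specific text. For example: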

from bs4 import BeautifulSoup

html = '''
<p>p1</p>
<p>p2</p> 
<p>p3</p>
<span class="zls" id=".B1.D9.87.D8.A7.DB.8C_.D9.88.D8.A"> certain unique text </span>
<p>p4</p>
<p>p5</p>
'''
soup = BeautifulSoup(html, 'html.parser')

def select_tags(tag, text='certain unique text'):
    # Keep a <p> tag only if no earlier sibling contains the marker text.
    return tag.name=='p' and all(text not in t.text for t in tag.find_previous_siblings())

print(soup.find_all(select_tags))

Output:

[<p>p1</p>, <p>p2</p>, <p>p3</p>]
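The filter above fixes the marker text as a default argument. As a minimal sketch, the same check could be wrapped in a small factory so the text can vary per call. The name `p_before_text` is a hypothetical helper made up for this illustration, not part of BeautifulSoup, and the HTML is a shortened copy of the sample above.

```python
from bs4 import BeautifulSoup

html = '''
<p>p1</p>
<p>p2</p>
<p>p3</p>
<span class="zls"> certain unique text </span>
<p>p4</p>
<p>p5</p>
'''

def p_before_text(text):
    """Hypothetical helper: build a find_all filter for <p> tags before `text`."""
    def matches(tag):
        # Same test as select_tags: a <p> whose earlier siblings lack the text.
        return tag.name == 'p' and all(
            text not in sibling.text for sibling in tag.find_previous_siblings()
        )
    return matches

soup = BeautifulSoup(html, 'html.parser')
result = [p.text for p in soup.find_all(p_before_text('certain unique text'))]
print(result)  # ['p1', 'p2', 'p3']
```

Because the check looks only at siblings, it assumes the marker element and the p tags share the same parent.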

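User

Reply #2 · edited 2024-09-27 18:04:09

In addition to what t.m.adam has already shown, you can also do the following to get the text of the p tags that appear before the element with class zls: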

from bs4 import BeautifulSoup

html_content = '''
<t>p0</t>
<y>p00</y> 
<p>p1</p>
<p>p2</p> 
<p>p3</p>
<span class="zls" id=".B1.D9.87.D8.A7.DB.8C_.D9.88.D8.A"> certain unique text </span>
<p>p4</p>
<p>p5</p>
'''
soup = BeautifulSoup(html_content, 'lxml')

for items in soup.select(".zls"):
    tag_items = [item.text for item in items.find_previous_siblings() if item.name=="p"]
    print(tag_items)

Output:

['p3', 'p2', 'p1']
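Note that find_previous_siblings() walks backwards from the anchor element, so the nearest sibling comes first. The following is a minimal sketch, under the assumption that you want the results in document order and want to locate the marker by its text rather than by its class. The variable names are illustrative, and html.parser is used here only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

html_content = '''
<t>p0</t>
<y>p00</y>
<p>p1</p>
<p>p2</p>
<p>p3</p>
<span class="zls"> certain unique text </span>
<p>p4</p>
<p>p5</p>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# Find the text node that contains the marker, then step up to its enclosing tag.
marker_text = soup.find(string=lambda s: s and 'certain unique text' in s)
marker = marker_text.parent

# Passing "p" filters by tag name; reversed() restores document order.
texts_in_order = [p.text for p in reversed(marker.find_previous_siblings('p'))]
print(texts_in_order)  # ['p1', 'p2', 'p3']
```

If the marker text might be absent, soup.find returns None, so a real script would check marker_text before using .parent.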
