How to remove previous siblings in BeautifulSoup

Published 2024-10-04 01:34:43

I'm trying to remove the previous siblings above the <hr /> tag and the next siblings below the </h2> tag, but I get this error: AttributeError: 'NavigableString' object has no attribute 'decompose'

The HTML I'm trying to parse looks like this:

<h1>Heading text</h1>

<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>

<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>

<h2>Heading 2</h2>

<p> this and everything below i want to remove</p>

Feeding in the HTML above does not remove the siblings; it only raises the AttributeError. What am I doing wrong, and how can I fix it?

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

for prev_sibling in soup.find("hr").previous_siblings:
    prev_sibling.decompose()

for next_sibling in soup.find("h2").next_siblings:
    next_sibling.decompose()


1 answer
Answer 1 · posted 2024-10-04 01:34:43

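Use find_previous_siblings() and find_next_siblings(). The .previous_siblings and .next_siblings generators also yield the whitespace text nodes between tags. Those are NavigableString objects, and in the bs4 version that raised this error they have no decompose() method. Called without arguments, the find_*_siblings() methods return only Tag objects: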

from bs4 import BeautifulSoup
html='''<h1>Heading text</h1>
<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>
<h2>Heading 2</h2>
<p> this and everything below i want to remove</p>'''

soup = BeautifulSoup(html, 'lxml')

for prev_sibling in soup.find("hr").find_previous_siblings():
    prev_sibling.decompose()

for next_sibling in soup.find("h2").find_next_siblings():
    next_sibling.decompose()

print(soup)
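
If you also want to drop the stray text nodes (the newlines between tags), one possible variation is to iterate over the raw sibling generators and call extract() instead, since extract() is available on NavigableString as well as Tag. This is only a sketch under some assumptions: the helper name trim_outside is made up for illustration, the shortened HTML is a stand-in for the markup in the question, and it uses the built-in 'html.parser' instead of 'lxml' so it runs without extra dependencies. The siblings are copied into a list first so that the tree is not modified while it is being iterated.

```python
from bs4 import BeautifulSoup

html = """<h1>Heading text</h1>
<p>delete me</p>
<hr />
<p>keep me</p>
<p>keep me too</p>
<h2>Heading 2</h2>
<p>remove me</p>"""


def trim_outside(soup, start_tag="hr", end_tag="h2"):
    """Hypothetical helper: remove everything before start_tag and after end_tag."""
    start = soup.find(start_tag)
    end = soup.find(end_tag)
    if start is not None:
        # list() takes a snapshot, so removing nodes does not disturb the iteration
        for node in list(start.previous_siblings):
            node.extract()  # works for both Tag and NavigableString
    if end is not None:
        for node in list(end.next_siblings):
            node.extract()
    return soup


soup = BeautifulSoup(html, "html.parser")

# The raw generator mixes tags with whitespace text nodes, which is what
# triggered the AttributeError in the question.
kinds = [type(node).__name__ for node in soup.find("hr").previous_siblings]
print(kinds)

trim_outside(soup)
print(soup)
```

The <hr /> and <h2> tags themselves are kept, as in the answer above. If you only need the tags removed and do not care about leftover whitespace, the find_previous_siblings() / find_next_siblings() version is shorter.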
