如何在BeautifulSoup中找到标签和文本的组合

2024-09-28 05:23:24 发布

男 | 程序猿一只，喜欢编程写python代码。

我从一个网站上刮下HTMl，需要在其中获得一个特定的标签，问题是，它的格式混乱，我无法获得整个标签。让我举例说明：

data = """
<div class="Answer">
1. BOUNDARIES - EPB &amp; APL&nbsp;<i>(inferior)</i>, EPL&nbsp;<i>(superior).&nbsp;</i><div>2. FLOOR (proximal to distal) - radial styloid =&gt; scaphoid =&gt; trapezium =&gt; 1st MC base.&nbsp;<br /><div>3. CONTENTS - cutaneous branches of radial nerve&nbsp;<i>(on the roof),</i>&nbsp;cephalic vein&nbsp;<i>(begins here),</i>&nbsp;&nbsp;radial artery&nbsp;<i>(on the floor).</i></div></div><div><br /></div><div><img src="paste-27a44c801f0776d91f5f6a16a963bff67f0e8ef3.jpg" /><br /></div><div><b>Image:&nbsp;</b>Case courtesy of Dr Sachintha Hapugoda, &lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; [Accessed 15 Nov. 2018].</div>
</div>
"""

从上面，我只想得到：

<div><b>Image:&nbsp;</b>Case courtesy of Dr Sachintha Hapugoda, &lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; [Accessed 15 Nov. 2018].</div>

我编写了以下代码：

soup = BeautifulSoup(data, "html.parser")
image_link = soup.find('div').find('b').next.next
print(image_link)

但这只会让我明白：

Case courtesy of Dr Sachintha Hapugoda, <a href="https://radiopaedia.org/">Radiopaedia.org</a>. From the case <a href="https://radiopaedia.org/cases/52525">rID: 52525</a> [Accessed 15 Nov. 2018].

如何获取整个标签？你知道吗

Tags： of the https org br lt gt div

1条回答

网友

1楼 · 发布于 2024-09-28 05:23:24

或许可以试试：

image_link = soup.find('div').find('img').next.next

输出：

<div><b>Image: </b>Case courtesy of Dr Sachintha Hapugoda, &lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; [Accessed 15 Nov. 2018].</div>

如何在BeautifulSoup中找到标签和文本的组合

相关问题更多 >

编程相关推荐

热门问题

热门文章

如何在BeautifulSoup中找到标签和文本的组合

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >