Extracting text from an HTML paragraph with BeautifulSoup in Python

Posted 2024-10-01 11:25:39


<p>
    <a name="533660373"></a>
    <strong>Title: Point of Sale Threats Proliferate</strong><br />
    <strong>Severity: Normal Severity</strong><br />
    <strong>Published: Thursday, December 04, 2014 20:27</strong><br />
    Several new Point of Sale malware families have emerged recently, to include LusyPOS,..<br />
    <em>Analysis: Emboldened by past success and media attention, threat actors  ..</em>
    <br />
</p>

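This is a paragraph I want to extract from an HTML page using BeautifulSoup in Python. I can get the values inside the tags with .children and .string, but I cannot get the text "Several new Point of Sale malware fa…", which sits directly inside the paragraph without any tag of its own. I tried soup.p.text, .get_text(), and so on, but none of them worked.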


1 answer
User
#1 · Posted 2024-10-01 11:25:39

Use find_all() with text=True to find all text nodes, and recursive=False to search only among the direct children of the parent p tag:

from bs4 import BeautifulSoup

data = """
<p>
    <a name="533660373"></a>
    <strong>Title: Point of Sale Threats Proliferate</strong><br />
    <strong>Severity: Normal Severity</strong><br />
    <strong>Published: Thursday, December 04, 2014 20:27</strong><br />
    Several new Point of Sale malware families have emerged recently, to include LusyPOS,..<br />
    <em>Analysis: Emboldened by past success and media attention, threat actors  ..</em>
    <br />
</p>
"""

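# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
# Keep only the text nodes that are direct children of <p>; whitespace-only nodes strip to ''
print(''.join(text.strip() for text in soup.p.find_all(text=True, recursive=False)))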

Prints:

Several new Point of Sale malware families have emerged recently, to include LusyPOS,..
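
The same idea can also be written by walking the paragraph's direct children and keeping only the bare text nodes. The following is a minimal sketch, not part of the original answer: the helper name direct_text and the shortened html sample are made up for illustration, and it assumes the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Shortened copy of the paragraph from the question (illustrative sample)
html = """
<p>
    <a name="533660373"></a>
    <strong>Title: Point of Sale Threats Proliferate</strong><br />
    Several new Point of Sale malware families have emerged recently, to include LusyPOS,..<br />
    <em>Analysis: Emboldened by past success and media attention, threat actors  ..</em>
</p>
"""


def direct_text(tag):
    """Return the text sitting directly inside `tag`, skipping text in nested tags.

    Hypothetical helper: .children yields only direct children, and bare text
    shows up there as NavigableString objects.
    """
    pieces = (
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    )
    # Drop the whitespace-only nodes that sit between the tags
    return " ".join(piece for piece in pieces if piece)


soup = BeautifulSoup(html, "html.parser")
print(direct_text(soup.p))
```

Joining with a space keeps separate text runs apart when a paragraph holds more than one. HTML comments are also NavigableString subclasses, so markup that contains them may need an extra filter.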
