Splitting HTML on a given character

Posted 2024-10-01 00:15:57


I am using Beautiful Soup to read in the HTML of a web page:

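import urllib.request
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder: the original post does not show its headers
req = urllib.request.Request('https://en.wikipedia.org/wiki/Barack_Obama', headers=headers)
html = urllib.request.urlopen(req)  # pass the Request object built above
page = BeautifulSoup(html, 'html.parser')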

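I want to split the HTML on periods, with the condition that it does not split when the period is inside a tag other than the p tag. For example, if the HTML is: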

<p>In June 2015, the Court ruled 6–3 in <i><a href="/wiki/King_v._Burwell" 
title="King v. Burwell">King v. Burwell</a></i> that subsidies to help individuals 
and families purchase health insurance were authorized for those doing so on both 
the federal exchange and state exchanges, not only those purchasing plans 
"established by the State", as the statute reads.</p>

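I am fine with splitting on a period that sits directly in the p tag, but not on one inside the a tag or any other tag. Converting the HTML to a string and then splitting it obviously does not work. The main reason I do not want to use Beautiful Soup's get_text() method and split on its result is that I want the split to happen on the original HTML. Does Beautiful Soup have a built-in split feature where I can check whether it is splitting in the right tag? Or is there another way? Thanks in advance :)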

So the output I need is the code split into 2 parts:

<p>In June 2015, the Court ruled 6–3 in <i><a href="/wiki/King_v._Burwell" 
title="King v. Burwell">King v


 . Burwell</a></i> that subsidies to help individuals and families purchase health insurance were authorized for those doing so on both the federal exchange and state exchanges, not only those purchasing plans "established by the State", as the statute reads.</p>
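One possible approach, offered as a minimal sketch rather than a built-in Beautiful Soup feature: iterate over the direct children of the p tag, split only the text nodes that sit directly inside it, and copy nested tags through unchanged. This follows the rule as stated in the text (periods inside a, i, or attribute values are never split points). The function name `split_on_direct_periods` and the shortened sample HTML are hypothetical, and the sketch assumes the paragraph contains no HTML comments.

```python
from html import escape

from bs4 import BeautifulSoup
from bs4.element import NavigableString


def split_on_direct_periods(p_tag):
    """Split the inner HTML of p_tag on periods found in its direct text children.

    Nested tags are kept whole, so periods inside them (or inside their
    attributes) never cause a split. Each period stays at the end of its fragment.
    """
    fragments = []
    current = []
    for child in p_tag.children:
        if isinstance(child, NavigableString):
            # Text directly inside <p>: re-escape it and split on periods.
            parts = escape(str(child), quote=False).split('.')
            for part in parts[:-1]:
                current.append(part + '.')
                fragments.append(''.join(current))
                current = []
            current.append(parts[-1])
        else:
            # A nested tag: keep its markup intact, periods included.
            current.append(str(child))
    tail = ''.join(current)
    if tail.strip():
        fragments.append(tail)
    return fragments


# Hypothetical sample, shortened from the paragraph in the question.
sample = ('<p>First sentence. See <i><a href="/wiki/King_v._Burwell" '
          'title="King v. Burwell">King v. Burwell</a></i> for details. Done.</p>')
p = BeautifulSoup(sample, 'html.parser').p
fragments = split_on_direct_periods(p)
for fragment in fragments:
    print(repr(fragment))
```

The fragments are pieces of the paragraph's inner HTML, so joining them gives back the content between the opening and closing p tags. Note that a period in a number such as "6.3" would also count as a split point under this rule.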


0 answers

No answers yet
