BeautifulSoup missing/skipping tags

Posted 2024-09-29 23:33:30


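I would appreciate it if you could point me in the right direction. Is there a better way to capture the text?

When I do it as shown below, some tags at the end are missing. The original HTML string is about 20K in size (so it contains a lot of data).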

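from bs4 import BeautifulSoup

# r is the HTTP response for the page; c is the object that stores the result
soup = BeautifulSoup(r.content, 'html5lib')
c.case_html = str(soup.find('div', class_='DocumentText'))
print(c.case_html)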

Below is the scraping code. It has worked so far, but after a second new tag was added, it broke.

^{pr2}$

Sample HTML is shown below. The original string is about 20K in size.

<form name="form1" id="form1">
<div id="theDocument" class="DocumentText" style="position: relative; float: left; overflow: scroll; height: 739px;">
<p>PTag</p>
<p> <center> First center </center> </p>
<small> this is small</small>
<p>...</p>
<p> <center> Second Center </center> </p>
<p>....</p>
</div>
</form>

The expected output is

<div id="theDocument" class="DocumentText" style="position: relative; float: left; overflow: scroll; height: 739px;">
<p>PTag</p>
<p> <center> First center </center> </p>
<small> this is small</small>
<p>...</p>
<p> <center> Second Center </center> </p>
<p>....</p>
</div>

1 answer

Answer 1 · Posted 2024-09-29 23:33:30

You can try this. I am answering based only on the HTML you provided. Let me know if you need clarification. Thanks!

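 from bs4 import BeautifulSoup

 soup = BeautifulSoup(r.content, 'html5lib')
 # select() returns a list of matches; select_one() returns a single Tag (or None)
 case_html = soup.select_one('div.DocumentText')
 print(case_html.get_text())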
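
For comparison, here is a minimal, self-contained sketch of the same idea run against the sample HTML from the question. It is not the asker's actual scraper: the names `SAMPLE_HTML` and `extract_document` are hypothetical, and it uses the built-in `html.parser` instead of `html5lib` so that it runs without an extra dependency. It shows that `str()` on the selected tag keeps the markup (which is what the expected output asks for), while `get_text()` returns only the text.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (the real page is about 20K).
SAMPLE_HTML = """
<form name="form1" id="form1">
<div id="theDocument" class="DocumentText" style="position: relative; float: left; overflow: scroll; height: 739px;">
<p>PTag</p>
<p> <center> First center </center> </p>
<small> this is small</small>
<p>...</p>
<p> <center> Second Center </center> </p>
<p>....</p>
</div>
</form>
"""


def extract_document(html, parser="html.parser"):
    """Return (markup, text) for the first div.DocumentText, or (None, None).

    Hypothetical helper for illustration; the parser name is an assumption
    and can be swapped for 'html5lib' or 'lxml' if those are installed.
    """
    soup = BeautifulSoup(html, parser)
    div = soup.select_one("div.DocumentText")  # a single Tag, or None
    if div is None:
        return None, None
    # str(div) serializes the tag with all of its children;
    # div.get_text() drops the tags and keeps only the text nodes.
    return str(div), div.get_text()


markup, text = extract_document(SAMPLE_HTML)
print(markup)
```

If the page must be parsed with `html5lib`, note that it follows browser parsing rules and may restructure markup such as `<center>` nested inside `<p>`, so the serialized output may not match the source character for character. Passing a different `parser` argument to the helper is one way to compare the results.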
