I want to scrape the information inside paragraph tags.
The paragraphs also contain some other tags, as the HTML below shows.
Here is the HTML page to be scraped:
<div class="thecontent">
<p>Here’s the schedule of matches for the weekend.</p>
<p> </p>
<p><strong>Saturday, August 17</strong></p>
<p>Achara vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
<p>pritos vs. baola, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
<p>timpao vs. quadrsa, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
<p><strong>Sunday, August 18</strong></p>
<p>Achara vs. timpao, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
<p>pritos vs. qaudra, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
<p>timpao vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
<p> </p>
<p><strong>Monday, August 19</strong></p>
<p>Achara vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
</p>
<p> </p></div></body></html>
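I used the following Python code:
import bs4
import requests

# Download the page (placeholder URL from the question) and parse it
getnwp = requests.get('https://url')
nwpcontent = getnwp.content
sp2 = bs4.BeautifulSoup(nwpcontent, 'html5lib')
# All <p> tags inside <div class="thecontent">
pta = sp2.find('div', class_='thecontent').find_all('p')
for p in pta:
    # Print every paragraph whose text contains "vs"
    text = p.get_text()
    if "vs" in text:
        print(text)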
From the HTML above, I only want to extract the matches between the teams and the date on which each one takes place, like this:
Saturday, August 17
Achara vs. Buad, — Have enjoy it and celebrate it
pritos vs. baola, — Have enjoy it and celebrate it
timpao vs. quadrsa, — Have enjoy it and celebrate it
Sunday, August 18
Achara vs. timpao, — Have enjoy it and celebrate it
pritos vs. qaudra, — Have enjoy it and celebrate it
timpao vs. Buad, — Have enjoy it and celebrate it
Monday, August 19
Achara vs. Buad, — Have enjoy it and celebrate it
In other words, I don't want the information about the TV broadcasts (the text inside the anchor tags).
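A minimal sketch of one way to get that output with BeautifulSoup, assuming a shortened, well-formed copy of the HTML above as input (in the real script it would be the downloaded page content) and bs4 4.7.1 or later for the :has() selector. The function name extract_schedule and the comma-collapsing regex are illustrative choices, not taken from the question or the answers:

```python
import re

from bs4 import BeautifulSoup

# Assumption: a shortened, well-formed copy of the question's HTML; in the real
# script this would be getnwp.content from requests.get(...).
HTML = """
<div class="thecontent">
<p>Here's the schedule of matches for the weekend.</p>
<p> </p>
<p><strong>Saturday, August 17</strong></p>
<p>Achara vs. Buad, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a>, <a href="http://www.anothertv" target="_blank">Se</a> — Have enjoy it and celebrate it</p>
<p>pritos vs. baola, <a href="@">ftv</a>, <a href="https://someothertv">HTlive</a> — Have enjoy it and celebrate it</p>
<p><strong>Sunday, August 18</strong></p>
<p>Achara vs. timpao, <a href="@">ftv</a> — Have enjoy it and celebrate it</p>
</div>
"""


def extract_schedule(html):
    """Return date headers and match lines in page order, without anchor text."""
    soup = BeautifulSoup(html, "html.parser")
    # :has() needs bs4 4.7.1+ (CSS selectors are provided by soupsieve).
    # Keep only date paragraphs (contain <strong>) and match paragraphs (contain <a>).
    paragraphs = soup.select(
        "div.thecontent p:has(strong), div.thecontent p:has(a)"
    )
    lines = []
    for p in paragraphs:
        # Remove the TV links so their text never reaches get_text().
        for a in p.find_all("a"):
            a.decompose()
        text = p.get_text()
        # Collapse the ", , ," left behind by the removed links into one ", ".
        text = re.sub(r"(?:,\s*)+", ", ", text).strip()
        lines.append(text)
    return lines


for line in extract_schedule(HTML):
    print(line)
```

The intro sentence and the blank paragraphs contain neither <strong> nor <a>, so the selector skips them. 'html.parser' is used here only to avoid the html5lib dependency; passing 'html5lib' as in the question should behave the same for this input.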
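I'm not sure what the real source looks like. As an example, you could remove the a tags (decompose them) and use :has and :not(:empty) to target the paragraphs you want. This requires bs4 4.7.1+.
The paragraphs that contain content also end with the trailing text " — Have enjoy it and celebrate it", so it is always included when you retrieve their text. What you can do is strip that tail off the string, which removes the last 33 characters of the resulting string.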
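The answer's own snippet did not survive here. As a hedged illustration of the idea, assuming the tail is exactly the 33-character text from the sample HTML (leading space included), this compares slicing off a fixed length with str.removesuffix (Python 3.9+), which only strips the tail when it is actually present. The names TAIL, strip_by_length and strip_if_present are hypothetical:

```python
# Assumption: the trailing text from the question's HTML, leading space included.
TAIL = " — Have enjoy it and celebrate it"  # 33 characters


def strip_by_length(text, n=len(TAIL)):
    """Drop the last n characters unconditionally (n = 33 here)."""
    return text[:-n]


def strip_if_present(text):
    """Drop TAIL only when the text actually ends with it (Python 3.9+)."""
    return text.removesuffix(TAIL)


line = "Achara vs. Buad," + TAIL
print(strip_by_length(line))    # Achara vs. Buad,
print(strip_if_present(line))   # Achara vs. Buad,

# Slicing assumes every line has the tail; removesuffix leaves other text alone.
print(repr(strip_by_length("Sunday, August 18")))   # ''
print(strip_if_present("Sunday, August 18"))        # Sunday, August 18
```

Slicing is shorter, but it silently eats real text on any line that lacks the tail, such as the date headers.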