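I am trying to collect some statistics from a website. What I want to do is take each word
and count the number of neighboring words found in the same tag.
Input: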
<div class="col-xs-12">
<p class="w50">Operating Temperature (Min.)[°C]</p>
<p class="w50 upperC">-40</p>
</div>
should result in:
Tag 1
Operating, 2 i.e. #<Temperature, (Min.)[°C]>
Temperature, 2 i.e. #<Operating, (Min.)[°C]>
(Min.)[°C], 2 i.e. #<Operating, Temperature>
Tag 2
-40, 0
This is what I ended up with, but it extracts the entire text instead of working tag by tag:
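import urllib.request

from bs4 import BeautifulSoup

url = 'https://www.rohm.com/products/wireless-communication/wireless-lan-modules/bp3580-product#'
with urllib.request.urlopen(url) as response:
    page = response.read()
soup = BeautifulSoup(page, features='lxml')
# [print(tag.name) for tag in soup.find_all()]
# Remove script and style elements so their contents are not treated as text.
for script in soup(["script", "style"]):
    script.decompose()  # rip it out
# Unwrap <br> tags: drop the tag itself but keep any children.
invalid_tags = ['br']
for tag in invalid_tags:
    for match in soup.find_all(tag):
        match.unwrap()
# Only top-level elements are selected here, so get_text() returns the whole page text.
html = soup.find_all(recursive=False)
for tag in html:
    print(tag.get_text())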
I also tried it with recursive=True,
but then the output contained a lot of duplicates.
This may not be exactly the result you are after, but it should at least give you a hint. I modified your code.
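The following is a minimal sketch of one way to get the per-tag counts, not the answerer's original code. It assumes that "the same tag" means the text sitting directly inside an element, excluding text from nested elements; that assumption is also what avoids the duplicates seen with get_text() on every tag, since get_text() on a parent repeats the text of all its descendants. The function name neighbor_counts and the HTML constant are hypothetical names chosen for illustration, and the sample markup from the question is parsed with the built-in "html.parser" backend instead of fetching the live page.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Sample markup copied from the question (stands in for the downloaded page).
HTML = """
<div class="col-xs-12">
<p class="w50">Operating Temperature (Min.)[°C]</p>
<p class="w50 upperC">-40</p>
</div>
"""


def neighbor_counts(html):
    """Return, for each tag that directly holds text, a list of
    (word, neighbor_count, neighbors) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements, as in the question's code.
    for junk in soup(["script", "style"]):
        junk.decompose()

    results = []
    for tag in soup.find_all(True):  # True matches every element
        # Keep only strings that are direct children of this tag,
        # skipping HTML comments and anything inside nested tags.
        own_strings = [
            child for child in tag.children
            if isinstance(child, NavigableString) and not isinstance(child, Comment)
        ]
        words = " ".join(own_strings).split()
        if not words:
            continue  # e.g. a wrapper <div> that only contains whitespace
        rows = []
        for i, word in enumerate(words):
            neighbors = words[:i] + words[i + 1:]
            rows.append((word, len(neighbors), neighbors))
        results.append(rows)
    return results


if __name__ == "__main__":
    for number, rows in enumerate(neighbor_counts(HTML), start=1):
        print(f"Tag {number}")
        for word, count, neighbors in rows:
            print(f"{word}, {count} {neighbors}")
```

To run this against the real page, pass the bytes read with urllib.request.urlopen to neighbor_counts in place of HTML. Because text on either side of a <br> is still a direct child of the enclosing element, the <br> unwrapping step from the question is not needed with this approach.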