BeautifulSoup: replace newlines with a period and a space

Posted 2024-09-25 08:40:05


I am working with BeautifulSoup.

Here is the relevant part of the source of the URL I want to scrape:

<div class="description">
Planet Nine was initially proposed to explain the clustering of orbits
Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four. 
</div>

Here is my BeautifulSoup code (relevant part only) for getting the text inside the description tag:


Running the script with

python script.py https://example.com/page/2000

gives the following output:

Planet Nine was initially proposed to explain the clustering of orbits
Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four. 

How can I replace the newline with a period followed by a space, so that the text looks like this:

Planet Nine was initially proposed to explain the clustering of orbits. Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four.

Is there any way to do this?


3 answers

Try this:

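# separator=" " is inserted between separate text nodes; rstrip("\n") removes trailing newlines only
description = description_box.get_text(separator=" ").rstrip("\n")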
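In the question's markup the description div holds a single text string, so separator has nothing to separate and the newline between the two lines survives; rstrip("\n") only removes newlines at the end. A minimal sketch that replaces each internal line break with a period and a space, assuming the div's text has already been extracted with get_text(). The function name newlines_to_periods is hypothetical, and the sample text is a shortened version of the question's.

```python
import re


def newlines_to_periods(text: str) -> str:
    """Replace every line break inside `text` with a period and a space.

    Leading/trailing whitespace (such as the newlines right after <div> and
    before </div>) is stripped first, so only internal line breaks are replaced.
    """
    # \s*\n\s* also swallows stray spaces around the break and collapses blank lines
    return re.sub(r"\s*\n\s*", ". ", text.strip())


raw = (
    "\nPlanet Nine was initially proposed to explain the clustering of orbits\n"
    "Of Planet Nine's other effects, one was unexpected.\n"
)
print(newlines_to_periods(raw))
# prints: Planet Nine was initially proposed to explain the clustering of orbits. Of Planet Nine's other effects, one was unexpected.
```

With BeautifulSoup this would be used as description = newlines_to_periods(description_box.get_text()).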

Use split and join together with select_one:

from bs4 import BeautifulSoup as bs

html = '''
<div class="description">
Planet Nine was initially proposed to explain the clustering of orbits
Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four. 
</div>
'''
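soup = bs(html, 'lxml')   # the 'lxml' parser requires the lxml package to be installed
# split the div's text on newlines and rejoin the pieces with single spaces
text = ' '.join(soup.select_one('.description').text.split('\n'))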
print(text)
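The join above puts only a space where each newline was. A minimal sketch of the same split/join idea that joins with ". " instead, and drops the empty pieces that split('\n') produces for the newlines at the start and end of the div. The helper name join_lines_with_periods is hypothetical; it works on the already-extracted text, so it needs no parser.

```python
def join_lines_with_periods(text: str) -> str:
    # split('\n') yields empty strings for the div's leading/trailing newlines,
    # so strip each piece and keep only the non-empty ones before joining
    lines = [line.strip() for line in text.split("\n")]
    return ". ".join(line for line in lines if line)


sample = "\nfirst line\nsecond line. \n"
print(join_lines_with_periods(sample))  # prints: first line. second line.
```

Applied to the answer's soup object: text = join_lines_with_periods(soup.select_one('.description').text).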

From here:

from bs4 import BeautifulSoup

html = '''<div class="description">
Planet Nine was initially proposed to explain the clustering of orbits
Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four.
</div>'''
n = 2                                # replace the 2nd newline (line 1 is the <div> tag)
sep = '\n'                           # separator: newline
cells = html.split(sep)

# rejoin the lines, putting ". " in place of the n-th newline
html = sep.join(cells[:n]) + ". " + sep.join(cells[n:])
soup = BeautifulSoup(html, 'html.parser')
title_box = soup.find('div', attrs={'class': 'description'})
title = title_box.get_text().strip()
print(title)
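The split/rejoin step above replaces the n-th newline in the string. The same operation can be written as a small helper; replace_nth is a hypothetical name, not part of BeautifulSoup or the Python standard library. It walks the string with str.find and, unlike the split/join version, leaves the string unchanged when there are fewer than n occurrences instead of appending ". " at the end.

```python
def replace_nth(s: str, old: str, new: str, n: int) -> str:
    """Replace only the n-th (1-based, non-overlapping) occurrence of `old` in `s`."""
    if n < 1 or not old:
        raise ValueError("n must be >= 1 and old must be non-empty")
    idx = -1
    pos = 0
    for _ in range(n):
        idx = s.find(old, pos)
        if idx == -1:
            return s  # fewer than n occurrences: leave s unchanged
        pos = idx + len(old)  # continue searching after this match
    return s[:idx] + new + s[idx + len(old):]


html = "<div>\nline one\nline two\n</div>"
print(replace_nth(html, "\n", ". ", 2))
```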

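Output:

Planet Nine was initially proposed to explain the clustering of orbits. Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four.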

Edit:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://blablabla.com")
soup = BeautifulSoup(page.content, 'html.parser')
description_box = soup.find('div', attrs={'class': 'description'})
description = description_box.get_text().strip()   # strip() removes the newlines around the text

n = 1                                # after strip() the 1st newline is the one between the two lines
sep = '\n'                           # separator: newline
cells = description.split(sep)
desired = sep.join(cells[:n]) + ". " + sep.join(cells[n:])

print(desired)
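Counting newlines breaks as soon as the description has a different number of lines, and joining blindly with ". " turns a line that already ends in a period into "four..". A sketch of a variant that avoids both, assuming the text comes from get_text() as above; sentences_from_lines is a hypothetical name. As a design choice it also adds a period to the last line if that line has no sentence-ending punctuation.

```python
def sentences_from_lines(text: str) -> str:
    """Join the non-empty lines of `text` into one paragraph.

    A period is added only to lines that do not already end in
    sentence-ending punctuation, so "four." never becomes "four..".
    """
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines (e.g. right after <div>)
        if not line.endswith((".", "!", "?")):
            line += "."
        parts.append(line)
    return " ".join(parts)


print(sentences_from_lines("\nPlanet Nine orbits\nIt explains all four. \n"))
# prints: Planet Nine orbits. It explains all four.
```

In the answer's code this would replace the counting step: desired = sentences_from_lines(description_box.get_text()).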
