How do I scrape all the text from every p tag, including the text inside nested spans?

Location: New Delhi / Safdarjung, Current Time: Feb 12, 2017 at 10:29:52 am, Latest Report: Feb 12, 2017 at 8:30 am, Visibility: 1 km, Pressure: 102.12 kPa, Humidity: 95%, Dew Point: 10 °C

3 Answers

User

Answer 1 · edited 2024-10-02 02:38:12

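You can use the .text attribute of the BeautifulSoup object, which returns the text of every tag, nested spans included. I then used re.split() to break the result into one string per key-value pair; if you don't need the split, soup.text on its own is enough.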

from bs4 import BeautifulSoup
import re

a = """<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi / Safdarjung">New Delhi / Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Latest Report: </span> Feb 12, 2017 at 8:30 am</p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Pressure: </span> 102.12 kPa</p>, <p><span class="four">Humidity: </span> 95%</p>, <p><span class="four">Dew Point: </span> 10 °C</p>"""

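# Parse the fragment; soup.text concatenates the text of every tag.
soup = BeautifulSoup(a, 'html.parser')
# Split on ", " only where an uppercase letter follows, so dates such as
# "Feb 12, 2017" are left intact. Evaluated in an interactive session.
re.split(r', (?=\s*[A-Z])', soup.text)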

Output (Python 2 interactive session):

[u'Location:  New Delhi / Safdarjung',
 u'Current Time:  Feb 12, 2017 at 10:29:52 am',
 u'Latest Report:  Feb 12, 2017 at 8:30 am',
 u'Visibility:  1 km',
 u'Pressure:  102.12 kPa',
 u'Humidity:  95%',
 u'Dew Point:  10 \xb0C']
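
If the goal is a mapping of labels to values rather than a list of strings, one possible approach is to walk the p tags individually and split each one at its first colon. The sketch below is only an illustration: the helper name parse_fields is hypothetical, and the HTML is a shortened copy of the sample above.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup from the answer above.
html = """<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi / Safdarjung">New Delhi / Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Visibility: </span> 1 km</p>"""


def parse_fields(markup):
    """Hypothetical helper: map each <p> label to its value."""
    soup = BeautifulSoup(markup, "html.parser")
    fields = {}
    for p in soup.find_all("p"):
        # partition() splits at the first colon only, so the colons
        # inside a time such as "10:29:52" stay in the value.
        key, _, value = p.get_text().partition(":")
        fields[key.strip()] = value.strip()
    return fields


fields = parse_fields(html)
print(fields["Current Time"])
```

Splitting per tag avoids relying on the ", " separators between the p elements, which may not be present in the real page.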

User

Answer 2 · edited 2024-10-02 02:38:12

Use .text to get all the text under a p tag, spans included. All you need to do is iterate over the result of find_all('p').

from bs4 import BeautifulSoup
html = '''<p><span class="four">Location: </span> <span id="wt-loc" title="New Delhi / Safdarjung">New Delhi / Safdarjung</span></p>, <p><span class="four">Current Time: </span> <span id="wtct">Feb 12, 2017 at 10:29:52 am</span></p>, <p><span class="four">Latest Report: </span> Feb 12, 2017 at 8:30 am</p>, <p><span class="four">Visibility: </span> 1 km</p>, <p><span class="four">Pressure: </span> 102.12 kPa</p>, <p><span class="four">Humidity: </span> 95%</p>, <p><span class="four">Dew Point: </span> 10 °C</p>'''

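# 'lxml' requires the third-party lxml package; the built-in
# 'html.parser' also works for this fragment.
soup = BeautifulSoup(html, 'lxml')

# .text returns the text of the tag and all of its descendants.
for p in soup.find_all('p'):
    print(p.text)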

Output:

Location:  New Delhi / Safdarjung
Current Time:  Feb 12, 2017 at 10:29:52 am
Latest Report:  Feb 12, 2017 at 8:30 am
Visibility:  1 km
Pressure:  102.12 kPa
Humidity:  95%
Dew Point:  10 °C

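User

Answer 3 · edited 2024-10-02 02:38:12

Beautiful Soup has a method called get_text() that returns all the text inside a tag while ignoring the nested tags themselves. Just call p.get_text(). If you also want to remove surrounding whitespace, call p.get_text(strip=True).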
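
As a small illustration of how these calls differ, here is a sketch using a single p tag from the question's sample. Note that strip=True strips each text fragment before joining, so the space between label and value disappears unless a separator is passed as the first argument.

```python
from bs4 import BeautifulSoup

# One <p> from the question's sample markup.
html = '<p><span class="four">Visibility: </span> 1 km</p>'
p = BeautifulSoup(html, "html.parser").p

plain = p.get_text()                  # 'Visibility:  1 km' (two spaces)
stripped = p.get_text(strip=True)     # 'Visibility:1 km'
spaced = p.get_text(" ", strip=True)  # 'Visibility: 1 km'

for value in (plain, stripped, spaced):
    print(repr(value))
```

Passing a separator together with strip=True is usually the more readable choice when a tag mixes spans and bare text.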
