How do I split the text elements out of this HTML string? (Python)

3 Answers

User

Answer 1 · edited 2024-10-01 17:22:43

You could try:

from bs4 import BeautifulSoup
import re

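# Parse the HTML
soup = BeautifulSoup("<span><strong>13:30</strong><br/>SecondWord</span></a>", 'lxml')

# Regex: a time such as 13:30, followed by the rest of the text
rx = re.compile(r'(\d+:\d+)(.+)')

# get_text() joins the pieces with no separator: "13:30SecondWord"
text = soup.find('span').get_text()
x, y = rx.findall(text)[0]
print(x)  # 13:30
print(y)  # SecondWord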
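
The regex step can also be tried on its own, without the parser. The following is a minimal sketch, assuming the text has already been extracted as "13:30SecondWord"; the helper name split_time_and_label is hypothetical, and it uses an anchored match() rather than the findall() call above, so that text without a leading time returns None instead of raising an IndexError.

```python
import re

# Same pattern as in the answer: a time such as "13:30", then the remaining text.
TIME_THEN_TEXT = re.compile(r'(\d+:\d+)(.+)')


def split_time_and_label(text):
    """Return (time, label), or None if the text does not start with a time.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    match = TIME_THEN_TEXT.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


print(split_time_and_label("13:30SecondWord"))  # ('13:30', 'SecondWord')
print(split_time_and_label("no time here"))     # None
```

Note that this approach, like the original, would probably misbehave if the second piece of text began with a digit, because the digits would be absorbed into the time.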

User

Answer 2 · edited 2024-10-01 17:22:43

Use recursive=False to get only the text that is a direct child of the span, and strong.text to get the other piece of text.

Example:

from bs4 import BeautifulSoup

soup = BeautifulSoup("<span><strong>13:30</strong><br/>SecondWord</span></a>", 'lxml')

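# text1: the text inside <strong>
print(soup.find("span").strong.text)     #  > 13:30
# text2: the first text node that is a direct child of <span>
# (string= is the current name of the older text= argument)
print(soup.find("span").find(string=True, recursive=False))  #  > SecondWord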
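
If the span holds more than one direct text node, find() returns only the first. A minimal sketch of the same idea with find_all() follows; the extra "ThirdWord" in the HTML is an assumption added for illustration and is not part of the original question, and html.parser is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the original snippet plus one extra direct text node.
html = "<span><strong>13:30</strong><br/>SecondWord<br/>ThirdWord</span>"
span = BeautifulSoup(html, "html.parser").span

# Text nested inside <strong>.
nested = span.strong.get_text()

# Every text node that is a direct child of <span>; nested text is skipped
# because recursive=False stops the search from descending into <strong>.
direct = [str(s) for s in span.find_all(string=True, recursive=False)]

print(nested)  # 13:30
print(direct)  # ['SecondWord', 'ThirdWord']
```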

User

Answer 3 · edited 2024-10-01 17:22:43

from bs4 import BeautifulSoup


txt = '''<span><strong>13:30</strong><br/>SecondWord</span></a>'''
soup = BeautifulSoup(txt, 'html.parser')

text1, text2 = soup.span.get_text(strip=True, separator='|').split('|')

print(text1)
print(text2)

Prints:

13:30
SecondWord
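
One thing to watch with the separator approach is that it would split in the wrong place if the text itself contained a "|". A minimal sketch of an alternative uses the stripped_strings generator, which yields each whitespace-stripped text piece without any separator; the "Second|Word" input is an assumption chosen to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical input in which the second piece of text contains a "|".
html = "<span><strong>13:30</strong><br/>Second|Word</span>"
soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields every non-empty text node, stripped of whitespace,
# so no separator character has to be chosen.
parts = list(soup.span.stripped_strings)

print(parts)  # ['13:30', 'Second|Word']
```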
