How do I get the text after a span using BeautifulSoup and Python?

Posted 2024-10-01 09:30:09


I only want the text outside the span, not the text inside it. My current code gives me everything:

birthday = bsObj.find( "div", {"class":"age"} )
# <div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>
birthday.get_text()
birthplace = bsObj.find( "div", {"class":"hometown"} )
# <div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div>
birthplace.get_text()

Result:

"Age: 23 (10/21/1992)","Birthplace: Barranquilla, Colombia"

Expected result:

"23 (10/21/1992)","Barranquilla, Colombia"

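2 answers
  • Use clear() to empty the span, so its text no longer appears in the div's text
  • Use strip() to remove leading and trailing whitespace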

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>', 'html.parser')
soup.span.clear()
print(soup.get_text().strip())

Output:

23 (10/21/1992)
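Note that clear() modifies the parsed tree, so the span's label is gone afterwards. If the label is still needed, one alternative (not the method used in this answer) is to read the text node that directly follows the span through next_sibling. The following is a minimal sketch, assuming the wanted text is always the node immediately after the span, as in the question's HTML; the variable names html and text_after_span are only illustrative.

```python
from bs4 import BeautifulSoup

html = '<div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>'
soup = BeautifulSoup(html, 'html.parser')

# The span's next sibling is the text node that follows it inside the div.
# Assumption: that node exists and is plain text, as in the question's markup.
text_after_span = soup.span.next_sibling.strip()
print(text_after_span)  # 23 (10/21/1992)

# The tree is untouched, so the span's own label can still be read.
print(soup.span.get_text())  # Age:
```

If the span could be the last child of the div, next_sibling would be None and the call to strip() would fail, so real pages may need a check first.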

Just clear the span before calling get_text():

from bs4 import BeautifulSoup

html_doc ='<html><body><div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div><div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div></body></html>'

bsObj = BeautifulSoup(html_doc, 'html.parser')

# <div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>
birthday = bsObj.find( "div", {"class":"age"} )
birthday.span.clear()
print(birthday.get_text().strip()) # 23 (10/21/1992)

# <div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div>
birthplace = bsObj.find( "div", {"class":"hometown"} )
birthplace.span.clear()
print(birthplace.get_text().strip()) # Barranquilla, Colombia
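Since the same three steps repeat for each field, they could be wrapped in a small function. This is only a sketch: text_outside_span is a hypothetical helper name, not part of BeautifulSoup, and it assumes each div holds at most one span to discard. It also returns None when the div is missing instead of raising an AttributeError.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<html><body>'
    '<div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>'
    '<div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div>'
    '</body></html>'
)

def text_outside_span(soup, css_class):
    """Hypothetical helper: text of the div with the given class, minus its span."""
    div = soup.find("div", {"class": css_class})
    if div is None:
        return None  # no such div on the page
    if div.span is not None:
        div.span.clear()  # empties the span in place (this mutates the tree)
    return div.get_text().strip()

bsObj = BeautifulSoup(html_doc, 'html.parser')
results = [text_outside_span(bsObj, name) for name in ("age", "hometown")]
print(results)  # ['23 (10/21/1992)', 'Barranquilla, Colombia']
```

Because clear() changes the tree, the helper should be called only after any code that still needs the span labels.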
