Python BeautifulSoup: scraping div, span, and p tags, and how to get an exact match on a div name

<div class="row-table details -bp30">
  <div class="col">
    Name: Alisson Ramses Becker Date of birth:02/10/1992 Place of birth: Brazil
  </div>
  <div class="col">
    Club: LiverpoolSquad: 13 Position: Goal Keeper
  </div>
</div>

Desired output:

[Player Details, Name: Alisson Ramses Becker, Date of birth:02/10/1992, Place of birth: Brazil, Club: Liverpool, Squad: 13, Position: Goal Keeper]

2 Answers

User

Answer 1 · edited 2024-06-17 16:23:12

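Your main problem is how to extract the text that sits directly inside a <p> tag, outside any nested <span>.

In Beautiful Soup, a NavigableString corresponds to a bit of text within a tag. So you can pick out that text by checking whether each child of the <p> is an instance of NavigableString.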

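from bs4 import BeautifulSoup, NavigableString

html = "your example"  # placeholder: substitute the HTML from the question

soup = BeautifulSoup(html, "lxml")  # the "lxml" parser requires the lxml package
# Iterating over a tag yields its direct children
for e in soup.find("p"):
    print(e, type(e))
# Name:  <class 'bs4.element.NavigableString'>
# <strong><span itemprop="name">Alisson Ramses Becker</span></strong> <class 'bs4.element.Tag'>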
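To make the NavigableString/Tag distinction concrete, here is a minimal runnable sketch. The one-line HTML string is an assumption, reconstructed from the printed output above rather than taken from the real page, and the built-in "html.parser" is swapped in for "lxml" so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Assumed markup, reconstructed from the printed output above (not the real page)
html = '<p>Name: <strong><span itemprop="name">Alisson Ramses Becker</span></strong></p>'

soup = BeautifulSoup(html, "html.parser")
first_p = soup.find("p")

# Split the direct children of <p> by node type
bare_text = [str(child) for child in first_p.children if isinstance(child, NavigableString)]
nested_tags = [child.name for child in first_p.children if isinstance(child, Tag)]

print(bare_text)           # text that sits directly inside <p>
print(nested_tags)         # names of the tags nested directly inside <p>
print(first_p.get_text())  # all text, including text inside nested tags
```

The label is a direct text child of the <p>, while the value lives one level down inside the nested tags, which is why the two have to be collected separately or joined with get_text().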

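The actual code:

The key step is a nested list comprehension that walks every <p> in the result set and keeps only the children that are NavigableString instances:
[element for result in resultset for element in result if isinstance(element, NavigableString)]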
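For readers who find the nested comprehension dense, it can be unrolled into explicit loops. The sketch below assumes a small hypothetical HTML fragment (the <p>/<strong>/<span> layout is a guess based on the output shown in this answer) and uses the built-in "html.parser" instead of "lxml".

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical fragment; the tag layout is an assumption, not the real page
html = """
<div class="col">
  <p>Name: <strong><span itemprop="name">Alisson Ramses Becker</span></strong></p>
  <p>Squad: 13</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
resultset = soup.find_all("p")

# Explicit form: outer loop over each <p>, inner loop over its direct children
loop_result = []
for result in resultset:
    for element in result:
        if isinstance(element, NavigableString):
            loop_result.append(element)

# Comprehension form: the for clauses appear in the same order as the loops
comprehension_result = [
    element
    for result in resultset
    for element in result
    if isinstance(element, NavigableString)
]

print([str(s) for s in loop_result])
print(loop_result == comprehension_result)
```

Both forms visit the nodes in document order, so the resulting lists are identical.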
My complete test code:

from bs4 import BeautifulSoup, NavigableString

html = """
<div class="row-table details -bp30">
  <div class="col">
    Name: Alisson Ramses Becker Date of birth:02/10/1992 Place of birth: Brazil
  </div>
  <div class="col">
    Club: LiverpoolSquad: 13 Position: Goal Keeper
  </div>
</div>
"""

soup = BeautifulSoup(html, "lxml")
resultset = soup.find_all("p")
# Bare text nodes that sit directly inside each <p>
fr = [element for result in resultset for element in result if isinstance(element, NavigableString)]
# Text of every <span> that carries an itemprop attribute
spanset = [e.text for e in soup.find_all("span", {"itemprop": True})]
# Pair each label with its span value, in document order
setA = ["".join(z) for z in zip(fr, spanset)]
# Append the remaining <p> texts that have no matching span
final = setA + fr[len(spanset):]
print(final)

Output:
['Name: Alisson Ramses Becker', 'Date of birth:02/10/1992', 'Place of birth: Brazil', 'Club: Liverpool', 'Squad: 13', 'Position: Goal Keeper']

User
Answer 2 · edited 2024-06-17 16:23:12

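Assuming you have permission to scrape this site, and there is no API or JSON response you could use instead, one slow approach is:

from bs4 import BeautifulSoup as bs

html = '''
<div class="row-table details -bp30">
  <div class="col">
    Name: Alisson Ramses Becker Date of birth:02/10/1992 Place of birth: Brazil
  </div>
  <div class="col">
    Club: LiverpoolSquad: 13 Position: Goal Keeper
  </div>
</div>
'''

soup = bs(html, 'html5lib')  # the 'html5lib' parser requires the html5lib package
# One list of <p> tags per <div class="col">
data = [d.find_all('p') for d in soup.find_all('div', {'class': 'col'})]
value = []
for i in data:
    for j in i:
        value.append(j.text)  # .text includes the text of nested tags
print(value)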
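On the exact-match part of the question: a class filter such as {'class': 'col'} also matches a div that has col as one of several classes. One possible way to require that the class attribute is exactly col is to pass a function to find_all. In the sketch below, the HTML, the extra "col sidebar" div, the itemprop values, and the helper name has_exact_class_col are all hypothetical, and the built-in "html.parser" is used instead of 'html5lib'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the inner tags, the itemprop values, and the
# "col sidebar" div are assumptions added for illustration
html = """
<div class="row-table details -bp30">
  <div class="col">
    <p>Name: <strong><span itemprop="name">Alisson Ramses Becker</span></strong></p>
    <p>Date of birth:<span itemprop="birthDate">02/10/1992</span></p>
  </div>
  <div class="col sidebar">
    <p>Unrelated block</p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def has_exact_class_col(tag):
    """Hypothetical helper: true only for <div> tags whose class list is exactly ['col']."""
    # class is a multi-valued attribute, so Beautiful Soup stores it as a list
    return tag.name == "div" and tag.get("class") == ["col"]


loose = soup.find_all("div", {"class": "col"})  # matches any div that has the col class
exact = soup.find_all(has_exact_class_col)      # matches only class="col"

# Collect the full text of every <p> inside the exactly matched divs
values = [p.get_text() for div in exact for p in div.find_all("p")]

print(len(loose), len(exact))
print(values)
```

Because get_text() is called on each <p> independently, label and value stay together without relying on the order of a separate span list.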
