Extracting text from an H2 field with Beautiful Soup

Posted 2024-09-30 18:15:31


I'm trying to figure out how to get the text out of an h2 element with Beautiful Soup.

Below is the code I'm using, but right now it fails with this error:

---> 66             dista = soup.find('h2', {'class': 'RaceHeader_title_1Yk'}).text
     67             dista = dista.split( " " )[-1]
     68             horses = soup.findAll('div', {'class': 'Entries_entry_2Xt'})

AttributeError: 'NoneType' object has no attribute 'text'
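The error means that soup.find() matched nothing: find() returns None in that case, and None has no .text attribute. A minimal sketch of a guard, run against a made-up HTML string that lacks the h2; h2_text_or_none is a hypothetical helper name, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup


def h2_text_or_none(soup, class_name):
    """Return the text of the first <h2> with the given class, or None if absent.

    Hypothetical helper for illustration only.
    """
    tag = soup.find("h2", class_=class_name)
    # find() returns None when nothing matches, so check before touching the text
    return tag.get_text() if tag is not None else None


# Made-up page that does not contain the race header
soup = BeautifulSoup("<div><p>no header here</p></div>", "html.parser")
missing = h2_text_or_none(soup, "RaceHeader_title_1Yk")
print(missing)  # None
```

If the guard returns None for the real page, the h2 is not in the HTML you parsed. One common cause is that the HTML fetched by your script differs from what the browser shows after running JavaScript, so it is worth printing or saving the fetched source and searching it for the class name.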

Here is the code I'm using and a sample of the HTML it is scraping. Ideally I want to get "1600" as the output:

dista = soup.find('h2', {'class': 'RaceHeader_title_1Yk'}).text
dista = dista.split(" ")[-1]


<h2 class="RaceHeader_title_1Yk">
<span class="RaceHeader_titleNumber_uNI">R1</span>
"MT SOMERS HONEY MAIDEN 1600"
"1600"
</h2>
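The quotation marks around the two strings in the sample look like a browser developer-tools rendering of text nodes rather than part of the markup (an assumption). The sketch below parses a hand-written version of the fragment offline, so the extraction can be tested without a network request. It takes only the h2's own text nodes, which skips the "R1" inside the span. It also shows a CSS prefix selector, which assumes that the "_1Yk" suffix is a generated hash that might change between site builds.

```python
from bs4 import BeautifulSoup

# Hand-written approximation of the fragment above
# (assumption: the quoted strings are plain text nodes in the real markup)
html = """<h2 class="RaceHeader_title_1Yk">
<span class="RaceHeader_titleNumber_uNI">R1</span>
MT SOMERS HONEY MAIDEN 1600
</h2>"""

soup = BeautifulSoup(html, "html.parser")
h2 = soup.find("h2", class_="RaceHeader_title_1Yk")

# Direct text children only (recursive=False), so the <span>'s "R1" is excluded
own_text = " ".join(
    s.strip() for s in h2.find_all(string=True, recursive=False) if s.strip()
)
dista = own_text.split()[-1]
print(dista)  # 1600

# Alternative lookup: any <h2> whose class attribute starts with "RaceHeader_title"
h2_alt = soup.select_one('h2[class^="RaceHeader_title"]')
```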

2 answers

You can try the following approach:

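import requests
from bs4 import BeautifulSoup as bs

# URL to be scraped
link = "https://new.tab.co.nz/extended-form/2020-09-18-m6-r1"

# Send a GET request and fail loudly on an HTTP error status
response = requests.get(link, timeout=10)
response.raise_for_status()
source = response.text

# Parse the HTML with bs4
soup = bs(source, "html.parser")

# Extract the specific element from the bs4 object
content = soup.find('h2', {'class': 'RaceHeader_title_1Yk'})

# find() returns None when nothing matches, which is what caused the AttributeError
if content is None:
    raise SystemExit("h2.RaceHeader_title_1Yk not found in the fetched HTML")

# split() with no argument splits on any whitespace, including newlines
result = content.text.split()[-1]

print(result)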

Output:

1600

Try this:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://new.tab.co.nz/extended-form/2020-09-18-m6-r1").text
soup = BeautifulSoup(page, "html.parser")
print(soup.find("h2", {"class": "RaceHeader_title_1Yk"}).text.split()[-1])

Output: 1600
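The choice between split(" ") and split() matters when the header text contains newlines. Here is a small illustration on a hand-written string standing in for the h2's .text value; the exact whitespace on the live page is an assumption:

```python
# Hand-written stand-in for h2.text; assumes newlines around the <span>
h2_text = "\nR1\nMT SOMERS HONEY MAIDEN 1600\n"

by_space = h2_text.split(" ")[-1]    # splits only on the space character
by_whitespace = h2_text.split()[-1]  # splits on any run of whitespace, drops empty pieces

print(repr(by_space))       # '1600\n'
print(repr(by_whitespace))  # '1600'
```

With split(" "), the trailing newline stays attached to the last token. With no argument, Python treats any run of whitespace as a separator, so the last token is the clean "1600".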

To get all the horses, add the following line:

print([h.text for h in soup.find_all("span", {"class": "EntryHeader_runner_UwW"})])

Output:

['Danny Green (8) 5 g bay', 'Eisenhower (10) 5 g bay', 'On The Rivet (13) 4 g bay', 'Point Break (11) 4 g brown', 'Magie Noire (7) 4 g bay', 'Mazzoni (12) 7 g bay', 'Miss Oaks (3) 5 m bay', 'Turn Your Eyes (6) 5 m chestnut', 'Repulse (5) 4 m bay', 'Spindleshanks (9) 5 m bay', 'Nifty (1) 6 m chestnut', 'Tennessee Rock (14) 4 m bay', 'Wendy Darling (4) 4 m brown', "Tappy's Lad (2) 3 g brown"]
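Each runner string in the output appears to follow the pattern "name (number) age sex colour". The sketch below splits such strings into fields with a regular expression. The field names are hypothetical, the page does not state what the bracketed number means (possibly the barrier draw), and the single-letter sex code is inferred only from the "g" and "m" values above.

```python
import re

# Assumed layout: "<name> (<number>) <age> <sex> <colour>"
RUNNER_RE = re.compile(
    r"^(?P<name>.+?) \((?P<number>\d+)\) (?P<age>\d+) (?P<sex>[a-z]) (?P<colour>.+)$"
)


def parse_runner(text):
    """Split one runner string into a dict of fields, or return None if it doesn't match."""
    m = RUNNER_RE.match(text.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["number"] = int(fields["number"])
    fields["age"] = int(fields["age"])
    return fields


runners = ["Danny Green (8) 5 g bay", "Tappy's Lad (2) 3 g brown"]
parsed = [parse_runner(r) for r in runners]
print(parsed[1])
# {'name': "Tappy's Lad", 'number': 2, 'age': 3, 'sex': 'g', 'colour': 'brown'}
```

Returning None for non-matching strings keeps one oddly formatted entry from crashing the whole list comprehension. You can filter those entries out, or log them, afterwards.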
