How do I get the next tag?

Posted 2024-06-25 22:54:18

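I want to get the headlines from the page. Each headline is wrapped in an h2 tag, and that h2 comes after a span tag with the class "mvp-cd-date left relative".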

from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
mytags = mydivs.findNext('h2')  # fails: findAll() returned a list, not a single tag
for tag in mytags:
    print(tag.text.strip())

3 answers

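soup.findAll() returns a list (a ResultSet, which is empty if nothing matches), so you can't call findNext() on it. You can, however, iterate over the tags and call find_next() on each one separately: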

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
for tag in mydivs:
    print(tag.find_next('h2').get_text(strip=True))

Prints:

BREAKING: Another federal lawmaker dies in Dubai hospital
Cross-Over Night: Enugu Govt bans burning of tyres on roads
Dadiyata: DSS breaks silence as Nigerian govt critic remains missing
CAC: Nigerian govt appoints new Acting Registrar-General
What Buhari told me – Dabiri-Erewa
What soldiers should expect in 2020 – Buratai
Only earthquake can erase Amosun’s legacies in Ogun – Akinlade
Civil War: Militia leader sentenced to 20yrs in prison
2020: Prophet Omale releases prophecies on Buhari, Aisha, Kyari, govs, coup plot
BREAKING: EFCC arrests Shehu Sani
Armed Forces Day: Yobe Governor Buni, donates N40 million for emblem appeal fund
Zamfara govt bans illegal gathering in the state
Agbenu Kacholalo: Colours of culture at Idoma International Carnival 2019 [PHOTOS]
Men of God are too fearful, weak to challenge government activities
2020: Peter Obi sends message to Nigerians
TETFUND: EFCC, ICPC asked to probe agency over alleged corruption
Two inmates regain freedom from Uyo prison
Buhari meets President of AfDB, Adeshina at Aso Rock
New Kogi CP resumes office, promises crime free state
Nothing stops you from paying N30,000 minimum wage to workers – APC challenges Makinde
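A minimal offline sketch of the same pattern, runnable without network access. The HTML string is made up to mimic the structure described in the question (a date span followed by an h2); it is not the real dailypost.ng markup. It also shows that find_all() returns an empty ResultSet rather than None when nothing matches (find() is the call that returns None), and it guards against a span that has no h2 after it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: two date spans,
# each followed by a headline in an <h2>.
html = """
<div><span class="mvp-cd-date left relative">1 hour ago</span></div>
<h2>First headline</h2>
<div><span class="mvp-cd-date left relative">2 hours ago</span></div>
<h2>Second headline</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() always returns a ResultSet (a list subclass); it is empty,
# not None, when nothing matches.
spans = soup.find_all("span", {"class": "mvp-cd-date left relative"})
missing = soup.find_all("span", {"class": "no-such-class"})

# find() is the method that returns None when nothing matches.
first_missing = soup.find("span", {"class": "no-such-class"})

headlines = []
for span in spans:
    h2 = span.find_next("h2")  # first <h2> after this span in document order
    if h2 is not None:         # guard: a span near the end may have no <h2> after it
        headlines.append(h2.get_text(strip=True))

print(headlines)  # ['First headline', 'Second headline']
```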

EDIT: This script scrapes the headlines from multiple pages:

import requests
from bs4 import BeautifulSoup

url = 'https://dailypost.ng/hot-news/page/{}/'

for page in range(1, 5):    # <- pages 1 to 4; raise the upper bound to scrape more pages
    print('Page no.{}'.format(page))
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
    for tag in mydivs:
        print(tag.find_next('h2').get_text(strip=True))
    print('-' * 80)
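One possible refactor of the multi-page script, assuming you want the parsing to be testable separately from the HTTP requests. The helper names page_urls and extract_headlines are hypothetical, and matching class_="mvp-cd-date" (any span that carries that class, in any order) is a swapped-in choice that is looser than the exact class string used above, so it may match more spans on the real site.

```python
from bs4 import BeautifulSoup

# URL pattern taken from the answer above.
URL_TEMPLATE = "https://dailypost.ng/hot-news/page/{}/"

def page_urls(first_page, last_page):
    """Build the listing URLs for pages first_page..last_page (inclusive)."""
    return [URL_TEMPLATE.format(n) for n in range(first_page, last_page + 1)]

def extract_headlines(html):
    """Return the text of the first <h2> after each date span in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    # class_="mvp-cd-date" matches any span whose class list contains that
    # class, so it does not depend on the exact string "mvp-cd-date left relative".
    for span in soup.find_all("span", class_="mvp-cd-date"):
        h2 = span.find_next("h2")
        if h2 is not None:
            headlines.append(h2.get_text(strip=True))
    return headlines

# Fetching is left to the caller, e.g. (requires network access):
#   import requests
#   for url in page_urls(1, 4):
#       resp = requests.get(url, timeout=10)
#       resp.raise_for_status()
#       print(extract_headlines(resp.content))

sample = '<span class="left mvp-cd-date relative">now</span><h2> Sample headline </h2>'
print(page_urls(1, 2))
print(extract_headlines(sample))  # ['Sample headline']
```

Keeping the network call out of extract_headlines means you can check the parsing against a saved copy of a page before scraping live.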

You have to loop over mydivs to use findNext().

mydivs is a list of tags, and findNext only works on a single tag. You have to iterate over the divs and run findNext on each one.
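To see the difference between the list and its elements, here is a small self-contained check on made-up HTML (the class name d is arbitrary):

```python
from bs4 import BeautifulSoup

html = "<span class='d'>a</span><h2>A</h2><span class='d'>b</span><h2>B</h2>"
soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", class_="d")  # a ResultSet: a list of Tag objects

# Calling a Tag method on the list fails, because the list has no find_next().
try:
    spans.find_next("h2")
    error = None
except AttributeError as exc:
    error = exc

# Calling it on each element works.
titles = [span.find_next("h2").get_text() for span in spans]
print(type(error).__name__)  # AttributeError
print(titles)                # ['A', 'B']
```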

Just add this line:

for div in mydivs:

and put this inside the loop:

mytags = div.findNext('h2')

Here is the full code of your working program:

from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.dailypost.ng/hot-news")
soup = BeautifulSoup(r.content, "html.parser")
mydivs = soup.findAll("span", {"class": "mvp-cd-date left relative"})
for div in mydivs:
    mytags = div.findNext('h2')  # findNext() returns a single Tag, not a list
    print(mytags.get_text(strip=True))
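findNext('h2') returns one Tag, and iterating over a Tag walks its direct children, which can be strings or nested tags. A nested tag is not a str, so calling .strip() on every child is fragile, while get_text() collects all nested text. A short offline check, assuming a hypothetical heading that wraps part of its text in a link:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical heading: plain text followed by a nested <a> tag.
soup = BeautifulSoup("<h2>Hello <a href='#'>world</a></h2>", "html.parser")
h2 = soup.find("h2")

# Iterating over a Tag yields its direct children (one string, one <a> tag).
children = list(h2)
print([isinstance(c, NavigableString) for c in children])  # [True, False]
print([isinstance(c, Tag) for c in children])              # [False, True]

# get_text() joins all nested strings; with strip=True a separator keeps
# the words apart after surrounding whitespace is removed.
print(h2.get_text(" ", strip=True))  # Hello world
print(h2.get_text(strip=True))       # Helloworld
```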

Try replacing the last 3 lines with:

for div in mydivs:
    mytags = div.findNext('h2')  # findNext() returns a single Tag, not a list
    print(mytags.get_text(strip=True))
