Scraping an article with Beautiful Soup

import requests
from bs4 import BeautifulSoup

url = "https://www.argusmedia.com/en/news/2214037-us-hrc-prices-rise-as-supply-remains-tight"

# Request (verify=False disables TLS certificate verification)
r1 = requests.get(url, verify=False)
print(r1.status_code)

# We'll save in coverpage the cover page content
coverpage = r1.content

# Soup creation
soup1 = BeautifulSoup(coverpage, "html5lib")

# News identification
coverpage_news = soup1.find_all('article', class_='news-container cf')
print(len(coverpage_news))

3 answers

User

#1 · edited 2024-10-05 15:24:09

The article content is loaded dynamically, so you need to call the API directly:

import requests

data = requests.get('https://www.argusmedia.com/api/news/2214037/us-hrc-prices-rise-as-supply-remains-tight').json()

body = data['AmpBody']
title = data['Title']
date = data['PublishedDate']
year = data['PublishedYear']

print(body, title, date, year, sep='\n')

# <article><p class="lead">US hot-roll...
# US HRC: Prices rise as supply remains tight
# 11 May
# 2021
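
As a follow-up: the AmpBody value printed above is an HTML fragment, so it can be handed to Beautiful Soup to get plain text. This is only a minimal sketch, assuming the JSON carries the keys shown above. The article_summary helper and the sample dict are hypothetical and stand in for a real API response.

```python
from bs4 import BeautifulSoup


def article_summary(data):
    """Turn the API's JSON dict into a small dict with plain-text body.

    Assumes the keys shown in the answer (AmpBody, Title, PublishedDate,
    PublishedYear); missing keys fall back to empty values.
    """
    body_html = data.get("AmpBody", "")
    # Strip the HTML tags and join the text nodes with single spaces.
    text = BeautifulSoup(body_html, "html.parser").get_text(" ", strip=True)
    published = f"{data.get('PublishedDate', '')} {data.get('PublishedYear', '')}".strip()
    return {"title": data.get("Title", ""), "published": published, "text": text}


# Hypothetical sample data, not the real article content.
sample = {
    "AmpBody": '<article><p class="lead">First paragraph.</p><p>Second paragraph.</p></article>',
    "Title": "Sample title",
    "PublishedDate": "11 May",
    "PublishedYear": "2021",
}

summary = article_summary(sample)
print(summary)
```

With a live response you would pass the dict returned by .json() instead of sample.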

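User

#2 · edited 2024-10-05 15:24:09

The page runs JavaScript. Requests is an HTTP library and cannot execute JavaScript. To "see" the HTML of a JavaScript-driven page, you need to run the code on the page and actually render the content. One way to do that is the requests_html module: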

from requests_html import HTMLSession  

session = HTMLSession()
resp = session.get('your_url')
# render() executes the page's JavaScript
resp.html.render()

Output:

resp.text

{"AmpBody":"<article><p class=\\"lead\\">US hot-rolled coil (HRC) prices continued to trend upward as supplies remain tight and demand stays elevated...}
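
The output above looks like JSON rather than HTML, so it can be decoded with the standard json module before touching any field. A small sketch under that assumption; sample_text is a made-up stand-in for resp.text, not the real response.

```python
import json

# Hypothetical stand-in for resp.text; the escaped quotes mirror the JSON above.
sample_text = r'{"AmpBody": "<article><p class=\"lead\">Sample lead.</p></article>", "Title": "Sample title"}'

# Decode the JSON string into a regular Python dict.
data = json.loads(sample_text)

amp_body = data["AmpBody"]
print(amp_body)
print(data["Title"])
```

After decoding, the escaped quotes in AmpBody become ordinary quotes and the value is ready for an HTML parser.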

User

#3 · edited 2024-10-05 15:24:09

From the docs:

If you want to search for tags that match two or more CSS classes, you should use a CSS selector:

soup1.select("article.news-container.cf")
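
To illustrate the difference, here is a small sketch on made-up HTML (the three article tags are hypothetical, not taken from the site). Passing a string with a space to class_ matches only that exact attribute value, while the CSS selector matches any tag carrying both classes, in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes in different orders, plus one partial match.
html = """
<article class="news-container cf">A</article>
<article class="cf news-container">B</article>
<article class="news-container">C</article>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches the exact string value of the class attribute only.
exact = [tag.get_text() for tag in soup.find_all("article", class_="news-container cf")]

# Matches every <article> that has both classes, regardless of order.
both = [tag.get_text() for tag in soup.select("article.news-container.cf")]

print(exact)
print(both)
```

Note that this only helps once the article markup is actually present in the HTML you parsed; for a dynamically loaded page, the selector still finds nothing in the raw response.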
