How do I scrape a web page when there is no unique class or identifier?

Posted 2024-10-02 04:33:57


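I'm very new to Python and am trying to use BeautifulSoup to scrape some average temperature data from the historical data on Wunderground.com. I've gone through web scraping tutorials, but I can't find an example where the data to be scraped isn't easily reachable through a unique class or id. My code is below, but I'm stuck on how to get at the data.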

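I'm trying to scrape the values in the average temperature column on this page, and would also like to scrape the other values in the table: https://www.wunderground.com/history/daily/gb/christchurch/EGHH/date/2019-8-11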

The data I want is in a table, but every row of the table has class="ng-star-inserted", and there are 426 matches for "td.ng-star-inserted" on the page. I'm not sure whether, or how best, to use BeautifulSoup's find or find_all methods. All help appreciated, thanks.

import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/history/daily/gb/christchurch/EGHH/date/2019-8-11'
response = requests.get(url) 
soup = BeautifulSoup(response.text, 'html.parser')

2 answers

Try using:

class="mat-cell cdk-cell cdk-column-temperature mat-column-temperature ng-star-inserted"
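As a minimal sketch of how that class could be used with find_all: BeautifulSoup treats class as a multi-valued attribute, so passing one class name to class_ matches any element whose class list contains it, and the shared ng-star-inserted class can be ignored. The SAMPLE_HTML below is a hypothetical, trimmed-down stand-in for the page's table, not the real markup, and the sketch assumes the table is actually present in the HTML that requests returns (it will not be if the page builds the table with JavaScript after loading).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr class="ng-star-inserted">
    <td class="mat-cell cdk-cell cdk-column-dateString mat-column-dateString ng-star-inserted">4:20 PM</td>
    <td class="mat-cell cdk-cell cdk-column-temperature mat-column-temperature ng-star-inserted">66 F</td>
  </tr>
  <tr class="ng-star-inserted">
    <td class="mat-cell cdk-cell cdk-column-dateString mat-column-dateString ng-star-inserted">4:50 PM</td>
    <td class="mat-cell cdk-cell cdk-column-temperature mat-column-temperature ng-star-inserted">64 F</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Match on the one class that is specific to the temperature column.
cells = soup.find_all("td", class_="mat-column-temperature")

# Extract the visible text of each matching cell.
temperatures = [cell.get_text(strip=True) for cell in cells]
print(temperatures)
```

With the real page, soup would be built from response.text instead of SAMPLE_HTML; the find_all call stays the same.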

If the element you want to scrape has no class or id, you can get it with XPath:

import lxml.html
import requests

url = "https://www.wunderground.com/history/daily/gb/christchurch/EGHH/date/2019-8-11"
path = "/html/body/app-root/app-history/one-column-layout/wu-header/sidenav/mat-sidenav-container/mat-sidenav-content/div/section/div[1]/lib-city-header/div[1]/div/div/a[1]/lib-display-unit/span/span[1]/text()"

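response = requests.get(url)
# Parse the returned HTML into an element tree
tree = lxml.html.fromstring(response.text)
# xpath() returns a list of matches; it is empty if the path matches nothing
temperature = tree.xpath(path)

if temperature:
    print(temperature[0])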
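A long absolute path like the one above stops matching as soon as any ancestor element changes. As a hedged alternative, a shorter relative XPath can pick cells by their position within each row, which needs no class or id at all. The SAMPLE_HTML below is a hypothetical table used only for illustration, and the column position (second cell) is an assumption that would have to be checked against the real page.

```python
import lxml.html

# Hypothetical table with no classes or ids; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Time</th><th>Temperature</th></tr>
  <tr><td>4:20 PM</td><td>66 F</td></tr>
  <tr><td>4:50 PM</td><td>64 F</td></tr>
</table>
"""

tree = lxml.html.fromstring(SAMPLE_HTML)

# Relative path: the text of the second <td> in every table row.
# The header row has only <th> cells, so it contributes nothing.
path = "//table//tr/td[2]/text()"

temperatures = [text.strip() for text in tree.xpath(path)]
print(temperatures)
```

Changing td[2] to another index selects a different column, so the same pattern could cover the other values in the table.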
