Why can't I extract my target data from Weather Underground?

import scrapy


class SpSpider(scrapy.Spider):
    name = 'sp'
    # URL scheme corrected: the original read 'http://https://...'
    start_urls = ['https://www.wunderground.com/hourly/ir/tehran/date/2021-04-14/']

    def parse(self, response):
        time = response.css('span.ng-star-inserted').extract()

2 Answers

User

#1 · Edited 2024-09-27 02:27:36

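This may be a bit complicated for a beginner, but that's fine.

The data you are looking for is delivered by an XHR request (F12 -> Network -> XHR). The request you made only returns the HTML markup that the data will later be loaded into.

In the code below, the URL I use is taken from the XHR tab. I query that URL, and it returns a JSON response. I then convert that JSON response (which maps easily onto a Python dictionary) into a DataFrame.

Note that the response to this query contains the hourly forecast for "all" available days (equivalent to clicking the left and right arrows on the web page).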

import requests as rq
import pandas as pd

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"}
url = "https://api.weather.com/v3/wx/forecast/hourly/15day?apiKey=6532d6454b8aa370768e63d6ba5a832e&geocode=35.696,51.401&units=e&language=en-US&format=json"
resp = rq.get(url, headers=headers).json()  # parse the JSON body into a dict

print(resp.keys())  # inspect the available fields

df = pd.DataFrame.from_dict(resp)  # JSON to DataFrame
# object type to datetime type (infer_datetime_format is deprecated since pandas 2.0, so it is omitted)
df["validTimeLocal"] = pd.to_datetime(df["validTimeLocal"])
df.sort_values(["validTimeLocal"], ascending=True, inplace=True)  # sort the df by datetimes

sub_df = df[["validTimeLocal", "temperature", "precipChance"]]  # select the variables you want
print(sub_df.iloc[20:25])  # print a few rows and compare them to the website
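
Because the response covers every available day, you may only want the hours of a single date. Here is a minimal sketch using only the standard library. It assumes, as the DataFrame conversion above implies, that the JSON is a dict of equal-length lists, and that the validTimeLocal values are ISO-style strings starting with the date. The helper name hours_for_date is hypothetical, and sample is made-up data standing in for resp, not real forecast values.

```python
def hours_for_date(payload, date, fields=("validTimeLocal", "temperature", "precipChance")):
    """Return one dict per hourly record whose local timestamp falls on `date` (YYYY-MM-DD)."""
    columns = [payload[name] for name in fields]
    # zip(*columns) turns the parallel lists into one tuple per hour
    rows = (dict(zip(fields, values)) for values in zip(*columns))
    return [row for row in rows if row["validTimeLocal"].startswith(date)]


# Made-up sample with the assumed layout of the real response (not real forecast data)
sample = {
    "validTimeLocal": [
        "2021-04-14T22:00:00+0430",
        "2021-04-14T23:00:00+0430",
        "2021-04-15T00:00:00+0430",
    ],
    "temperature": [70, 68, 67],
    "precipChance": [0, 5, 10],
}

for row in hours_for_date(sample, "2021-04-14"):
    print(row["validTimeLocal"], row["temperature"], row["precipChance"])
```

With the real data you would pass resp instead of sample.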

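Do some research on the terms in bold to make progress. Also take a look at the requests and bs4 packages.

Note: the URL contains parameters specific to the Tehran query: the geocode, etc.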
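
To adapt the query, the URL can be assembled from a dictionary of parameters instead of being edited by hand. This is only a sketch: the parameter names are copied from the URL above, build_forecast_url is a hypothetical helper, "YOUR_API_KEY" is a placeholder, and whether the endpoint accepts other coordinates in the same way is an assumption that has not been checked here.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.weather.com/v3/wx/forecast/hourly/15day"


def build_forecast_url(api_key, lat, lon, units="e", language="en-US"):
    """Assemble the forecast URL from its query parameters (hypothetical helper)."""
    params = {
        "apiKey": api_key,
        "geocode": f"{lat},{lon}",  # latitude,longitude of the location
        "units": units,
        "language": language,
        "format": "json",
    }
    # safe="," keeps the comma in the geocode unescaped, as in the URL from the XHR tab
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


# Tehran coordinates taken from the URL above; the key is a placeholder
print(build_forecast_url("YOUR_API_KEY", 35.696, 51.401))
```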

User
#2 · Edited 2024-09-27 02:27:36

To get the first time, if that is all you need, use this CSS locator:
.mat-row:nth-of-type(1)>.cdk-column-timeHour>span
For the second:
.mat-row:nth-of-type(2)>.cdk-column-timeHour>span
and so on.
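
The pattern generalizes to any row. Here is a small sketch that builds the locator for the n-th row; time_selector is a hypothetical helper, and the class names are the ones from the locators above. Since the plain request only returns the HTML markup without the data, the locator presumably has to be applied to the rendered page (for example through a browser-driven tool), which is an assumption rather than something this answer states.

```python
def time_selector(row):
    """Build the CSS selector for the time cell of the given 1-based table row."""
    if row < 1:
        raise ValueError("row numbers start at 1")
    return f".mat-row:nth-of-type({row})>.cdk-column-timeHour>span"


# Selectors for the first three rows
for n in range(1, 4):
    print(time_selector(n))
```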
