Web scraping: updating/adding to a DataFrame with pandas

Posted 2024-09-30 05:32:35


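I'm building a web-scraping program with Python, pandas and BeautifulSoup. I want it to request wind information from a weather station every 10 minutes. The data will be stored in an array with 72 indices (24 hours).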
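A fixed-size array where the newest reading goes in front and the oldest one falls off maps naturally onto Python's collections.deque with a maxlen. The sketch below assumes each reading is a dict; the sample values are placeholders, not scraped data. Note that 72 slots at one reading every 10 minutes cover 12 hours; a full 24 hours would need 144.

```python
from collections import deque

# The question asks for 72 slots. At one reading every 10 minutes that is
# 12 hours of data; a full 24 hours would need maxlen=144.
MAX_READINGS = 72

# Once a deque with maxlen is full, adding at one end discards an item from
# the other end, so appendleft() keeps the newest reading at index 0 and
# drops the oldest one from the right.
readings = deque(maxlen=MAX_READINGS)

# Placeholder readings standing in for real scrapes.
for i in range(100):
    readings.appendleft({"sample": i})

print(len(readings))   # 72
print(readings[0])     # {'sample': 99}
print(readings[-1])    # {'sample': 28}
```

Because the deque trims itself, there is no manual push/pop bookkeeping: each new reading is one appendleft() call.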

So far I've managed to create a messy DataFrame with the current conditions. I have three questions, and the third one may be beyond my abilities.

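1: When parsing, how do I exclude '\n' from my data?

2: How do I update every 10 minutes and add the new data to the array?

3: How the array should show the latest data: the newest rows are pushed in at the front of the array, and the previous rows move back, starting at the 3rd index. (I've read about push and pop, which may be something I can look into in the future.)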
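For question 2, a plain time.sleep(600) after each scrape waits 10 minutes after the scrape finishes, so the start times slowly drift by however long each request takes. Below is a minimal sketch of a fixed-rate loop instead, assuming the scraping is wrapped in a zero-argument callable; run_every and task are hypothetical names, not part of any library.

```python
import time

INTERVAL_S = 600  # 10 minutes, as in the question


def run_every(interval_s, task, n_runs):
    """Call task() n_runs times, starting each call interval_s seconds after the previous start."""
    next_start = time.monotonic()
    for i in range(n_runs):
        task()
        if i == n_runs - 1:
            break  # no need to wait after the last run
        next_start += interval_s
        # Sleep only for what is left of the interval after task() ran,
        # so the time spent scraping does not accumulate as drift.
        delay = next_start - time.monotonic()
        if delay > 0:
            time.sleep(delay)


# Quick demo with a tiny interval; a real run would pass INTERVAL_S and a
# (hypothetical) scraping function instead of this lambda.
calls = []
run_every(0.01, lambda: calls.append(time.monotonic()), 3)
print(len(calls))  # 3
```

If a scrape ever takes longer than the interval, the computed delay is negative and the next run starts immediately, so the schedule catches up rather than shifting permanently.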

This is the first code I've written on my own, so please bear with me. My code is below, and here is a link to an image of my output: https://i.stack.imgur.com/XIN3Z.png

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/weather/us/ca/montara'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

#read wind speed
montara_wSpeed = []
montara_wSpeed_elem = soup.find_all(class_='wind-speed')

for item in montara_wSpeed_elem:
    montara_wSpeed.append(item.text)

#read wind direction
montara_wCompass = []
montara_wCompass_elem = soup.find_all(class_='wind-compass')

for item in montara_wCompass_elem:
    montara_wCompass.append(item.text)

#read wind station
montara_station = []
montara_station_elem = soup.find_all(class_='station-nav')

for item in montara_station_elem:
    montara_station.append(item.text)

#create dataframe
montara_array = []

for station, windCompass, windSpeed in zip(montara_station, montara_wCompass, montara_wSpeed):
    montara_array.append({'Station': station, 'Wind Direction': windCompass, 'Wind Speed': windSpeed})

df = pd.DataFrame(montara_array)
df

1 answer
User
Answer #1 · Posted 2024-09-30 05:32:35

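Here I used replace (to drop the newlines), time.sleep (to wait 10 minutes between requests) and extend (to add each new batch in front of the earlier data), with only small changes to your code: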

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/weather/us/ca/montara'

def scrap_weather(url):
    # Fetch the page and parse the HTML
    page = requests.get(url, timeout=30)
    soup = BeautifulSoup(page.text, 'html.parser')

    #read wind speed
    montara_wSpeed = []
    montara_wSpeed_elem = soup.find_all(class_='wind-speed')

    for item in montara_wSpeed_elem:
        montara_wSpeed.append(item.text.replace("\n",""))

    # read wind direction
    montara_wCompass = []
    montara_wCompass_elem = soup.find_all(class_='wind-compass')

    for item in montara_wCompass_elem:
        montara_wCompass.append(item.text.replace("\n",""))

    #read wind station
    montara_station = []
    montara_station_elem = soup.find_all(class_='station-nav')

    for item in montara_station_elem:
        montara_station.append(item.text.replace("\n"," "))

    #create dataframe
    montara_array = []

    for station, windCompass, windSpeed in zip(montara_station, montara_wCompass, montara_wSpeed):
        montara_array.append({'Station': station, 'Wind Direction': windCompass, 'Wind Speed': windSpeed})

    return montara_array

n_times = 3  # number of scrapes to run
data = []    # every row collected so far, newest first

for i in range(n_times):
    montara_array = scrap_weather(url)
    # Put the older rows after the newest batch, then keep the combined list
    montara_array.extend(data)
    data = montara_array
    print(data)
    if i < n_times - 1:
        time.sleep(600)  # wait 10 minutes before the next request

df = pd.DataFrame(data)
print(df)
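Since the title asks about updating a DataFrame, here is a minimal sketch of the same newest-first, capped history kept directly in pandas. It assumes each scrape returns a list of dicts like the one scrap_weather() builds; add_readings, MAX_ROWS and the "Scraped At" column are hypothetical additions, and the sample rows are placeholders rather than real data.

```python
import pandas as pd

# Hypothetical cap taken from the 72-entry array in the question.
MAX_ROWS = 72


def add_readings(history, new_rows, max_rows=MAX_ROWS):
    """Return a DataFrame with new_rows on top of history, capped at max_rows rows.

    history  -- DataFrame of earlier readings, newest first (may be empty)
    new_rows -- list of dicts such as the list scrap_weather() returns
    """
    latest = pd.DataFrame(new_rows)
    # Stamp every row from this scrape with the same time so batches can be told apart.
    latest["Scraped At"] = pd.Timestamp.now()
    if history.empty:
        return latest.head(max_rows)
    # Newest rows first, then the older history; renumber the index 0..n-1.
    combined = pd.concat([latest, history], ignore_index=True)
    # head() keeps the first max_rows rows, i.e. the newest ones; older rows fall off.
    return combined.head(max_rows)


# Simulated scrapes (placeholder values, no network access): 3 rows per run.
history = pd.DataFrame()
for run in range(30):
    fake_rows = [
        {"Station": f"Station {s}", "Wind Direction": "NW", "Wind Speed": str(run)}
        for s in range(3)
    ]
    history = add_readings(history, fake_rows)

print(len(history))                     # 72
print(history.loc[0, "Wind Speed"])     # 29 (newest run)
print(history.iloc[-1]["Wind Speed"])   # 6 (oldest run still kept)
```

In the loop above, the call would presumably become history = add_readings(history, scrap_weather(url)) before each time.sleep(600); that wiring against the live page has not been tested here.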
