How to create separate DataFrames by grouping on time of day with groupby

Posted 2024-10-03 21:35:57


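I have this dataset, with data collected over 34 days at 15-minute intervals.

How can I get all the rows recorded at the same time of day? I have already loaded the dataset and converted the DATE_TIME column to datetime format.

This is the code I have so far: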

tmp=weather_sensor_df()
df=pd.DataFrame(columns=tmp.columns)
print(df)
tmp.DATE_TIME.dt.hour[13]
for i in tmp.index:
    time = tmp.DATE_TIME[i]
    if time.hour==13 and time.minute==0:
        row={
            df.columns[0]:time,
            df.columns[1]:tmp.AMBIENT_TEMPERATURE[i],
            df.columns[2]:tmp.MODULE_TEMPERATURE[i],
            df.columns[3]:tmp.IRRADIATION[i],
        }
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        df=pd.concat([df,pd.DataFrame([row])],ignore_index=True)

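For reference: weather_sensor_df() loads the weather sensor DataFrame and uses pd.to_datetime() to convert DATE_TIME to Timestamp format.

I think the groupby() function is better suited to this, but I am not sure how to proceed.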

DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
2020-05-15 00:00:00,4135001,HmiyD2TTLFNqkNe,25.184316133333333,22.8575074,0.0
2020-05-15 00:15:00,4135001,HmiyD2TTLFNqkNe,25.08458866666667,22.761667866666663,0.0
2020-05-15 00:30:00,4135001,HmiyD2TTLFNqkNe,24.935752600000004,22.59230553333333,0.0
2020-05-15 00:45:00,4135001,HmiyD2TTLFNqkNe,24.8461304,22.36085213333333,0.0
2020-05-15 01:00:00,4135001,HmiyD2TTLFNqkNe,24.621525357142858,22.165422642857145,0.0
2020-05-15 01:15:00,4135001,HmiyD2TTLFNqkNe,24.5360922,21.968570866666667,0.0
2020-05-15 01:30:00,4135001,HmiyD2TTLFNqkNe,24.638673866666664,22.352925666666668,0.0
2020-05-15 01:45:00,4135001,HmiyD2TTLFNqkNe,24.87302233333333,23.1609192,0.0
2020-05-15 02:00:00,4135001,HmiyD2TTLFNqkNe,24.936930466666663,23.026113,0.0
2020-05-15 02:15:00,4135001,HmiyD2TTLFNqkNe,25.0122476,23.343229266666665,0.0
2020-06-17 21:30:00,4135001,HmiyD2TTLFNqkNe,22.9965616,21.869773466666665,0.0
2020-06-17 21:45:00,4135001,HmiyD2TTLFNqkNe,23.137091,22.1259848,0.0
2020-06-17 22:00:00,4135001,HmiyD2TTLFNqkNe,22.563179466666668,21.164713466666665,0.0
2020-06-17 22:15:00,4135001,HmiyD2TTLFNqkNe,22.19922893333333,20.51527293333333,0.0
2020-06-17 22:30:00,4135001,HmiyD2TTLFNqkNe,22.171736666666664,21.0808288,0.0
2020-06-17 22:45:00,4135001,HmiyD2TTLFNqkNe,22.150569666666662,21.480377266666668,0.0
2020-06-17 23:00:00,4135001,HmiyD2TTLFNqkNe,22.129815666666666,21.38902386666667,0.0
2020-06-17 23:15:00,4135001,HmiyD2TTLFNqkNe,22.008274642857145,20.709211357142856,0.0
2020-06-17 23:30:00,4135001,HmiyD2TTLFNqkNe,21.96949473333333,20.7349628,0.0
2020-06-17 23:45:00,4135001,HmiyD2TTLFNqkNe,21.909287666666668,20.4279724,0.0

1 Answer
User
#1 · Posted 2024-10-03 21:35:57
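  • Use pandas.DataFrame.groupby on .dt.time
    • To group by hour instead, use .dt.hour
  • No aggregation function is applied to the columns, so dfg is a DataFrameGroupBy object
  • From the GroupBy object, you can build a dict of DataFrames keyed by the time in isoformat (e.g. 'hh:mm:ss').
    • If .dt.hour is used for the groups, drop .isoformat, and the keys will be the ints 0...23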
import pandas as pd

# load the data
tmp = pd.read_csv('./data/Plant_1_Weather_Sensor_Data.csv')

# set the column as a datetime dtype
tmp.DATE_TIME = pd.to_datetime(tmp.DATE_TIME)

# groupby time
dfg = tmp.groupby(tmp.DATE_TIME.dt.time)

# create a dict of dataframes, where the key is an isoformat datetime.time
df_times = {g.isoformat(): data for g, data in dfg}

# display(df_times['00:15:00'].head())
              DATE_TIME  PLANT_ID       SOURCE_KEY  AMBIENT_TEMPERATURE  MODULE_TEMPERATURE  IRRADIATION
1   2020-05-15 00:15:00   4135001  HmiyD2TTLFNqkNe            25.084589           22.761668          0.0
182 2020-05-17 00:15:00   4135001  HmiyD2TTLFNqkNe            24.011531           21.648279          0.0
278 2020-05-18 00:15:00   4135001  HmiyD2TTLFNqkNe            21.041437           20.475962          0.0
374 2020-05-19 00:15:00   4135001  HmiyD2TTLFNqkNe            22.548998           20.529877          0.0
467 2020-05-20 00:15:00   4135001  HmiyD2TTLFNqkNe            22.255206           20.110174          0.0

# iterate through the dict of dataframes like a normal dict
for k, v in df_times.items():
    print(k)
    print(v.head())    
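
The .dt.hour variant mentioned in the bullets is not shown above, so here is a minimal sketch of it. It assumes a small made-up frame (two days of 15-minute readings with invented temperature values) in place of the real CSV; the names dfg_hour, df_hours and at_1300 are only illustrative.

```python
import pandas as pd

# Hypothetical sample data: 2 days of 15-minute readings (values are made up)
tmp = pd.DataFrame({
    "DATE_TIME": pd.date_range("2020-05-15", periods=192, freq="15min"),
    "AMBIENT_TEMPERATURE": [20.0 + 0.01 * i for i in range(192)],
})

# group by hour of day instead of the exact time
dfg_hour = tmp.groupby(tmp.DATE_TIME.dt.hour)

# the group keys are already integers (0..23), so no .isoformat() call is needed
df_hours = {int(h): data for h, data in dfg_hour}

# every reading taken between 13:00 and 13:59, across all days
print(df_hours[13])

# narrow the 13:xx group down to the readings taken exactly on the hour
at_1300 = df_hours[13][df_hours[13].DATE_TIME.dt.minute == 0]
print(at_1300)
```

With time-based keys, as in the answer's df_times, the same 13:00 rows would be available directly as df_times['13:00:00'], so grouping by hour is mainly useful when a whole hour should land in one frame.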
