Filling the other columns for missing dates

Posted 2024-10-02 18:15:58


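I am extracting data from several Excel files to monitor my daily calorie intake. I managed to generate the dates with a list comprehension. I then tried both merge and join, but neither worked: ValueError: You are trying to merge on object and float64 columns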

import numpy as np
import pandas as pd

date_list = ['2021-05-22', '2021-05-24', '2021-05-26', '2021-05-27']
idx = pd.date_range(date_list[0], date_list[-1])  # Full daily range, including the missing dates
df_dates = pd.DataFrame(idx)  # Convert the DatetimeIndex to a DataFrame
df1_dates = pd.DataFrame(np.repeat(df_dates.values, 2, axis=0))  # Two rows per date, but the column has no title (it defaults to 0)

I also have a second set of data with my calorie intake and exercise duration:

# These are lists.
Time = ['Morning', 'Afternoon', 'Morning', 'Afternoon', 'Morning', 'Afternoon', 'Morning', 'Afternoon']
Calories = [420, 380, 390, 400, 350, 280, 300, 430]
Duration = [50, 40, 45, 50, 45, 50, 44, 58]

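My problem is that after using np.repeat, I don't know how to give the df1_dates DataFrame a column title ("Date"). I also want the other columns to be filled with NaN on the rows that correspond to the missing dates.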

The output should look like this:

         Date       Time calories duration
0   22/5/2021    Morning      420       50
1   22/5/2021  Afternoon      380       40
2   23/5/2021    Morning      NaN      NaN
3   23/5/2021  Afternoon      NaN      NaN
4   24/5/2021    Morning      390       45
5   24/5/2021  Afternoon      400       50
6   25/5/2021    Morning      NaN      NaN
7   25/5/2021  Afternoon      NaN      NaN
8   26/5/2021    Morning      350       45
9   26/5/2021  Afternoon      280       50
10  27/5/2021    Morning      300       44
11  27/5/2021  Afternoon      430       58

1 Answer

Answer 1 · Posted 2024-10-02 18:15:58

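Build a DataFrame from the existing data, indexed by (Date, Time), then reindex it against the full date range so that the missing dates appear as rows filled with NaN: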

import pandas as pd

# Input data
date_list = ['2021-05-22', '2021-05-24', '2021-05-26', '2021-05-27']
calories = [420, 380, 390, 400, 350, 280, 300, 430]
duration = [50, 40, 45, 50, 45, 50, 44, 58]

# DataFrame with a sparse index: only the (Date, Time) pairs that have data
idx = pd.MultiIndex.from_product([pd.to_datetime(date_list),
                                  ["Morning", "Afternoon"]],
                                 names=["Date", "Time"])
df = pd.DataFrame({'calories': calories, 'duration': duration}, index=idx)

# DataFrame with the full index: every day in the range, Morning and Afternoon
idx1 = pd.MultiIndex.from_product([pd.date_range(date_list[0], date_list[-1]),
                                   ["Morning", "Afternoon"]],
                                  names=["Date", "Time"])
# reindex inserts NaN rows for the missing dates; reset_index turns Date and Time back into columns
df1 = df.reindex(idx1).reset_index()
>>> df1
         Date       Time  calories  duration
0  2021-05-22    Morning     420.0      50.0
1  2021-05-22  Afternoon     380.0      40.0
2  2021-05-23    Morning       NaN       NaN
3  2021-05-23  Afternoon       NaN       NaN
4  2021-05-24    Morning     390.0      45.0
5  2021-05-24  Afternoon     400.0      50.0
6  2021-05-25    Morning       NaN       NaN
7  2021-05-25  Afternoon       NaN       NaN
8  2021-05-26    Morning     350.0      45.0
9  2021-05-26  Afternoon     280.0      50.0
10 2021-05-27    Morning     300.0      44.0
11 2021-05-27  Afternoon     430.0      58.0
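
If you would rather stay with the merge approach from the question, the following is a minimal sketch of one way it might work. It assumes the ValueError came from the key columns having different dtypes on the two sides, so both "Date" columns are built as datetime64 here and the column is named when the frame is created. The names known, full, all_days and result are illustrative and not from the original post, and the final Int64 conversion is optional.

```python
import numpy as np
import pandas as pd

date_list = ['2021-05-22', '2021-05-24', '2021-05-26', '2021-05-27']
times = ['Morning', 'Afternoon']
calories = [420, 380, 390, 400, 350, 280, 300, 430]
duration = [50, 40, 45, 50, 45, 50, 44, 58]

# Known data: each date repeated once per time of day, with a named Date column
known = pd.DataFrame({
    'Date': pd.to_datetime(date_list).repeat(len(times)),
    'Time': np.tile(times, len(date_list)),
    'calories': calories,
    'duration': duration,
})

# Full grid: every day in the range, Morning and Afternoon
all_days = pd.date_range(date_list[0], date_list[-1])
full = pd.DataFrame({
    'Date': all_days.repeat(len(times)),
    'Time': np.tile(times, len(all_days)),
})

# Left merge keeps every row of the full grid; rows without data get NaN
result = full.merge(known, on=['Date', 'Time'], how='left')

# Optional: nullable integers keep 420 instead of 420.0 (missing values show as <NA>)
result = result.astype({'calories': 'Int64', 'duration': 'Int64'})
print(result)
```

Passing a dict to pd.DataFrame sets the column title directly, which avoids the unnamed 0 column produced by wrapping np.repeat output. The reindex approach above is shorter; the merge version may be easier to adapt if the data already lives in separate DataFrames.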
