Find missing numbers from a sequence in a DataFrame

uniquecode1  year  month  Name  Sale
       1029  2020      5   ABC    10
       1029  2020      6   ABC    20
       1029  2020     10   ABC    30
       1029  2020     11   ABC    35
       1029  2020     12   ABC    38
       1050  2020      4   DEF    39
       1050  2020      5   DEF    40
       1050  2020      6   DEF    31
       1050  2020      7   DEF    45
       1050  2020      8   DEF    55
       1079  2020      4   GHI    65
       1079  2021      2   GHI    75
      10810  2021      1   XYZ    85
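To make the answer's snippets reproducible, the sample table above can be loaded into a DataFrame as follows. This is a minimal sketch: the rows are copied from the table, and the variable name `df` matches the one the answer uses.

```python
import pandas as pd

# Sample data from the question: one row per (code, year, month) with a sale value
df = pd.DataFrame({
    'uniquecode1': [1029, 1029, 1029, 1029, 1029,
                    1050, 1050, 1050, 1050, 1050,
                    1079, 1079, 10810],
    'year':        [2020] * 11 + [2021, 2021],
    'month':       [5, 6, 10, 11, 12, 4, 5, 6, 7, 8, 4, 2, 1],
    'Name':        ['ABC'] * 5 + ['DEF'] * 5 + ['GHI'] * 2 + ['XYZ'],
    'Sale':        [10, 20, 30, 35, 38, 39, 40, 31, 45, 55, 65, 75, 85],
})
print(df)
```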

1 answer

Anonymous user

Answer #1 · Posted 2024-09-24 10:29:50

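You can convert `year` and `month` into datetimes, then use `DataFrame.unstack` with `fill_value=0` to add all missing combinations (new `0` values for the ones that don't exist), followed by `DataFrame.stack` and `DataFrame.reset_index` to get back to the original long format: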

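import pandas as pd

# combine year and month into a month-start datetime (day fixed to 1)
df['dates'] = pd.to_datetime(df[['year','month']].assign(day=1))

# pivot dates to columns (absent combinations become 0), then stack back to long format
df = (df.set_index(['uniquecode1','Name', 'dates'])['Sale']
        .unstack(fill_value=0)
        .stack()
        .reset_index(name='Sale'))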

print (df.head(10))
    uniquecode1 Name      dates  Sale
0          1029  ABC 2020-04-01     0
1          1029  ABC 2020-05-01    10
2          1029  ABC 2020-06-01    20
3          1029  ABC 2020-07-01     0
4          1029  ABC 2020-08-01     0
5          1029  ABC 2020-10-01    30
6          1029  ABC 2020-11-01    35
7          1029  ABC 2020-12-01    38
8          1029  ABC 2021-01-01     0
9          1029  ABC 2021-02-01     0
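The `unstack`/`stack` step above can be checked end to end on the sample data with the self-contained sketch below. It rebuilds the question's table (an assumption about dtypes: plain integers) and keeps the intermediate wide table in a separate variable, `wide`, which is a name introduced here only for illustration. Note that this step only fills in dates that occur somewhere in the data, so September 2020 does not appear yet.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'uniquecode1': [1029] * 5 + [1050] * 5 + [1079] * 2 + [10810],
    'year':        [2020] * 11 + [2021, 2021],
    'month':       [5, 6, 10, 11, 12, 4, 5, 6, 7, 8, 4, 2, 1],
    'Name':        ['ABC'] * 5 + ['DEF'] * 5 + ['GHI'] * 2 + ['XYZ'],
    'Sale':        [10, 20, 30, 35, 38, 39, 40, 31, 45, 55, 65, 75, 85],
})

# Turn year + month into a month-start datetime
df['dates'] = pd.to_datetime(df[['year', 'month']].assign(day=1))

# Wide table: one column per date seen anywhere in the data,
# 0 where a (code, Name) pair had no row for that date
wide = df.set_index(['uniquecode1', 'Name', 'dates'])['Sale'].unstack(fill_value=0)

# Back to long format: one row per (code, Name, date)
out = wide.stack().reset_index(name='Sale')
print(out.head(10))
```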

Finally, add the `year` and `month` columns back from the dates:

df = df.assign(year = df['dates'].dt.year, month = df['dates'].dt.month)
print (df.head())
   uniquecode1 Name      dates  Sale  year  month
0         1029  ABC 2020-04-01     0  2020      4
1         1029  ABC 2020-05-01    10  2020      5
2         1029  ABC 2020-06-01    20  2020      6
3         1029  ABC 2020-07-01     0  2020      7
4         1029  ABC 2020-08-01     0  2020      8

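Unfortunately, 09-2020 is still missing, because no group has a row for that month, so a `DataFrame.reindex` over a complete monthly `date_range` is needed: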

df['dates'] = pd.to_datetime(df[['year','month']].assign(day=1))
mux = pd.date_range(df['dates'].min(), df['dates'].max(), freq='MS', name='dates')

# to set the maximum date manually instead:
#mux = pd.date_range(df['dates'].min(), '2021-03-01', freq='MS', name='dates')

df = (df.set_index(['uniquecode1','Name', 'dates'])['Sale']
        .unstack(fill_value=0)
        .reindex(mux, axis=1, fill_value=0)
        .stack()
        .reset_index(name='Sale')
        )

df = df.assign(year = df['dates'].dt.year, month = df['dates'].dt.month)
print (df.head(10))
   uniquecode1 Name      dates  Sale  year  month
0         1029  ABC 2020-04-01     0  2020      4
1         1029  ABC 2020-05-01    10  2020      5
2         1029  ABC 2020-06-01    20  2020      6
3         1029  ABC 2020-07-01     0  2020      7
4         1029  ABC 2020-08-01     0  2020      8
5         1029  ABC 2020-09-01     0  2020      9
6         1029  ABC 2020-10-01    30  2020     10
7         1029  ABC 2020-11-01    35  2020     11
8         1029  ABC 2020-12-01    38  2020     12
9         1029  ABC 2021-01-01     0  2021      1
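Because the question asks which months are missing, it can help to flag the rows that the reindex filled in. The sketch below repeats the full pipeline on the sample data and then adds a `was_missing` column; that column, the `orig_keys` set, and the `full` variable are hypothetical additions for illustration and not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'uniquecode1': [1029] * 5 + [1050] * 5 + [1079] * 2 + [10810],
    'year':        [2020] * 11 + [2021, 2021],
    'month':       [5, 6, 10, 11, 12, 4, 5, 6, 7, 8, 4, 2, 1],
    'Name':        ['ABC'] * 5 + ['DEF'] * 5 + ['GHI'] * 2 + ['XYZ'],
    'Sale':        [10, 20, 30, 35, 38, 39, 40, 31, 45, 55, 65, 75, 85],
})
df['dates'] = pd.to_datetime(df[['year', 'month']].assign(day=1))

# Every month start from the earliest to the latest date ('MS' = month start),
# including months no group has, such as 2020-09
mux = pd.date_range(df['dates'].min(), df['dates'].max(), freq='MS', name='dates')

full = (df.set_index(['uniquecode1', 'Name', 'dates'])['Sale']
          .unstack(fill_value=0)
          .reindex(mux, axis=1, fill_value=0)   # add absent month columns as 0
          .stack()
          .reset_index(name='Sale'))

# Recover year and month from the datetime column
full = full.assign(year=full['dates'].dt.year, month=full['dates'].dt.month)

# Hypothetical extra step: flag rows that did not exist in the original data
orig_keys = set(zip(df['uniquecode1'], df['year'], df['month']))
full['was_missing'] = [key not in orig_keys
                       for key in zip(full['uniquecode1'], full['year'], full['month'])]

print(full[full['was_missing']].head())
```

Comparing on `(uniquecode1, year, month)` tuples keeps the check independent of the datetime column; a `merge` with `indicator=True` would be an equally valid way to do it.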
