Pandas为一年中的每个月扩展数据框架

2024-10-02 10:21:43 发布

您现在位置:Python中文网/ 问答频道 /正文

#输入数据框在下面


SKU_ID | SUPPLIER | YEAR | MONTH | UOM | VOLUME | COST 
------------------------------------------------------
SKU_123a | Supplier_8 | 2020 | 2 | Case | 12 | $101.36 
SKU_123a | Supplier_8 | 2020 | 3 | Case | 72 | $101.36 
SKU_123a | Supplier_8 | 2020 | 6 | Case | 36 | $101.36 
SKU_123a | Supplier_8 | 2020 | 7 | Case | 36 | $101.36 
SKU_123a | Supplier_8 | 2020 | 9 | Case | 36 | $101.36 
SKU_123a | Supplier_8 | 2020 | 10 | Case | 72 | $101.36
SKU_456b | Supplier_21 | 2020 | 4 | Case | 32 | $10.36 
SKU_456b | Supplier_21 | 2020 | 5 | Case | 2 | $10.36 

下面的输出数据框:对于“月”列中没有数据,卷和成本列应为空。对于多个SKU_ID

SKU_ID | SUPPLIER | YEAR | MONTH | UOM | VOLUME | COST
-------------------------------------------------------
SKU_123a | Supplier_8 | 2020 | 1 | Case |        | 
SKU_123a | Supplier_8 | 2020 | 2 | Case | 12 | $101.36 
SKU_123a | Supplier_8 | 2020 | 3 | Case | 72 | $101.36 
SKU_123a | Supplier_8 | 2020 | 4 | Case |     |     
SKU_123a | Supplier_8 | 2020 | 5 | Case |      |    
SKU_123a | Supplier_8 | 2020 | 6 | Case | 36 | $101.36 
SKU_123a | Supplier_8 | 2020 | 7 | Case | 36 | $101.36 
SKU_123a | Supplier_8 | 2020 | 8 | Case |     |     
SKU_123a | Supplier_8 | 2020 | 9 | Case | 36 | $101.36 
SKU_123a | Supplier_8 | 2020 | 10 | Case | 72 | $101.36 
SKU_123a | Supplier_8 | 2020 | 11 | Case |     | 
SKU_123a | Supplier_8 | 2020 | 12 | Case |     | 
SKU_456b | Supplier_21 | 2020 | 1 | Case |       | 
SKU_456b | Supplier_21 | 2020 | 2 | Case |       | 
SKU_456b | Supplier_21 | 2020 | 3 | Case |       | 
SKU_456b | Supplier_21 | 2020 | 4 | Case | 32 | $10.36 
SKU_456b | Supplier_21 | 2020 | 5 | Case | 2 | $10.36 
SKU_456b | Supplier_21 | 2020 | 6 | Case |       | 
SKU_456b | Supplier_21 | 2020 | 7 | Case |       | 
SKU_456b | Supplier_21 | 2020 | 8 | Case |       | 
SKU_456b | Supplier_21 | 2020 | 9 | Case |       | 
SKU_456b | Supplier_21 | 2020 | 10 | Case |          | 
SKU_456b | Supplier_21 | 2020 | 11 | Case |          | 
SKU_456b | Supplier_21 | 2020 | 12 | Case |          | 

cmcxm


Tags: 数据idyear成本suppliercasecostvolume
2条回答

创建带有月份编号的tempoary数据框:

df2 = pd.DataFrame(range(1, 13), columns=["MONTH"])

并填写以下列:

cols = ["SKU_ID" , "SUPPLIER", "YEAR" , "UOM"]
for col in cols:
    df2[col] = df[col][0]

并在MONTH上将旧数据帧合并到它

df2.merge(df, how="left", on="MONTH")

输出:

    MONTH   SKU_ID_x    SUPPLIER_x  YEAR_x  UOM_x   SKU_ID_y    |   SUPPLIER_y  |.1 YEAR_y  |.2 |.3 UOM_y   |.4 VOLUME  |.5 COST
0   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
1   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
2   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
3   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
4   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
5   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
6   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
7   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
8   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
9   2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
10  2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36
11  2   SKU_123a    Supplier_8  2020    Case    SKU_123a    |   Supplier_8  |   2020    |   |   Case    |   12  |   $101.36

                                        

您可以尝试:

df1 = df.set_index('MONTH').reindex(range(1,13)).reset_index()
cols = ['MONTH', 'SKU_ID', 'SUPPLIER', 'YEAR', 'UOM']
df1[cols] = df1[cols].ffill().bfill(downcast = 'infer')

输出:

    MONTH     SKU_ID      SUPPLIER  YEAR     UOM  VOLUME       COST
0       1  SKU_123a    Supplier_8   2020   Case      NaN        NaN
1       2  SKU_123a    Supplier_8   2020   Case     12.0   $101.36 
2       3  SKU_123a    Supplier_8   2020   Case     72.0   $101.36 
3       4  SKU_123a    Supplier_8   2020   Case      NaN        NaN
4       5  SKU_123a    Supplier_8   2020   Case      NaN        NaN
5       6  SKU_123a    Supplier_8   2020   Case     36.0   $101.36 
6       7  SKU_123a    Supplier_8   2020   Case     36.0   $101.36 
7       8  SKU_123a    Supplier_8   2020   Case      NaN        NaN
8       9  SKU_123a    Supplier_8   2020   Case     36.0   $101.36 
9      10  SKU_123a    Supplier_8   2020   Case     72.0    $101.36
10     11  SKU_123a    Supplier_8   2020   Case      NaN        NaN
11     12  SKU_123a    Supplier_8   2020   Case      NaN        NaN

更新答案:


multi_index = pd.MultiIndex.from_product([df['SUPPLIER'].unique(), range(1,13)], names=['SUPPLIER', 'MONTH'])
df1 = df.set_index(['SUPPLIER','MONTH']).reindex(multi_index).reset_index()
cols = ['MONTH', 'SKU_ID', 'SUPPLIER', 'YEAR', 'UOM']
df1[cols] = df1[cols].ffill().bfill(downcast = 'infer')

输出:

         SUPPLIER  MONTH     SKU_ID  YEAR     UOM  VOLUME       COST
0     Supplier_8       1  SKU_123a   2020   Case      NaN        NaN
1     Supplier_8       2  SKU_123a   2020   Case     12.0   $101.36 
2     Supplier_8       3  SKU_123a   2020   Case     72.0   $101.36 
3     Supplier_8       4  SKU_123a   2020   Case      NaN        NaN
4     Supplier_8       5  SKU_123a   2020   Case      NaN        NaN
5     Supplier_8       6  SKU_123a   2020   Case     36.0   $101.36 
6     Supplier_8       7  SKU_123a   2020   Case     36.0   $101.36 
7     Supplier_8       8  SKU_123a   2020   Case      NaN        NaN
8     Supplier_8       9  SKU_123a   2020   Case     36.0   $101.36 
9     Supplier_8      10  SKU_123a   2020   Case     72.0    $101.36
10    Supplier_8      11  SKU_123a   2020   Case      NaN        NaN
11    Supplier_8      12  SKU_123a   2020   Case      NaN        NaN
12   Supplier_21       1  SKU_123a   2020   Case      NaN        NaN
13   Supplier_21       2  SKU_123a   2020   Case      NaN        NaN
14   Supplier_21       3  SKU_123a   2020   Case      NaN        NaN
15   Supplier_21       4  SKU_456b   2020   Case     32.0    $10.36 
16   Supplier_21       5  SKU_456b   2020   Case      2.0    $10.36 
17   Supplier_21       6  SKU_456b   2020   Case      NaN        NaN
18   Supplier_21       7  SKU_456b   2020   Case      NaN        NaN
19   Supplier_21       8  SKU_456b   2020   Case      NaN        NaN
20   Supplier_21       9  SKU_456b   2020   Case      NaN        NaN
21   Supplier_21      10  SKU_456b   2020   Case      NaN        NaN
22   Supplier_21      11  SKU_456b   2020   Case      NaN        NaN
23   Supplier_21      12  SKU_456b   2020   Case      NaN        NaN

相关问题 更多 >

    热门问题