创建并用Pandas填充零行

2024-07-07 07:08:20 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两个数据帧,“inp”和“IndiRelat”。“inp”是某些数据的条目

import pandas as pd

inp = [[2, 'cvt'  , -3,  5, 17, -2, -9, -0.2, 'RL'],
       [2, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [2, 'sope' ,  0,  0,  0,  0,  0,    0, 'SD'],
       [2, 'wix+' ,-13,-13,  2,  1,-62, -0.5, 'WI'],
       [2, 'wix-' ,  0, 16,  6, 13,  0,  0.3, 'WI'],
       [4, 'sope' ,-42,  0, 29,  0,  0,  -13, 'SD'],
       [4, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [4, 'cvt'  ,  0,  0,  0,  0,  0,   -1, 'RL'],
       [4, 'wix+' ,-18, -2, 19, 19,  3,  -64, 'WI'],
       [4, 'wix-' ,  0,-30, -2, -2, 32,    0, 'WI']]

inp = pd.DataFrame(data = inp, columns = ['Key','Descr', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'Indicator'])

print(inp['Key'])
print(inp['Indicator'])

IndiRelat = ['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU']
IndiRelat = pd.DataFrame(IndiRelat)

我已经创建了一个“inp”数据框,其中每个键“inp['key']”都有与指标“inp['indicator']”相关的数据

想法是将其与第二个数据帧“IndiRelat”关联,并在“IndiRelat”数据不在“inp['indicator']”中的情况下创建零行

我想要得到的是这样的东西

    Index   Descr   C1  C2  C3  C4  C5  C6  Indicator
0   2       cvt     -3  5   17  -2  -9  -0.2    RL
1   2       cv      0   0   0   0   0   0       LL
2   2       sope    0   0   0   0   0   0       SD
3   2       wix+    -13 -13 2   1   -62 -0.5    WI
4   2       wix-    0   16  6   13  0   0.3     WI
5   2       0       0   0   0   0   0   0       WI
6   2       0       0   0   0   0   0   0       WI
7   2       0       0   0   0   0   0   0       QU
8   2       0       0   0   0   0   0   0       QU
9   2       0       0   0   0   0   0   0       SS
10  2       0       0   0   0   0   0   0       RR
11  4       cvt    -42  0   29  0   0   -13     RL
12  4       cv      0   0   0   0   0   0       LL
13  4       sope    0   0   0   0   0   -1      SD
14  4       wix+    -18 -2  19  19  3   -64     WI
15  4       wix-    0   -30 -2  -2  32  0       WI
16  4       0       0   0   0   0   0   0       WI
17  4       0       0   0   0   0   0   0       WI
18  4       0       0   0   0   0   0   0       QU
19  4       0       0   0   0   0   0   0       QU
20  4       0       0   0   0   0   0   0       SS
21  4       0       0   0   0   0   0   0       RR

如果你能帮我提出一些想法和建议,我将不胜感激


Tags: 数据sdsscvindicatorrlpdwix
2条回答

我们尝试执行cumcount创建唯一键,然后执行reindex

inp['new'] = inp.groupby(['Key','Indicator']).cumcount()
IndiRelat[1] = IndiRelat.groupby(0).cumcount()

IndiRelat.columns = ['Indicator','new']


out = inp.set_index(['Key','Indicator','new']).unstack(level=0).reindex(pd.MultiIndex.from_frame(IndiRelat),fill_value=0).stack().reset_index().sort_values('Key')
out
Out[93]: 
   Indicator  new  Key Descr  C1  C2  C3  C4  C5    C6
0         SD    0    2  sope   0   0   0   0   0   0.0
18        QU    0    2     0   0   0   0   0   0   0.0
16        WI    3    2     0   0   0   0   0   0   0.0
14        WI    2    2     0   0   0   0   0   0   0.0
12        WI    1    2  wix-   0  16   6  13   0   0.3
20        QU    1    2     0   0   0   0   0   0   0.0
8         RR    0    2     0   0   0   0   0   0   0.0
10        WI    0    2  wix+ -13 -13   2   1 -62  -0.5
6         SS    0    2     0   0   0   0   0   0   0.0
4         RL    0    2   cvt  -3   5  17  -2  -9  -0.2
2         LL    0    2    cv   0   0   0   0   0   0.0
9         RR    0    4     0   0   0   0   0   0   0.0
5         RL    0    4   cvt   0   0   0   0   0  -1.0
11        WI    0    4  wix+ -18  -2  19  19   3 -64.0
13        WI    1    4  wix-   0 -30  -2  -2  32   0.0
3         LL    0    4    cv   0   0   0   0   0   0.0
15        WI    2    4     0   0   0   0   0   0   0.0
17        WI    3    4     0   0   0   0   0   0   0.0
1         SD    0    4  sope -42   0  29   0   0 -13.0
19        QU    0    4     0   0   0   0   0   0   0.0
7         SS    0    4     0   0   0   0   0   0   0.0
21        QU    1    4     0   0   0   0   0   0   0.0

外部合并应该起作用。根据@BENY的评论和回答,我们需要解决密钥的非唯一性问题:

>>> df2 = IndiRelat.rename(columns={0: 'Indicator'})
>>> df2['dedup_indic'] = df2.groupby('Indicator').cumcount()
>>> df = inp.join(inp.groupby('Indicator').cumcount().rename('dedup_indic'))\
...         .merge(df2, how='outer')
>>> df
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator  dedup_indic
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL            0
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            0
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD            0
3   2.0  wix+ -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI            0
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI            1
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD            1
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            1
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL            1
8   4.0  wix+ -18.0  -2.0  19.0  19.0   3.0 -64.0        WI            2
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI            3
10  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        SS            0
11  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        RR            0
12  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            0
13  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            1
>>> df.fillna(0).drop(columns=['dedup_indic'])
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD
3   2.0  wix+ -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL
8   4.0  wix+ -18.0  -2.0  19.0  19.0   3.0 -64.0        WI
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI
10  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        SS
11  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        RR
12  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU
13  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU

相关问题 更多 >