How to find another related row in a DataFrame without a loop

Posted 2024-06-25 05:49:49


product_no    part_no    level  
1              1_1         1    
1              1_2         1    
1              1_3         2     
1              1_4         1     
1              1_5         1  
1              1_6         2 
1              1_7         2 
1              1_8         3 
1              1_9         3 
1              1_10        2 
2              2_1         1 
2              2_2         1 
2              2_3         2  
2              2_4         1 

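In this DataFrame, I am trying to write a function that, for a selected row, returns the row holding the part one level below it. For example, for part 1_3 the lower-level part is 1_2; for 1_7 it is 1_5; for 1_9 it is 1_7; and so on. I already do this with a for loop, but I would like to know whether there is a more efficient way.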
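For reference, the loop the question alludes to might look like the sketch below. The question does not show its loop, so `last_part_no_loop` and the `last_seen` dictionary are hypothetical names, and the rule "most recent earlier row of the same product whose level is exactly one lower" is inferred from the examples given.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'product_no': [1] * 10 + [2] * 4,
    'part_no': [f'1_{i}' for i in range(1, 11)] + [f'2_{i}' for i in range(1, 5)],
    'level': [1, 1, 2, 1, 1, 2, 2, 3, 3, 2, 1, 1, 2, 1],
})

def last_part_no_loop(frame):
    """Hypothetical loop baseline (not the asker's actual code)."""
    result = []
    last_seen = {}  # (product_no, level) -> most recent part_no at that level
    for product, part, level in zip(frame['product_no'], frame['part_no'], frame['level']):
        # Look up the latest part one level down; None if there is none yet
        result.append(last_seen.get((product, level - 1)))
        last_seen[(product, level)] = part
    return pd.Series(result, index=frame.index, dtype=object)

out = last_part_no_loop(df)
print(out)
```

A baseline like this is handy for checking any vectorised version against the same data.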


1 answer
Answer 1 · posted 2024-06-25 05:49:49

There is a loop-free way, but it will make your head spin. Incidentally, I assume you want to keep each product_no separate, hence the groupby:

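import numpy as np
import pandas as pd

def last_part_no(group):
    # One column per level; a row is "hot" in the column of its own level
    dummies = pd.get_dummies(group['level'])

    # Per level: index of the most recent row at that level, forward-filled
    idx = dummies.index.to_series()
    last_index = dummies.apply(lambda col: idx.where(col != 0)).ffill()
    last_index[0] = np.nan  # level 0 does not exist, so level-1 rows get NaN

    # DataFrame.lookup was removed in pandas 2.0; use NumPy indexing instead
    rows = np.arange(len(group))
    cols = last_index.columns.get_indexer(group['level'] - 1)
    parent_idx = last_index.to_numpy()[rows, cols]
    return group['part_no'].reindex(parent_idx).set_axis(group.index)

# group_keys=False keeps the original row index so the assignment aligns
df['last_part_no'] = df.groupby('product_no', group_keys=False).apply(last_part_no)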

Result:

    product_no part_no  level last_part_no
0            1     1_1      1          NaN
1            1     1_2      1          NaN
2            1     1_3      2          1_2
3            1     1_4      1          NaN
4            1     1_5      1          NaN
5            1     1_6      2          1_5
6            1     1_7      2          1_5
7            1     1_8      3          1_7
8            1     1_9      3          1_7
9            1    1_10      2          1_5
10           2     2_1      1          NaN
11           2     2_2      1          NaN
12           2     2_3      2          2_2
13           2     2_4      1          NaN

Here is how it works:

groupby splits the DataFrame by product_no and sends each sub-frame to the last_part_no function:

    product_no part_no  level
0            1     1_1      1
...
--------------------------------
10           2     2_1      1
...
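The split can be inspected directly by iterating over the groupby object. This is only an illustrative sketch; the loop is here to show the sub-frames and is not part of the answer's method.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'product_no': [1] * 10 + [2] * 4,
    'part_no': [f'1_{i}' for i in range(1, 11)] + [f'2_{i}' for i in range(1, 5)],
    'level': [1, 1, 2, 1, 1, 2, 2, 3, 3, 2, 1, 1, 2, 1],
})

# Each iteration yields the group key and the sub-frame for that product,
# with the original row index preserved
for product_no, subframe in df.groupby('product_no'):
    print(product_no, subframe.index.tolist())
```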

The real work happens inside last_part_no. Suppose the function is processing the first sub-frame; that is equivalent to calling:

subframe = df[df['product_no'] == 1]
last_part_no(subframe)

Here is the value of subframe for reference:

   product_no part_no  level
0           1     1_1      1
1           1     1_2      1
2           1     1_3      2
3           1     1_4      1
4           1     1_5      1
5           1     1_6      2
6           1     1_7      2
7           1     1_8      3
8           1     1_9      3
9           1    1_10      2

dummies is the one-hot encoded form of the level column:

   1  2  3
0  1  0  0       > this row is level 1 since the column 1 is "hot"
1  1  0  0      
2  0  1  0       > this row is level 2 since the column 2 is "hot"
3  1  0  0
4  1  0  0
5  0  1  0
6  0  1  0
7  0  0  1       > this row is level 3 since the column 3 is "hot"
8  0  0  1
9  0  1  0
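A small sketch of this step on the level values of the first sub-frame. Recent pandas versions return boolean columns from `pd.get_dummies`, so the cast to `int` below is only there to match the 0/1 table above.

```python
import pandas as pd

# Levels of the first sub-frame (product_no == 1)
level = pd.Series([1, 1, 2, 1, 1, 2, 2, 3, 3, 2], name='level')

# One column per distinct level; exactly one column is "hot" per row
dummies = pd.get_dummies(level)
print(dummies.astype(int))
```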

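Next, we take dummies.index and mask it by the "hotness" of each column: where a row is "hot", keep its index value; otherwise replace it with np.nan. Then we forward-fill those NaNs: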

index  1       idx.where(...)     forward fill
0      1       0                  0              > as of index 0, last row with level 1 is row 0
1      1       1                  1
2      0       np.nan             1
3      1       3                  3
4      1  ==>  4             ==>  4
5      0       np.nan             4
6      0       np.nan             4
7      0       np.nan             4              > as of index 7, last row with level 1 is row 4
8      0       np.nan             4
9      0       np.nan             4
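The mask-then-fill step for the level 1 column alone can be sketched as follows; `masked` and `filled` are illustrative names that do not appear in the answer's code.

```python
import pandas as pd

# Levels of the first sub-frame (product_no == 1)
level = pd.Series([1, 1, 2, 1, 1, 2, 2, 3, 3, 2])
idx = level.index.to_series()

# Keep the row index where the row is at level 1, NaN elsewhere
masked = idx.where(level == 1)

# Carry the last seen level-1 index forward over the NaNs
filled = masked.ffill()
print(pd.DataFrame({'masked': masked, 'filled': filled}))
```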

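Repeat this for all 3 columns and the last_index frame looks like this (column 0 is added for convenience and is all NaN; the code appends it as the last column, and it is shown first here for readability):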

     0    1    2    3  
0  NaN  0.0  NaN  NaN  
1  NaN  1.0  NaN  NaN  
2  NaN  1.0  2.0  NaN  
3  NaN  3.0  2.0  NaN  
4  NaN  4.0  2.0  NaN  
5  NaN  4.0  5.0  NaN  
6  NaN  4.0  6.0  NaN  
7  NaN  4.0  6.0  7.0   > as of index 7, last row with level 1 is 4, with level 2 is 6, with level 3 is 7
8  NaN  4.0  6.0  8.0  
9  NaN  4.0  9.0  8.0   > as of index 9, last row with level 1 is 4, with level 2 is 9, with level 3 is 8
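A sketch that builds this frame for the first sub-frame. The final `sort_index(axis=1)` is an addition of mine, used only to put column 0 first as in the table; the method itself does not need it.

```python
import numpy as np
import pandas as pd

# Levels of the first sub-frame (product_no == 1)
level = pd.Series([1, 1, 2, 1, 1, 2, 2, 3, 3, 2])
dummies = pd.get_dummies(level)
idx = dummies.index.to_series()

# For every level column: mask the index by hotness, then forward-fill
last_index = dummies.apply(lambda col: idx.where(col != 0)).ffill()

# Convenience column for "level 0", which never exists
last_index[0] = np.nan

# Display order only: put column 0 first
last_index = last_index.sort_index(axis=1)
print(last_index)
```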

Now go back to the level column (i.e. subframe['level']). To find last_part_no, look in column level - 1:

    level  level-1
0       1        0
1       1        0
2       2        1
3       1        0
4       1        0
5       2        1
6       2        1
7       3        2
8       3        2
9       2        1

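Combining this with the last_index frame gives the index of the row that holds each row's last_part_no. This is the purpose of the lookup step (DataFrame.lookup, which was removed in pandas 2.0):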

The row index of last_part_no ...                 idx
---------------------------------------------------------
   for row 0 is in row 0, col 0 of last_index  > nan
           1       row 1, col 0                > nan
           2       row 2, col 1                > 1
           3       row 3, col 0                > nan
           4       row 4, col 0                > nan
           5       row 5, col 1                > 4
           6       row 6, col 1                > 4
           7       row 7, col 2                > 6
           8       row 8, col 2                > 6
           9       row 9, col 1                > 4
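Since `DataFrame.lookup` is no longer available in pandas 2.0 and later, the same row-and-column pick can be sketched with NumPy integer indexing. The `last_index` frame below is typed in by hand from the table above, and `rows` and `cols` are illustrative names.

```python
import numpy as np
import pandas as pd

nan = np.nan
# last_index for the first sub-frame, copied from the table above
last_index = pd.DataFrame({
    0: [nan] * 10,
    1: [0, 1, 1, 3, 4, 4, 4, 4, 4, 4],
    2: [nan, nan, 2, 2, 2, 5, 6, 6, 6, 9],
    3: [nan] * 7 + [7, 8, 8],
})
level = pd.Series([1, 1, 2, 1, 1, 2, 2, 3, 3, 2])

# Position of every row, and position of the column labelled level - 1
rows = np.arange(len(level))
cols = last_index.columns.get_indexer(level - 1)

# Pick one cell per row: (row i, column level[i] - 1)
idx = last_index.to_numpy()[rows, cols]
print(idx)
```

Using `get_indexer` on the column labels, rather than assuming a column order, keeps the pick correct wherever column 0 ends up in the frame.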

The final step is to reorder the part_no column according to idx, which gives last_part_no:

   product_no part_no  level  last_part_no
0           1     1_1      1           nan
1           1     1_2      1           nan
2           1     1_3      2           1_2
3           1     1_4      1           nan
4           1     1_5      1           nan
5           1     1_6      2           1_5
6           1     1_7      2           1_5
7           1     1_8      3           1_7
8           1     1_9      3           1_7
9           1    1_10      2           1_5
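This last step can be sketched on its own, with `idx` typed in by hand from the lookup table above. Reindexing by `idx` yields NaN wherever `idx` is NaN; the result is then relabelled with the original row index so it lines up with the sub-frame.

```python
import numpy as np
import pandas as pd

nan = np.nan
part_no = pd.Series([f'1_{i}' for i in range(1, 11)], name='part_no')

# Row index of each row's parent, as produced by the lookup step
idx = np.array([nan, nan, 1, nan, nan, 4, 4, 6, 6, 4])

# Pull part_no at those labels; NaN labels match nothing and give NaN
last = part_no.reindex(idx)

# Relabel with the original row index so it aligns with the sub-frame
last.index = part_no.index
print(last)
```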
