Group by and check whether the values in one row are contained in another row

Posted 2024-05-01 00:26:47

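I want to group by customer and match the items of rows whose count is 1 against the items of rows whose count is greater than 1. If all items match, the probable merge id should be added to a new column. For example, for Customer1 the items of id=3 are contained in id=2, so this is a match and the probable merge id is 2. Likewise, for Customer2 the items of id=7 are contained in the items of id=5, so it is a match and the probable merge id is 5.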

My DataFrame:

    count custmr    id  items
    3   Customer1   1   Cabbage, beet, Okra, root
    3   Customer1   2   Apple, Banana, Mango ,Pears, leafs
    1   Customer1   3   Mango leafs
    1   Customer1   4   tomato root
    4   Customer2   5   grapes,leach,guava,pappaya
    2   Customer2   6   blackberry,blueberry
    1   Customer2   7   pappaya

Expected output:

    count custmr      id  items                                probable_merge_id
    3     Customer1   1   Cabbage, beet, Okra, root
    3     Customer1   2   Apple, Banana, Mango ,Pears, leafs
    1     Customer1   3   Mango leafs                          2
    1     Customer1   4   tomato root
    4     Customer2   5   grapes,leach,guava,pappaya
    2     Customer2   6   blackberry,blueberry
    1     Customer2   7   pappaya                              5

1 answer
User
Reply 1 · posted 2024-05-01 00:26:47

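First build a self-join within each customer with merge, keep the pairs whose left-hand row has count == 1, and convert the item strings to sets so they can be compared. Finally, create a Series to use with map.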

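# Self-join on customer: each row is paired with every row of the same customer
df1 = df.merge(df, on='custmr')
# Keep pairs whose left-hand row has count == 1 (copy avoids SettingWithCopyWarning)
df1 = df1[df1['count_x'] == 1].copy()
# Split on commas (with optional surrounding spaces) or whitespace, then build sets
pat = r'\s*,\s*|\s+'
df1['items_x'] = df1['items_x'].str.split(pat).apply(set)
df1['items_y'] = df1['items_y'].str.split(pat).apply(set)
# Keep pairs where the left set is a proper subset of the right set
df1 = df1[df1['items_x'] < df1['items_y']]
print(df1)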
    count_x     custmr  id_x         items_x  count_y  id_y  \
9         1  Customer1     3  {Mango, leafs}        3     2   
22        1  Customer2     7       {pappaya}        4     5   

                                 items_y  
9   {Mango, Pears, leafs, Apple, Banana}  
22       {grapes, pappaya, leach, guava}  
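The `<` comparison above relies on Python's set semantics, where `a < b` means that `a` is a proper subset of `b`. A small sketch with item sets copied from the sample data shows why a row never matches itself in the self-join (the variable names are only illustrative):

```python
# Item sets copied from the sample data above
small = {'Mango', 'leafs'}
large = {'Apple', 'Banana', 'Mango', 'Pears', 'leafs'}

is_proper_subset = small < large     # all items of small are in large, and the sets differ
self_match_strict = small < small    # a set is never a proper subset of itself
self_match_loose = small <= small    # <= also accepts equal sets

print(is_proper_subset, self_match_strict, self_match_loose)
```

With `<=` instead of `<`, every row with count == 1 would also match itself, so the strict comparison doubles as the filter that removes self-pairs.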

s = df1.set_index('id_x')['id_y']
print (s)
id_x
3    2
7    5
Name: id_y, dtype: int64
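One case the sample data does not exercise: if a count-1 row were a subset of more than one row of the same customer, `id_x` would appear twice and the Series index would no longer be unique, which `map` cannot handle. The sketch below uses hypothetical match pairs (id 8 does not exist in the question) and assumes that keeping the first candidate is acceptable; the question does not say which candidate should win.

```python
import pandas as pd

# Hypothetical matches: id 3 is a subset of both id 2 and id 8
matches = pd.DataFrame({'id_x': [3, 3, 7], 'id_y': [2, 8, 5]})

# Keep the first candidate per id_x so the lookup Series has a unique index
s = matches.drop_duplicates('id_x').set_index('id_x')['id_y']
print(s)
```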

df['probable_merge_id'] = df['id'].map(s)
print (df)
   count     custmr  id                           items  probable_merge_id
0      3  Customer1   1          Cabbage,beet,Okra,root                NaN
1      3  Customer1   2  Apple,Banana,Mango,Pears,leafs                NaN
2      1  Customer1   3                     Mango leafs                2.0
3      1  Customer1   4                     tomato root                NaN
4      4  Customer2   5      grapes,leach,guava,pappaya                NaN
5      2  Customer2   6            blackberry,blueberry                NaN
6      1  Customer2   7                         pappaya                5.0
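For reference, the steps above can be collected into one self-contained sketch. The helper name `add_probable_merge_id` is made up for illustration, the DataFrame is rebuilt from the question's sample, and the split pattern, the first-match rule and the nullable `Int64` dtype (so the ids print as 2 and 5 instead of 2.0 and 5.0) are assumptions rather than part of the original answer.

```python
import pandas as pd

def add_probable_merge_id(df):
    """Return a copy of df with a probable_merge_id column (hypothetical helper)."""
    pat = r'\s*,\s*|\s+'  # commas with optional spaces, or plain whitespace
    pairs = df.merge(df, on='custmr')             # self-join within each customer
    pairs = pairs[pairs['count_x'] == 1].copy()   # left side: rows with count == 1
    pairs['items_x'] = pairs['items_x'].str.strip().str.split(pat).apply(set)
    pairs['items_y'] = pairs['items_y'].str.strip().str.split(pat).apply(set)
    pairs = pairs[pairs['items_x'] < pairs['items_y']]  # proper subsets only
    # Assumption: keep the first candidate when several rows match
    lookup = pairs.drop_duplicates('id_x').set_index('id_x')['id_y']
    out = df.copy()
    # Nullable integer dtype keeps missing values without turning ids into floats
    out['probable_merge_id'] = out['id'].map(lookup).astype('Int64')
    return out

df = pd.DataFrame({
    'count': [3, 3, 1, 1, 4, 2, 1],
    'custmr': ['Customer1'] * 4 + ['Customer2'] * 3,
    'id': [1, 2, 3, 4, 5, 6, 7],
    'items': ['Cabbage, beet, Okra, root',
              'Apple, Banana, Mango ,Pears, leafs',
              'Mango leafs',
              'tomato root',
              'grapes,leach,guava,pappaya',
              'blackberry,blueberry',
              'pappaya'],
})

result = add_probable_merge_id(df)
print(result[['id', 'probable_merge_id']])
```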
