Mapping the count of matched words on a column using pandas in Python

Posted 2024-05-20 17:09:36


I have a DataFrame:

Name    Step     Description
Ram        1     Ram is oNe of the good cricketer
Ram        2     gopal one
Sri        1     Sri is one of the member
Sri        2     ravi good 
Kumar      1     Kumar is a keeper
Madhu      1     good boy
Vignesh    1     oNe little
Pechi      1     one book
mario      1     good randokm
Roger      1     one milita good
bala       1     looks good
raj        1     more one
venk       1     likes good
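
To reproduce the question, the table above can be rebuilt roughly as follows. This is only a sketch: the rows are copied from the table, trailing whitespace in the descriptions is not preserved, and the variable name `rows` is introduced here for illustration.

```python
import pandas as pd

# Rows copied from the table above; trailing whitespace is not preserved.
rows = [
    ("Ram", 1, "Ram is oNe of the good cricketer"),
    ("Ram", 2, "gopal one"),
    ("Sri", 1, "Sri is one of the member"),
    ("Sri", 2, "ravi good"),
    ("Kumar", 1, "Kumar is a keeper"),
    ("Madhu", 1, "good boy"),
    ("Vignesh", 1, "oNe little"),
    ("Pechi", 1, "one book"),
    ("mario", 1, "good randokm"),
    ("Roger", 1, "one milita good"),
    ("bala", 1, "looks good"),
    ("raj", 1, "more one"),
    ("venk", 1, "likes good"),
]
df = pd.DataFrame(rows, columns=["Name", "Step", "Description"])
print(df.shape)  # (13, 3)
```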

And a list:

my_list = ["one", "good"]

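I am trying to get the rows that contain at least one keyword from my list.

I tried the following:

mask = df["Description"].str.contains("|".join(my_list), na=False)

and got this output: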

Name    Description
Ram     Ram is one of the good cricketer
Sri     Sri is one of the member        

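I also want to put the keywords found in "Description" into a separate column, and their count into another column.

When a df["Name"] value is not the first occurrence of that name, the keywords should not be copied into the keys column, even if "Description" contains one.

My desired output is: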

 Name   Step    Description                          keys        count
 Ram     1     Ram is one of the good cricketer      one,good    2
 Ram     2     gopal one
 Sri     1     Sri is one of the member              one         1
 Sri     2     ravi good
 Kumar   1     Kumar is a keeper
 Madhu   1     good boy                              good        1
 Vignesh 1     oNe little                            oNe         1
 Pechi   1     one book                              one         1 
 mario   1     good randokm good                     good        1
 Roger   1     one milita good                       one,good    2
 bala    1     looks good                            good        1
 raj     1     more one                              one         1
 venk    1     likes good                            good        1

1 answer
Answer 1 · Posted 2024-05-20 17:09:36

Create a new mask and apply it:

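import re

my_list = ["one", "good"]

# True only where Description contains a keyword (case-insensitive)
# and the row is the first occurrence of its Name
mask = (df["Description"].str.contains("|".join(my_list), na=False, flags=re.IGNORECASE)
        & (df.groupby('Name').cumcount() == 0))
print(mask)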
0      True
1     False
2      True
3     False
4     False
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
dtype: bool
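
The second half of the mask relies on `groupby(...).cumcount()`, which numbers the rows within each group starting at 0, so comparing with 0 marks the first occurrence of each Name. A small sketch on a hypothetical mini-frame (`demo` is not part of the question's data):

```python
import pandas as pd

# Hypothetical mini-frame, used only to illustrate cumcount.
demo = pd.DataFrame({"Name": ["Ram", "Ram", "Sri", "Sri", "Kumar"]})

# cumcount numbers the rows inside each Name group, starting at 0,
# and returns the numbers in the original row order.
position = demo.groupby("Name").cumcount()

# The first row of every group is the one numbered 0.
is_first = position == 0

print(position.tolist())  # [0, 1, 0, 1, 0]
print(is_first.tolist())  # [True, False, True, False, True]
```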

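
The code that applies this mask did not survive in the text here. A plausible sketch, assuming it mirrors the extraction shown in the edit further down but works on `df["Description"]` directly, might look like this. The small frame, the `pattern` variable and the use of `re.escape` are assumptions added for illustration, not the answerer's original code.

```python
import re
import pandas as pd

my_list = ["one", "good"]

# Small sample taken from the question's data, to keep the sketch self-contained.
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Kumar", "Roger"],
    "Step": [1, 2, 1, 1],
    "Description": [
        "Ram is one of the good cricketer",
        "gopal one",
        "Kumar is a keeper",
        "one milita good",
    ],
})

# Escape the keywords so regex metacharacters in them are matched literally.
pattern = "|".join(map(re.escape, my_list))

# Keyword present (case-insensitive) AND first occurrence of the Name.
mask = (
    df["Description"].str.contains(pattern, na=False, flags=re.IGNORECASE)
    & (df.groupby("Name").cumcount() == 0)
)

# List of all matches per row, then fill the new columns only where mask is True.
found = df["Description"].str.findall(pattern, flags=re.IGNORECASE)
df.loc[mask, "keys"] = found.str.join(",")
df.loc[mask, "count"] = found.str.len()
print(df)
```

Rows where the mask is False keep NaN in both new columns, which is why `count` ends up as a float column.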

Edit:

# Join all descriptions per Name; transform keeps the original number of rows
s = df.groupby('Name')['Description'].transform(','.join)
print(s)
0     Ram is oNe of the good cricketer,gopal one
1     Ram is oNe of the good cricketer,gopal one
2            Sri is one of the member,ravi good 
3            Sri is one of the member,ravi good 
4                              Kumar is a keeper
5                                       good boy
6                                     oNe little
7                                       one book
8                              good randokm good
9                                one milita good
10                                    looks good
11                                      more one
12                                    likes good
Name: Description, dtype: object

# Build the mask from the new Series s
mask = (s.str.contains("|".join(my_list), na=False, flags=re.IGNORECASE)
        & (df.groupby('Name').cumcount() == 0))
print(mask)
0      True
1     False
2      True
3     False
4     False
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
dtype: bool

# Extract the keywords from the new Series s; set() drops exact duplicates only,
# so "oNe" and "one" are still counted separately
extracted = s.str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE).apply(set)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print(df)
       Name  Step                       Description          keys  count
0       Ram     1  Ram is oNe of the good cricketer  good,oNe,one    3.0
1       Ram     2                         gopal one           NaN    NaN
2       Sri     1          Sri is one of the member      good,one    2.0
3       Sri     2                        ravi good            NaN    NaN
4     Kumar     1                 Kumar is a keeper           NaN    NaN
5     Madhu     1                          good boy          good    1.0
6   Vignesh     1                        oNe little           oNe    1.0
7     Pechi     1                          one book           one    1.0
8     mario     1                 good randokm good          good    1.0
9     Roger     1                   one milita good      good,one    2.0
10     bala     1                        looks good          good    1.0
11      raj     1                          more one           one    1.0
12     venk     1                        likes good          good    1.0
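
In the output above, the first row shows `good,oNe,one` with a count of 3 because `set` treats `oNe` and `one` as different strings, and the order of elements in a set is not guaranteed. If case-insensitive uniqueness and a stable order are wanted, one possible variation is to lowercase each match and deduplicate with `dict.fromkeys`. This is a sketch, not part of the original answer; the helper name `unique_keys` and the sample Series are hypothetical.

```python
import re
import pandas as pd

my_list = ["one", "good"]

# A few joined descriptions, as produced by the transform step above.
s = pd.Series([
    "Ram is oNe of the good cricketer,gopal one",
    "good randokm good",
    "Kumar is a keeper",
])

pattern = "|".join(map(re.escape, my_list))

def unique_keys(text):
    """Return matched keywords, lowercased, in first-seen order, without repeats."""
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    # dict.fromkeys keeps insertion order and drops later duplicates.
    return list(dict.fromkeys(m.lower() for m in matches))

extracted = s.apply(unique_keys)
keys = extracted.str.join(",")
counts = extracted.str.len()

print(keys.tolist())    # ['one,good', 'good', '']
print(counts.tolist())  # [2, 1, 0]
```

The trade-off is that the original casing (for example `oNe` in the Vignesh row of the desired output) is lost, so this only fits if lowercase keys are acceptable.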
