str.contains matching only exact values

Posted 2024-06-26 10:10:58


I have the following list:

personnages = ['Stanley','Kevin', 'Franck']

I want to create a new DataFrame, df3, using the str.contains function:

df3 = df2[df2['speaker'].str.contains('|'.join(personnages))]

However, if a row in the speaker column contains "Stanley & Kevin", I don't want it to appear in df3.

How can I improve the code to achieve this?
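
For reference, a minimal reproduction of the behaviour described above. The contents of df2 are not shown in the question, so the three rows below are a hypothetical stand-in:

```python
import pandas as pd

# Hypothetical stand-in for df2 (the real data is not shown in the question)
df2 = pd.DataFrame({'speaker': ['Stanley', 'Stanley & Kevin', 'Michael']})
personnages = ['Stanley', 'Kevin', 'Franck']

# The pattern 'Stanley|Kevin|Franck' matches any row containing at least
# one of the names, so the combined row 'Stanley & Kevin' is kept as well
df3 = df2[df2['speaker'].str.contains('|'.join(personnages))]
print(df3)
```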


2 answers

Here is what I would do:

import pandas as pd

# toy data
df = pd.DataFrame({'speaker': ['Stanley & Kevin', 'Everybody',
                               'Kevin speaks', 'The speaker is Franck', 'Nobody']})

personnages = ['Stanley', 'Kevin', 'Franck']

pattern = '|'.join(personnages)
s = (df['speaker'].str
       .extractall(f'({pattern})')  # one row per matched name
       .groupby(level=0)[0]         # group the matches by df's original row
       .nunique().eq(1)             # True where exactly one distinct name matched
    )
df.loc[s.index[s]]                  # rows with no match at all are absent from s

Output:

                 speaker
2           Kevin speaks
3  The speaker is Franck
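
The same idea can also be sketched with str.findall, which some may find easier to read. This is only a sketch: the helper name rows_with_single_name is made up for illustration, and escaping the names with re.escape is an added assumption so that names containing regex metacharacters are treated literally.

```python
import re
import pandas as pd

def rows_with_single_name(df, column, names):
    """Keep rows whose text mentions exactly one distinct name (hypothetical helper)."""
    # Escape each name so characters such as '.' or '+' are matched literally
    pattern = '|'.join(re.escape(name) for name in names)
    # One list of matched names per row; missing values stay as NaN
    found = df[column].str.findall(pattern)
    # Non-list entries (NaN) are treated as "no match"
    mask = found.apply(lambda hits: isinstance(hits, list) and len(set(hits)) == 1)
    return df[mask]

df = pd.DataFrame({'speaker': ['Stanley & Kevin', 'Everybody',
                               'Kevin speaks', 'The speaker is Franck', 'Nobody']})
personnages = ['Stanley', 'Kevin', 'Franck']

result = rows_with_single_name(df, 'speaker', personnages)
print(result)
```

Like the extractall version, this keeps a row in which the same name appears more than once, because only distinct names are counted.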

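You need to anchor the regex at the start and end of the string, so that a row matches only when it contains exactly one name and nothing else: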

import pandas as pd

speakers = ['Stanley', 'Kevin', 'Frank', 'Kevin & Frank']
df = pd.DataFrame([{'speaker': speaker} for speaker in speakers])
         speaker
0        Stanley
1          Kevin
2          Frank
3  Kevin & Frank


r = '|'.join(speakers[:-1]) # gets all but the last one for the sake of example

# ^ marks the start of the string and $ marks the end;
# (?:...) is a non-capturing group, which avoids pandas' match-group warning
df[df['speaker'].str.contains(f'^(?:{r})$')]
   speaker
0  Stanley
1    Kevin
2    Frank
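
If the whole cell has to equal one of the names, as in this toy data, the anchored regex may not be needed at all. A minimal sketch of two alternatives, assuming pandas 1.1 or later for str.fullmatch; the variable names exact_isin and exact_regex are only illustrative:

```python
import re
import pandas as pd

speakers = ['Stanley', 'Kevin', 'Frank', 'Kevin & Frank']
df = pd.DataFrame({'speaker': speakers})
names = speakers[:-1]  # all but the combined entry, as in the example above

# Option 1: plain equality against a list of names, no regex involved
exact_isin = df[df['speaker'].isin(names)]

# Option 2: regex that must cover the entire string; the alternation is
# wrapped in a non-capturing group so it is anchored as a whole
pattern = '(?:' + '|'.join(re.escape(name) for name in names) + ')'
exact_regex = df[df['speaker'].str.fullmatch(pattern)]

print(exact_isin)
print(exact_regex)
```

Note that both options drop a row such as 'Kevin speaks', so they only fit when the column holds bare names.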
