Finding list elements in a DataFrame column of lists (the column's values are lists)

Published 2024-05-09 23:18:46


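I have the list and DataFrame below. For each row, I want to check every element of my list against the list stored in that row's mention column; if an element is found, I want to add a new column and put the matching word into it. I tried, but my solution doesn't give the right result. Can anyone help?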

The list:
list_m = ['KathyConWom',
 'monkeyhead78',
 'acorncarver',
 'bonglez',
 '9NewsQueensland',
 'paulinedaniels',
 'AdvoBarryRoux',
 '_sara_jade_',
 'theage',
 'gaskell_mike',
 'saidtarraf',
 'BroHilderchump',
 'jodyvance',
 'COdendahl',
 'pfizer',
 'RobertKennedyJr',
 'Real_Sheezzii',
 'Kellie_Martin',
 'ThatsOurWaldo',
 'SCN_Nkosi',
 'azsweetheart013']

DataFrame name: test

     user_id                  text                                              tweet_id                 user_name                                mention                   
22  1334471712528855040     @KathyConWom @JamesDelingpole Time to stand-up...   1362119551375314948         @KYourrights                        [KathyConWom, JamesDelingpole]  
23  334131548               @KathyConWom @Exp_Sec_Prof It seems like weste...   1362096715877212161         @GowTolson                          [KathyConWom, Exp]  
24  1252182507715526657     @KathyConWom I guess that the hard part would ...   1362096654514552837         @Peterpu52451065                    [KathyConWom]   
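To try the answers on this page, the question's data can be rebuilt as a small DataFrame. This is a minimal sketch: it keeps only the user_name and mention columns (the tweet text is truncated in the question, so it is left out), and it assumes each mention cell is a plain Python list, which is what the printed output suggests.

```python
import pandas as pd

# list_m exactly as given in the question
list_m = ['KathyConWom', 'monkeyhead78', 'acorncarver', 'bonglez', '9NewsQueensland',
          'paulinedaniels', 'AdvoBarryRoux', '_sara_jade_', 'theage', 'gaskell_mike',
          'saidtarraf', 'BroHilderchump', 'jodyvance', 'COdendahl', 'pfizer',
          'RobertKennedyJr', 'Real_Sheezzii', 'Kellie_Martin', 'ThatsOurWaldo',
          'SCN_Nkosi', 'azsweetheart013']

# Only the columns needed for the matching problem are reproduced here.
test = pd.DataFrame(
    {
        'user_name': ['@KYourrights', '@GowTolson', '@Peterpu52451065'],
        'mention': [
            ['KathyConWom', 'JamesDelingpole'],
            ['KathyConWom', 'Exp'],
            ['KathyConWom'],
        ],
    },
    index=[22, 23, 24],
)

# Each cell holds a Python list, so pandas stores the column with dtype object.
print(test['mention'].dtype)
```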

What I want:

     user_id                  text                                              tweet_id                 user_name                                mention                                new_col                    
22  1334471712528855040     @KathyConWom @JamesDelingpole Time to stand-up...   1362119551375314948         @KYourrights                        [KathyConWom, JamesDelingpole]          KathyConWom 
23  334131548               @KathyConWom @Exp_Sec_Prof It seems like weste...   1362096715877212161         @GowTolson                          [KathyConWom, Exp]                      KathyConWom
24  1252182507715526657     @KathyConWom I guess that the hard part would ...   1362096654514552837         @Peterpu52451065                    [azsweetheart013]                       azsweetheart013
    

What I tried:

for index, row in df.iterrows():
  for i in list_m:
    i in test.mention
    test["c"] = i

test
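Why the attempt goes wrong: i in test.mention checks the Series' index labels (22, 23, 24), not the lists inside the cells, and its result is never used; test["c"] = i then writes one value into every row, so after the loop the column only holds the last name in list_m. (The outer loop also iterates over df instead of test.) The sketch below assumes a shortened list_m and only the mention column; broken and found_per_row are illustrative names. It shows both effects and then a loop that builds one result per row and assigns the column once at the end.

```python
import pandas as pd

list_m = ['KathyConWom', 'azsweetheart013', 'pfizer']  # shortened for the example
test = pd.DataFrame(
    {'mention': [['KathyConWom', 'JamesDelingpole'], ['KathyConWom', 'Exp'], ['pfizer']]},
    index=[22, 23, 24],
)

# 'in' on a Series looks at index labels, not at the values in the cells.
print('KathyConWom' in test['mention'], 22 in test['mention'])

# Assigning a scalar to a column fills every row, so the last name wins everywhere.
broken = test.copy()
for i in list_m:
    broken['c'] = i
print(broken['c'].tolist())

# Fixed loop: check each name against this row's own list, collect one string per row.
found_per_row = []
for _, row in test.iterrows():
    found = [name for name in list_m if name in row['mention']]
    found_per_row.append(', '.join(found))
test['new_col'] = found_per_row  # assign once, after the loop
print(test['new_col'].tolist())
```

A row-by-row loop is fine for small frames; the vectorised-looking apply/map answers on this page do the same work with less code.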

3 answers

You can also use numpy.intersect1d to get the intersection as a sorted array of unique values, like this:

import numpy as np

df['new_col'] = df['mention'].map(lambda x: np.intersect1d(x, list_m))
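A small check of what this answer produces, assuming a toy DataFrame and a shortened list_m: each cell of new_col becomes a NumPy array (not a Python list), duplicates are dropped, the values come back sorted rather than in mention order, and a row with no match gets an empty array.

```python
import numpy as np
import pandas as pd

list_m = ['KathyConWom', 'pfizer', 'azsweetheart013']  # shortened for the example
df = pd.DataFrame({'mention': [['pfizer', 'KathyConWom', 'pfizer'],
                               ['JamesDelingpole'],
                               ['azsweetheart013']]})

df['new_col'] = df['mention'].map(lambda x: np.intersect1d(x, list_m))

# Inspect the type and contents of each cell.
for value in df['new_col']:
    print(type(value).__name__, list(value))
```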

If you want a comma-separated string instead, just chain Series.str.join onto it, like this:

import numpy as np

df['new_col'] = df['mention'].map(lambda x: np.intersect1d(x, list_m)).str.join(', ')
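An equivalent spelling that skips the .str accessor is to do the join inside the lambda. This is a variation on the answer, not its original code, and it assumes the same toy data as before: Python's str.join accepts any iterable of strings, including a NumPy array of strings, and a row with no matches becomes an empty string.

```python
import numpy as np
import pandas as pd

list_m = ['KathyConWom', 'pfizer']  # shortened for the example
df = pd.DataFrame({'mention': [['pfizer', 'KathyConWom', 'pfizer'], ['JamesDelingpole']]})

# np.intersect1d gives sorted unique matches; ', '.join turns them into one string.
df['new_col'] = df['mention'].map(lambda x: ', '.join(np.intersect1d(x, list_m)))
print(df['new_col'].tolist())
```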

You can also simply use a list comprehension inside Series.apply, like this:

df['new_col'] = df['mention'].apply(lambda x: [y for y in x if y in list_m]).str.join(', ')
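Unlike np.intersect1d, the list comprehension keeps the names in the order they appear in each mention list. A possible tweak, assuming list_m is long: convert it to a set once (lookup is an illustrative name), since membership tests on a set are constant time on average while a list is scanned element by element.

```python
import pandas as pd

list_m = ['KathyConWom', 'pfizer', 'azsweetheart013']  # shortened for the example
lookup = set(list_m)  # built once, reused for every row

df = pd.DataFrame({'mention': [['pfizer', 'JamesDelingpole', 'KathyConWom'],
                               ['Exp'],
                               ['azsweetheart013']]})

# Keep matches in mention order, then join each row's list into one string.
df['new_col'] = df['mention'].apply(lambda x: [y for y in x if y in lookup]).str.join(', ')
print(df['new_col'].tolist())
```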

You can use the set intersection operation to find the elements the two lists have in common:

df['new_col'] = df['mention'].apply(lambda mentions: list(set(mentions).intersection(list_m)))
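One thing to watch with this approach: a set has no defined order, so list(...) can return the matches in any order. If a predictable order matters (for tests or display), wrapping the intersection in sorted() is one option; this sketch assumes toy data. Note that set.intersection accepts any iterable, so list_m does not need to be converted first.

```python
import pandas as pd

list_m = ['KathyConWom', 'pfizer', 'azsweetheart013']  # shortened for the example
df = pd.DataFrame({'mention': [['pfizer', 'KathyConWom', 'pfizer'], ['Exp']]})

# sorted() turns the unordered set into a list with a stable, alphabetical order.
df['new_col'] = df['mention'].apply(lambda mentions: sorted(set(mentions).intersection(list_m)))
print(df['new_col'].tolist())
```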

To get a string instead of a list, you can use:

df['new_col'] = df['mention'].apply(lambda mentions: ', '.join(set(mentions).intersection(list_m)))
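If you want the de-duplication of a set but the names in their original mention order, dict.fromkeys is a common idiom, since dicts keep insertion order in Python 3.7+. This is an alternative to the answer, shown on toy data; matched_in_order and lookup are hypothetical names.

```python
import pandas as pd

list_m = ['KathyConWom', 'pfizer', 'azsweetheart013']  # shortened for the example
lookup = set(list_m)

df = pd.DataFrame({'mention': [['pfizer', 'KathyConWom', 'pfizer'], ['Exp']]})

def matched_in_order(mentions):
    # dict.fromkeys drops repeated names but keeps the first-seen order.
    return ', '.join(dict.fromkeys(m for m in mentions if m in lookup))

df['new_col'] = df['mention'].apply(matched_in_order)
print(df['new_col'].tolist())
```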

Try this:

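def add(x):
    # Build a comma-separated string of the names in this row's list that are also in list_m
    ret = ''
    for y in x:
        if y in list_m:
            # Put a comma (no space) between matches
            if len(ret) > 0:
                ret += ','
            ret += y
    return ret

df['new_col'] = df['mention'].apply(add)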
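The loop in add can be written as a single ','.join over a generator with the same output. If some rows might not hold a list at all (for example NaN from missing data, which is an assumption, not something shown in the question), iterating over them raises a TypeError; the hypothetical guard below returns an empty string for those rows instead.

```python
import pandas as pd

list_m = ['KathyConWom', 'pfizer']  # shortened for the example

def add(x):
    # Hypothetical guard: anything that is not a list (e.g. NaN) gets an empty string.
    if not isinstance(x, list):
        return ''
    # Same result as the answer's loop: matches in mention order, joined with ','.
    return ','.join(y for y in x if y in list_m)

df = pd.DataFrame({'mention': [['KathyConWom', 'pfizer'], float('nan'), ['Exp']]})
df['new_col'] = df['mention'].apply(add)
print(df['new_col'].tolist())
```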
