Match column values and replace duplicates with ''

Posted 2024-07-01 07:19:08


I have the three columns shown below in a pandas DataFrame, with the headers screenName, screen_name_retweet, and screen_name_mention.

screenName      screen_name_retweet     screen_name_mention
User1                 User10                      User1
User4                 User10                      User5
User3                 User3                       User12
User6                 User10                      User7

I want to compare screenName with screen_name_retweet and screen_name_mention. If a value in screen_name_retweet or screen_name_mention duplicates the screenName value in the same row, that value should be replaced with ''. So the columns above should look like this:

 screenName     screen_name_retweet     screen_name_mention
    User1                 User10                      
    User4                 User10                      User5
    User3                                             User12
    User6                 User10                      User7

How can I get the desired result?

Update:

I have tried this:

df.loc[(df['screenName'].duplicated() & df['screen_name_mention'].duplicated()), ['screen_name_mention']] = ''

But nothing happened, and the table did not change.


2 Answers

Use the replace method

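import pandas as pd

df = pd.read_csv(file_name)  # file_name: path to your CSV file
for index, row in df.iterrows():
    # Use column labels rather than positional row[0], row[1], row[2]
    if row['screenName'] == row['screen_name_retweet']:
        # Note: replace() blanks every occurrence of this value in the column
        df['screen_name_retweet'] = df['screen_name_retweet'].replace(row['screenName'], "")
    if row['screenName'] == row['screen_name_mention']:
        df['screen_name_mention'] = df['screen_name_mention'].replace(row['screenName'], "")
print(df)
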
import pandas as pd

a = pd.DataFrame([["user1", "user10", "user1"],
                  ["user4", "user10", "user5"],
                  ["user3", "user3", "user12"]],
                 columns=["i1", "i2", "i3"])  # simplified input dataframe
for i in a.index:
    m = a.loc[i].duplicated()  # boolean mask marking repeated values within this row
    a.loc[i] = a.loc[i].mask(m).fillna("")  # blank out the duplicates with an empty string

I think this solution could be improved in terms of performance, but it works.
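
On the performance point, one possible alternative is a row-wise comparison with boolean masks instead of a Python-level loop over rows. The following is only a minimal sketch: it assumes the column names from the question's table (screenName, screen_name_retweet, screen_name_mention) and rebuilds the sample data by hand. It also shows why the attempt in the question changed nothing: Series.duplicated() only flags values repeated within a single column, and every screenName in the sample is unique, so that mask is all False.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "screenName": ["User1", "User4", "User3", "User6"],
    "screen_name_retweet": ["User10", "User10", "User3", "User10"],
    "screen_name_mention": ["User1", "User5", "User12", "User7"],
})

# duplicated() looks within one column only; screenName has no repeats here
print(df["screenName"].duplicated().any())

# Compare each column with screenName row by row and blank the matching cells
for col in ["screen_name_retweet", "screen_name_mention"]:
    df.loc[df[col] == df["screenName"], col] = ""

print(df)
```

Only the cells whose value equals screenName in the same row are blanked, so other occurrences of the same name elsewhere in the column are left alone.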
