Filter a DataFrame by removing duplicates from a column that contains lists

Posted 2024-09-27 09:21:53


The DataFrame columns hold lists of strings. I need to reduce the DataFrame to the rows whose list in the 'Final' column is unique.

My DataFrame looks like this:

    string1           string2           Final
1   [abc,ncx]       [qwe, rty]        [apple, mango]
2   [uio,pas,dfg]   [zxc,vbg,dfv]     [banana,grapes, apple]
3   [ncx,abc]       [rty,qwe]         [mango,apple]
4   [uio,pas,dfg]   [zxc,vbg,dfv]     [banana,grapes, apple]
5   [uio,dfg]        [zxc,dfv]        [banana, apple]
6   [ncx,abc]       [rty,qwe]         [mango,apple]

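Duplicate lists in df['Final'] must be dropped, so that the resulting DataFrame contains only unique lists in the 'Final' column.

Desired output DataFrame: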

     string1           string2           Final
1   [abc,ncx]       [qwe, rty]        [apple, mango]
2   [uio,pas,dfg]   [zxc,vbg,dfv]     [banana,grapes, apple]
3   [ncx,abc]       [rty,qwe]         [mango,apple]
4   [uio,dfg]        [zxc,dfv]        [banana, apple]

1 answer
User
#1 · Posted 2024-09-27 09:21:53

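Build a mask with Series.duplicated and invert it with ~. Because lists are not hashable, first convert them to tuples, then filter with boolean indexing: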

# Lists are unhashable, so convert each one to a tuple before calling duplicated()
df = df[~df['Final'].apply(tuple).duplicated()]
print(df)
         string1        string2                    Final
1      [abc,ncx]      [qwe,rty]           [apple, mango]
2  [uio,pas,dfg]  [zxc,vbg,dfv]  [banana, grapes, apple]
3      [ncx,abc]      [rty,qwe]           [mango, apple]
5      [uio,dfg]      [zxc,dfv]          [banana, apple]
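
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question (the 1-based index is an assumption made to match the post), and the names is_dup and ordered_unique are illustrative only:

```python
import pandas as pd

# Sample data retyped from the question; index 1..6 is assumed to match the post.
df = pd.DataFrame(
    {
        "string1": [["abc", "ncx"], ["uio", "pas", "dfg"], ["ncx", "abc"],
                    ["uio", "pas", "dfg"], ["uio", "dfg"], ["ncx", "abc"]],
        "string2": [["qwe", "rty"], ["zxc", "vbg", "dfv"], ["rty", "qwe"],
                    ["zxc", "vbg", "dfv"], ["zxc", "dfv"], ["rty", "qwe"]],
        "Final": [["apple", "mango"], ["banana", "grapes", "apple"],
                  ["mango", "apple"], ["banana", "grapes", "apple"],
                  ["banana", "apple"], ["mango", "apple"]],
    },
    index=range(1, 7),
)

# Tuples are hashable, so duplicated() can compare them.
# Element order matters: ("apple", "mango") != ("mango", "apple").
is_dup = df["Final"].apply(tuple).duplicated()

# Keep only the first occurrence of each distinct list.
ordered_unique = df[~is_dup]
print(ordered_unique.index.tolist())  # [1, 2, 3, 5]
```

Rows 4 and 6 are dropped because they repeat rows 2 and 3 exactly; row 3 survives because its elements are in a different order from row 1.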

If [apple, mango] and [mango, apple] should count as duplicates (that is, order does not matter), change tuple to frozenset:

# frozenset is hashable and ignores element order
df = df[~df['Final'].apply(frozenset).duplicated()]
print(df)
         string1        string2                    Final
1      [abc,ncx]      [qwe,rty]           [apple, mango]
2  [uio,pas,dfg]  [zxc,vbg,dfv]  [banana, grapes, apple]
5      [uio,dfg]      [zxc,dfv]          [banana, apple]
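
One caveat with frozenset is that it also discards repeated elements inside a list. If repeats should still distinguish two lists while order is ignored, a sorted tuple is a possible alternative. The sketch below uses hypothetical data that is not from the question, and the names final, by_set and by_sorted are illustrative only:

```python
import pandas as pd

# Hypothetical data: the third list repeats "apple".
final = pd.Series(
    [["apple", "mango"], ["mango", "apple"], ["apple", "apple", "mango"]]
)

# frozenset ignores both order and repeated elements.
by_set = final[~final.apply(frozenset).duplicated()]

# A sorted tuple ignores order but keeps repeated elements.
by_sorted = final[~final.apply(lambda items: tuple(sorted(items))).duplicated()]

print(by_set.index.tolist())     # [0]
print(by_sorted.index.tolist())  # [0, 2]
```

For the data in the question both variants give the same result, since no list there contains a repeated element.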
