Finding unique values between two columns

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv', delimiter=";")

# Flag rows whose (CollectedE, UndE) pair repeats an earlier row
df['is_dup'] = df[['CollectedE', 'UndE']].duplicated()

# Count the flagged rows within each (CollectedE, UndE) group
df['dups'] = df.groupby(['CollectedE', 'UndE']).is_dup.transform(np.sum)

# df outputs: (image not preserved)

3 Answers

User

#1 · Edited 2024-09-28 05:26:23

Maybe ^{} can help you.

User

#2 · Edited 2024-09-28 05:26:23

Here is a working example that uses the Index difference method and a merge:

df = pd.DataFrame({'column_a':['cat','dog','bird','fish','zebra','snake'],
               'column_b':['leopard','snake','bird','sloth','elephant','dolphin']})

idx1 = pd.Index(df['column_a'])
idx2 = pd.Index(df['column_b'])

x = pd.Series(idx2.difference(idx1), name='non_matching_values')

df.merge(x, how='left', left_on='column_b', right_on=x.values)

  column_a  column_b non_matching_values
0      cat   leopard             leopard
1      dog     snake                 NaN
2     bird      bird                 NaN
3     fish     sloth               sloth
4    zebra  elephant            elephant
5    snake   dolphin             dolphin
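
If only the values themselves are needed, rather than a column aligned to the original rows, the merge can be skipped. The following is a minimal sketch on the same sample data. The names `only_in_a`, `only_in_b` and `in_exactly_one` are illustrative and not part of the answer above. It uses `Index.difference` in both directions, plus `Index.symmetric_difference` for the values that appear in exactly one of the two columns.

```python
import pandas as pd

df = pd.DataFrame({
    'column_a': ['cat', 'dog', 'bird', 'fish', 'zebra', 'snake'],
    'column_b': ['leopard', 'snake', 'bird', 'sloth', 'elephant', 'dolphin'],
})

idx_a = pd.Index(df['column_a'])
idx_b = pd.Index(df['column_b'])

# Values present in column_b but absent from column_a
only_in_b = idx_b.difference(idx_a)

# Values present in column_a but absent from column_b
only_in_a = idx_a.difference(idx_b)

# Values present in exactly one of the two columns
in_exactly_one = idx_a.symmetric_difference(idx_b)

print(list(only_in_b))
print(list(only_in_a))
print(list(in_exactly_one))
```

Note that these set operations drop duplicates and do not preserve the original row order, so they suit a plain listing of values and not a row-aligned column.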

User

#3 · Edited 2024-09-28 05:26:23

You can use isin and invert the result with ~, which makes this quite simple:

df = pd.DataFrame({'CollectedE' : ['abc@gmail.com','random@google.com'],
             'UndE' : ['abc@gmail.com','unique@googlemail.com']})

df['new_col'] = df[~df['CollectedE'].isin(df['UndE'])]['UndE']

print(df)
          CollectedE                   UndE                new_col
0      abc@gmail.com          abc@gmail.com                    NaN
1  random@google.com  unique@googlemail.com  unique@googlemail.com
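
The mask in the answer above tests whether each `CollectedE` value occurs in `UndE`, then takes the `UndE` value from the same row. If the goal is to keep the `UndE` values that never occur in `CollectedE` (an assumption about the question's intent), the mask can be built from `UndE` directly. This is a minimal sketch of that variant. The names `not_collected` and `unmatched` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'CollectedE': ['abc@gmail.com', 'random@google.com'],
    'UndE': ['abc@gmail.com', 'unique@googlemail.com'],
})

# True where the UndE value appears nowhere in CollectedE
not_collected = ~df['UndE'].isin(df['CollectedE'])

# Series.where keeps values where the mask is True and puts NaN elsewhere
df['new_col'] = df['UndE'].where(not_collected)

# Plain list of the unmatched UndE values, without duplicates
unmatched = df.loc[not_collected, 'UndE'].unique().tolist()

print(df)
print(unmatched)
```

On this two-row sample both versions give the same `new_col`. They can differ when a `CollectedE` value and the `UndE` value in the same row match entries in different rows.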
