Identifying duplicate IDs using a combination of two IDs

Posted 2024-10-06 14:30:58


I have a dataset like this:

ID1 ID2
11  22
11  34
22  35
35  9
41  10
52  87
9   65
34  43

I want an output dataset that uses ID1 and ID2 together to detect duplicate IDs and assigns each row a shared ID, like this:

ID1   ID2     ID3
11     22     ID_11
11     34     ID_11
22     35     ID_11
35     9      ID_11
41     10     ID_10
52     87     ID_87
9      65     ID_11
34     43     ID_11

Because IDs 11, 22, 35, 9 and 34 all reference one another, they are mapped to a single ID, for example ID_11.


1 answer
User
#1 · Posted 2024-10-06 14:30:58

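You haven't given enough detail to write this precisely, but after adjusting a few details, this code should hopefully give you the Python you need to solve the problem.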

# your ID pairs, as a list of lists
pairs = [
  [11, 22],
  [11, 34],
  [22, 35],
  [35, 9],
  [41, 10],
  [52, 87],
  [9, 65],
  [34, 43],
]

# build disjoint sets of IDs that reference each other
groups = []
for id_1, id_2 in pairs:
  # every existing group that already contains one of the two IDs
  hits = [group for group in groups if id_1 in group or id_2 in group]
  if hits:
    target = hits[0]
    target.update((id_1, id_2))
    # the pair may link two groups that were separate so far: merge them
    for other in hits[1:]:
      target.update(other)
      groups.remove(other)
  else:
    groups.append({id_1, id_2})

# map each set to some unique id/string/whatever (here: its list index)
id_mappings = {}
for id_counter, group in enumerate(groups):
  id_mappings[id_counter] = group

# add the unique id/string/whatever to the initial list
for id_pair in pairs:
  for group_id, group in id_mappings.items():
    if id_pair[0] in group:
      id_pair.append(group_id)
      break  # groups are disjoint, so the first match is the only one

for pair in pairs:
  print(pair)
# Output:
# [11, 22, 0]
# [11, 34, 0]
# [22, 35, 0]
# [35, 9, 0]
# [41, 10, 1]
# [52, 87, 2]
# [9, 65, 0]
# [34, 43, 0]
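
If the list of pairs is long, a union-find (disjoint-set) structure is another way to do the same grouping, and it does not depend on the order in which the pairs arrive. The sketch below is only an illustration: `label_pairs`, `find` and `union` are made-up names, and since the question does not say how the shared ID should be chosen, it assumes the label is `ID_` plus the first ID1 value seen in each group.

```python
def label_pairs(pairs):
    """Return (id1, id2, label) rows; rows whose IDs are linked share a label."""
    parent = {}  # each ID points towards the representative of its group

    def find(x):
        # Walk up to the representative, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Put a and b into the same group.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    # Assumption: label each group after the first ID1 encountered in it.
    labels = {}
    rows = []
    for a, b in pairs:
        root = find(a)
        labels.setdefault(root, f"ID_{a}")
        rows.append((a, b, labels[root]))
    return rows


pairs = [(11, 22), (11, 34), (22, 35), (35, 9),
         (41, 10), (52, 87), (9, 65), (34, 43)]
for row in label_pairs(pairs):
    print(row)
```

With the sample data, every row connected to 11 gets `ID_11`. The other two rows get `ID_41` and `ID_52` under the labelling assumption above, so swap in a different rule in the last loop if you need the labels from the question.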
