Python: report duplicates from a list of tuples (by index)

Posted 2024-09-27 07:30:17


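I want to find duplicates in a list of tuples based on item 1 of each tuple, i.e. the patient counter, data[1]. For the examples below, I do not need to consider duplicates on data[0] or data[2].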

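def getDuplicateinTuple(dataInput):
    # Note: this keys on t[0] (the timestamp), so it keeps the first tuple
    # seen for each distinct timestamp instead of finding repeated counters.
    seen = {}
    return [seen.setdefault(t[0], t) for t in dataInput if t[0] not in seen]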

data=[('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

data1=[('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '), 
('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')]

s = getDuplicateinTuple(data)
print(s)
s1 = getDuplicateinTuple(data1)
print(s1)

The expected output is:

 [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]

The actual output is:

[('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

Under the same conditions, if I pass input that contains no duplicates, such as data1,

the expected output is:

 []

but the current output is:

[('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 ')]

This could probably be done just by comparing lists. What is a better, recommended way to achieve it?

I have seen some good posts on this topic: Find and list duplicates in a list?
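For a plain list, a common way to find duplicates is to count items with collections.Counter. Applied to one position of each tuple, that idea gives the minimal sketch below. The helper name report_duplicates and its index parameter are hypothetical, chosen for illustration; the sketch assumes every tuple has at least index + 1 fields.

```python
from collections import Counter


def report_duplicates(rows, index=1):
    """Hypothetical helper: return rows whose value at `index` occurs more than once."""
    # Count how often each key (the field at `index`) appears across all rows.
    counts = Counter(row[index] for row in rows)
    # Keep every row whose key was seen more than once, in input order.
    return [row for row in rows if counts[row[index]] > 1]


data = [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
        ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
        ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
        ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
        ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
        ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

data1 = [('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '),
         ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '),
         ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '),
         ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')]

print(report_duplicates(data))
# => [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]
print(report_duplicates(data1))
# => []
```

Because the key position is a parameter, the same helper would also report duplicates on data[0] (the timestamp) by passing index=0.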


2 answers

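You can build a (default) dictionary that collects the occurrences of each counter, then keep only the counters that occur more than once: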

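from collections import defaultdict

# Map each counter to the list of timestamps it appears with.
d = defaultdict(list)
for timestamp, counter in data:
    d[counter].append(timestamp)

# Print only the counters that were seen more than once.
for counter, timestamps in d.items():
    if len(timestamps) > 1:
        print([(t, counter) for t in timestamps])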
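The loop above unpacks each tuple into exactly two names, so it works for data but would raise a ValueError on data1, whose tuples have three fields. A minimal sketch of the same idea that indexes instead of unpacking, and returns the groups rather than printing them, could look like this (the name duplicate_timestamps is hypothetical):

```python
from collections import defaultdict


def duplicate_timestamps(rows):
    """Hypothetical helper: map each repeated counter to its timestamps."""
    # Map each counter (field 1) to the timestamps (field 0) it appears with.
    # Indexing rather than unpacking accepts 2-tuples and 3-tuples alike.
    timestamps_by_counter = defaultdict(list)
    for row in rows:
        timestamps_by_counter[row[1]].append(row[0])
    # Keep only the counters that were seen more than once.
    return {counter: ts for counter, ts in timestamps_by_counter.items() if len(ts) > 1}


data = [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
        ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
        ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]

data1 = [('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '),
         ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 ')]

print(duplicate_timestamps(data))
# => {'PATIENT:COUNTER1': ['2013 Jul  5 06:56:07:', '2013 Jul  5 06:57:11:']}
print(duplicate_timestamps(data1))
# => {}
```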

Using collections.defaultdict:

from collections import defaultdict

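def getDuplicateinTuple(dataInput):
    # Group the full tuples by their counter field, t[1].
    d = defaultdict(list)
    for t in dataInput:
        item1 = t[1]
        d[item1].append(t)
    # Flatten only the groups that contain more than one tuple.
    return [t for ts in d.values() if len(ts) > 1 for t in ts]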

data = [
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
    ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
    ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
    ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')
]

data1 = [
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER1', 'COUNTER INFO: : 500 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER2', 'COUNTER INFO: : 500 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER3', 'COUNTER INFO: : 100 '), 
    ('2013 Jul  5 04:26:40:', 'PATIENT:COUNTER4', 'COUNTER INFO: : 100 ')
]

print(getDuplicateinTuple(data))
# => [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
#     ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]
print(getDuplicateinTuple(data1))
# => []
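Since the question already imports itertools, another option is to sort by the counter and use itertools.groupby. This is a sketch of that alternative technique, not part of either answer; the name duplicates_via_groupby is hypothetical. It costs O(n log n) for the sort, and its output is grouped by counter rather than kept in input order.

```python
from itertools import groupby
from operator import itemgetter


def duplicates_via_groupby(rows, index=1):
    """Hypothetical helper: rows whose key at `index` is repeated, grouped by key."""
    key = itemgetter(index)
    result = []
    # groupby only merges *adjacent* equal keys, so sort by the key first.
    # sorted() is stable, so rows sharing a key keep their original order.
    for _, group in groupby(sorted(rows, key=key), key=key):
        group = list(group)
        if len(group) > 1:
            result.extend(group)
    return result


data = [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'),
        ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER2'),
        ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER3'),
        ('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER4'),
        ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1'),
        ('2013 Jul  5 06:56:11:', 'PATIENT:COUNTER5')]

print(duplicates_via_groupby(data))
# => [('2013 Jul  5 06:56:07:', 'PATIENT:COUNTER1'), ('2013 Jul  5 06:57:11:', 'PATIENT:COUNTER1')]
```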
