在一列中查找重复项,返回唯一项并在python中列出另一列中对应的值

2024-09-28 22:35:37 发布

您现在位置:Python中文网/ 问答频道 /正文

我想删除第1列中的重复项,并在第2列中使用python返回与每个惟一项相关联的值的相关列表。你知道吗

输入为

1 2
Jack London 'Son of the Wolf'
Jack London 'Chris Farrington'
Jack London 'The God of His Fathers'
Jack London 'Children of the Frost'
William Shakespeare  'Venus and Adonis' 
William Shakespeare 'The Rape of Lucrece'
Oscar Wilde 'Ravenna'
Oscar Wilde 'Poems'

而输出应该是

1 2
Jack London 'Son of the Wolf, Chris Farrington, Able Seaman, The God of His Fathers,Children of the Frost'
William Shakespeare 'The Rape of Lucrece,Venus and Adonis' 
Oscar Wilde 'Ravenna,Poems'

其中第二列包含与每个项相关联的值的总和。 我在dictionary上尝试了set()函数

dic={'Jack London': 'Son of the Wolf', 'Jack London': 'Chris Farrington', 'Jack London': 'The God of His Fathers'}
set(dic)

但它只返回字典的第一个键

set(['Jack London'])

Tags: oftheshakespearechrisjacklondonwilliamoscar
2条回答

在Python中,字典每个键只能包含一个值。但该值可以是项目的集合:

>>> d = {'Jack London': ['Son of the Wolf', 'Chris Farrington']}
>>> d['Jack London']
['Son of the Wolf', 'Chris Farrington']

要从键值对序列构造这样的字典,可以执行以下操作:

dct = {}
for author, title in items:
    if author not in dct:
        # Create a new entry for the author
        dct[author] = [title]
    else:
        # Add another item to the existing entry
        dct[author].append(title)

循环体可以变得更简洁,如下所示:

dct = {}
for author, title in items:
    dct.setdefault(author, []).append(title)

您应该使用itertools.groupby,因为您的列表已排序。你知道吗

rows = [('1', '2'),
        ('Jack London', 'Son of the Wolf'),
        ('Jack London', 'Chris Farrington'),
        ('Jack London', 'The God of His Fathers'),
        ('Jack London', 'Children of the Frost'),
        ('William Shakespeare', 'Venus and Adonis'),
        ('William Shakespeare', 'The Rape of Lucrece'),
        ('Oscar Wilde', 'Ravenna'),
        ('Oscar Wilde', 'Poems')]
# I'm not sure how you get here, but that's where you get

from itertools import groupby
from operator import itemgetter

grouped = groupby(rows, itemgetter(0))
result = {group:', '.join([value[1] for value in values]) for group, values in grouped}

这将为您提供以下结果:

In [1]: pprint(result)
{'1': '2',
 'Jack London': 'Son of the Wolf, Chris Farrington, The God of His Fathers, '
                'Children of the Frost',
 'Oscar Wilde': 'Ravenna, Poems',
 'William Shakespeare': 'Venus and Adonis, The Rape of Lucrece'}

相关问题 更多 >