Total number of unique values per person in a large file

User

Reply #1 · edited 2024-10-01 11:20:55

You can use collections, a module in Python's standard library:

import pandas as pd
from collections import Counter

dict(Counter(pd.Series(['cody', 'cody ', 'cody ', 'melton', 'melton', 'harry'])))

Output

{'cody': 1, 'cody ': 2, 'melton': 2, 'harry': 1}

In my example above I passed a pd.Series as the argument. In your case you can pass df['name'] instead, which is also a pd.Series object.
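One thing worth noticing in the output above is that 'cody' and 'cody ' (with a trailing space) are counted as different keys. Here is a minimal sketch of normalizing the names before counting. It uses a plain list as hypothetical sample data, since Counter accepts any iterable; the list stands in for df['name'].

```python
from collections import Counter

# Hypothetical sample data; in the pandas case this would be df['name'].
names = ['cody', 'cody ', 'cody ', 'melton', 'melton', 'harry']

# Counted as-is, 'cody' and 'cody ' are treated as two different keys.
raw_counts = Counter(names)

# Stripping surrounding whitespace first merges them into one key.
clean_counts = Counter(name.strip() for name in names)

print(dict(raw_counts))
print(dict(clean_counts))
```

Whether stripping is appropriate depends on your data; if the trailing space is meaningful, count the raw values instead.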

User

Reply #2 · edited 2024-10-01 11:20:55

You did not specify the format of the source data, so I will assume it is a list of lists:

>>> data = [["cody melton", "apple", 3], ["cody melton", "banana", 5],
            ["cody melton", "banana", 7], ["larisa harris", "apple", 8],
            ["larisa harris", "apple", 5]]

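When you want performance in "vanilla" Python, look to the standard library, in this case collections.Counter. We will use it to count every unique (name, fruit) combination: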

>>> from collections import Counter
>>> pairs = Counter((x[0], x[1]) for x in data)
>>> pairs
Counter({('cody melton', 'banana'): 2, ('larisa harris', 'apple'): 2, ('cody melton', 'apple'): 1})

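The argument is a generator expression that builds a (name, fruit) tuple from each row of the source data, and Counter counts how often each tuple occurs.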

Edit: if you only want to count fruits that belong to a specific set:

fruits = {'apple', 'banana', 'coconut'}

then add this condition to the generator expression:

>>> pairs = Counter(((x[0], x[1]) for x in data if x[1] in fruits))

We are almost there. All that is left is to count how many times each name occurs among the unique pairs:

>>> names = Counter((pair[0] for pair in pairs))
>>> names
Counter({'cody melton': 2, 'larisa harris': 1})
>>> dict(names)  # convert it to a regular dict
{'cody melton': 2, 'larisa harris': 1}

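I see that your expected output includes "harry barry" with a count of 0. That name evidently does not appear in the source data, so just add it to the dict with a value of 0.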
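Putting the steps above together, a rough sketch might look like the following. The function name unique_values_per_name and the all_names parameter are my own inventions for illustration, and I am still assuming the data is a list of [name, fruit, quantity] rows.

```python
from collections import Counter

def unique_values_per_name(rows, all_names=()):
    """Count distinct second-column values for each name in the first column.

    all_names is a hypothetical list of names that must appear in the
    result even when they have no rows (they get a count of 0).
    """
    # A set keeps each (name, value) pair once, however often it repeats.
    unique_pairs = {(row[0], row[1]) for row in rows}
    # Counting names over the unique pairs gives unique values per name.
    counts = Counter(name for name, _ in unique_pairs)
    result = dict(counts)
    for name in all_names:
        result.setdefault(name, 0)
    return result

data = [["cody melton", "apple", 3], ["cody melton", "banana", 5],
        ["cody melton", "banana", 7], ["larisa harris", "apple", 8],
        ["larisa harris", "apple", 5]]

result = unique_values_per_name(data, all_names=["harry barry"])
print(result)
```

Using a set instead of a Counter for the pairs drops the per-pair counts, which this last step does not need anyway.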

User

Reply #3 · edited 2024-10-01 11:20:55

Just do this:

xx = ['apple', 'apple', 'banana', 'coconut']
d = {}

# Tally each item by hand: increment if already seen, otherwise start at 1
for x in xx:
    if x in d:
        d[x] += 1
    else:
        d[x] = 1

print(d)
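Note that the loop above counts every occurrence. If what you need is the number of unique values per person, as in the title, the same plain-loop idea can keep a set per name instead. This is only a sketch: count_unique_per_name is a made-up helper, and the [name, value, ...] row layout is an assumption, since the question does not show the actual file format.

```python
from collections import defaultdict

def count_unique_per_name(rows):
    """Return {name: number of distinct values seen for that name}."""
    seen = defaultdict(set)
    # Rows are consumed one at a time, so any iterable of rows works,
    # including a lazy one such as csv.reader over an open file.
    for name, value, *_ in rows:
        seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}

# Assumed sample rows in [name, value, quantity] form.
rows = [
    ["cody melton", "apple", 3],
    ["cody melton", "banana", 5],
    ["cody melton", "banana", 7],
    ["larisa harris", "apple", 8],
    ["larisa harris", "apple", 5],
]

unique_counts = count_unique_per_name(rows)
print(unique_counts)
```

Memory use grows with the number of distinct (name, value) pairs rather than with the number of rows, which may help when the file is large but repetitive.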
