How do I count the frequency of values in a column based on a condition?

TaskId | Attr. 1 | Attr. 2  | Attr. 3
123    | 23      | twothree | xyx
123    | 23      | four     | lor
456    | 23      | four     | pop
123    | 23      | twothree | xyx
352    | 34      | some     | lkj

import csv
from collections import Counter
from itertools import imap
from operator import itemgetter

with open('task.csv') as f:
    data = csv.reader(f)
    for row in data:
        if row[0] == '123':
            cn = Counter(imap(itemgetter(2), row))
            for t in cn.iteritems():
                print("{} appears {} times".format(*t))

3 answers

User

#1 · edited 2024-06-13 17:12:00

If you don't want to use pandas, you can do this easily with a dictionary:

import csv

uniquekeys = {}

with open('data') as f:
    data = csv.reader(f)
    next(data, None)  # skip the header row
    for row in data:
        # Count each TaskId:Attr. 1 combination
        key = row[0] + ":" + row[1]
        uniquekeys[key] = uniquekeys.get(key, 0) + 1
print(uniquekeys)
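
The dictionary above counts every TaskId:Attr. 1 pair. To count the values of a single column only for rows that match a condition, as the question asks, collections.Counter can do the tallying. A minimal sketch, assuming the sample data from the question is inlined as a string; the helper name count_where and its parameters are hypothetical, not from the original answer:

```python
import csv
from collections import Counter
from io import StringIO

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """TaskId,Attr. 1,Attr. 2,Attr. 3
123,23,twothree,xyx
123,23,four,lor
456,23,four,pop
123,23,twothree,xyx
352,34,some,lkj
"""


def count_where(lines, filter_col, filter_val, count_col):
    """Count values of count_col in rows where filter_col equals filter_val.

    Hypothetical helper. The csv module yields strings, so filter_val
    must be given as a string as well.
    """
    reader = csv.DictReader(lines)
    return Counter(
        row[count_col] for row in reader if row[filter_col] == filter_val
    )


counts = count_where(StringIO(SAMPLE), "TaskId", "123", "Attr. 2")
for value, n in counts.most_common():
    print("{} appears {} times".format(value, n))
```

To read from a file instead, something like open('task.csv', newline='') can be passed in place of StringIO(SAMPLE).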

Alternatively, this is easy to do without Python:

awk -F',' 'NR > 1 {print $1":"$2}' data | sort | uniq -c

User

#2 · edited 2024-06-13 17:12:00

Using pandas may be faster:

import pandas as pd

df = pd.read_csv('task.csv')  # load the file
df['count'] = 0  # helper column so count() has something to tally per group
# count() tallies the non-null 'count' entries, i.e. the number of rows per combination
counts = df.groupby(by=['TaskId', 'Attr. 1', 'Attr. 2', 'Attr. 3'], as_index=False, sort=False).count()
print(counts)  # show the result
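
The groupby above counts whole-row combinations. For the conditional case in the question (values of one column for a given TaskId), a boolean filter followed by value_counts may be closer to what is wanted. A rough sketch, assuming the question's sample data inlined as a string; the names SAMPLE, attr2_counts and pair_counts are illustrative only:

```python
from io import StringIO

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """TaskId,Attr. 1,Attr. 2,Attr. 3
123,23,twothree,xyx
123,23,four,lor
456,23,four,pop
123,23,twothree,xyx
352,34,some,lkj
"""

df = pd.read_csv(StringIO(SAMPLE))

# Keep only rows where TaskId is 123, then count each distinct Attr. 2 value.
attr2_counts = df.loc[df["TaskId"] == 123, "Attr. 2"].value_counts()
print(attr2_counts)

# Count each (TaskId, Attr. 2) combination across the whole table.
pair_counts = df.groupby(["TaskId", "Attr. 2"]).size()
print(pair_counts)
```

Note that read_csv parses TaskId as an integer here, so the filter compares against 123 rather than the string '123'.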

User

#3 · edited 2024-06-13 17:12:00

You can use collections.defaultdict to build a nested dictionary:

from io import StringIO
import csv
from collections import defaultdict

mystr = StringIO("""TaskId,Attr. 1,Attr. 2,Attr. 3
123,23,twothree,xyx
123,23,four,lor
456,23,four,pop
123,23,twothree,xyx
352,34,some,lkj""")

d = defaultdict(lambda: defaultdict(int))

# replace mystr with open('file.csv', 'r')
with mystr as fin:
    for item in csv.DictReader(fin):
        d[int(item['TaskId'])][int(item['Attr. 1'])] += 1
        d[int(item['TaskId'])][item['Attr. 2']] += 1
        d[int(item['TaskId'])][item['Attr. 3']] += 1

print(d)

defaultdict({123: defaultdict(int, {23: 3, 'twothree': 2, 'xyx': 2,
                                    'four': 1, 'lor': 1}),
             352: defaultdict(int, {34: 1, 'some': 1, 'lkj': 1}),
             456: defaultdict(int, {23: 1, 'four': 1, 'pop': 1})})

Then iterate over it like a regular dictionary:

for k, v in d.items():
    print('TaskId: {0}'.format(k))
    for a, b in v.items():
        print('{0}: {1} times'.format(a, b))

Result:

TaskId: 123
23: 3 times
twothree: 2 times
xyx: 2 times
four: 1 times
lor: 1 times
TaskId: 456
23: 1 times
four: 1 times
pop: 1 times
TaskId: 352
34: 1 times
some: 1 times
lkj: 1 times
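
One thing to watch with the nested dictionary above: values from different attribute columns share a single inner dictionary, so a value that appears in two columns would be added together. A possible variation, sketched here under the assumption that per-column counts are wanted, adds a level keyed by column name. The SAMPLE constant and the three-level layout are illustrative and not part of the original answer:

```python
import csv
from collections import Counter, defaultdict
from io import StringIO

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """TaskId,Attr. 1,Attr. 2,Attr. 3
123,23,twothree,xyx
123,23,four,lor
456,23,four,pop
123,23,twothree,xyx
352,34,some,lkj
"""

# counts[task_id][column_name] is a Counter of that column's values.
counts = defaultdict(lambda: defaultdict(Counter))

for row in csv.DictReader(StringIO(SAMPLE)):
    task_id = int(row.pop("TaskId"))  # remaining keys are the attribute columns
    for column, value in row.items():
        counts[task_id][column][value] += 1

# Frequency of Attr. 2 values for TaskId 123
print(counts[123]["Attr. 2"])
```

In this sketch the attribute values stay as strings (for example "23" rather than 23), since csv does no type conversion.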
