How do I extract data from a column in a CSV file where the number of values per row varies?

Posted 2024-09-24 22:24:06


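Basically, I'm trying to add up the values in the count column of a CSV file for rows that share the same name in the item column. I then need to sort the results alphabetically, in ascending order, by the item column value. For example: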

Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 100

I need to add the values 123 and 100 to get a single new row for Diabetes Mellitus.

It should look like this:
Diabetes Mellitus, 223

This is the code I have so far:

import csv
import sys

with open(sys.argv[1], 'r') as File:
    reader = csv.reader(File)
    itemindex = sys.argv[2]    # header name of the item column
    countindex = sys.argv[3]   # header name of the count column
    itemcolumn = 0
    countcolumn = 0
    firstrow = True
    dictionary = {}

    for row in reader:
        if firstrow:
            # The first row is the header: look up both column positions.
            firstrow = False
            itemcolumn = row.index(itemindex)
            countcolumn = row.index(countindex)
        else:
            if row[itemcolumn] in dictionary:
                # Add the item at the row's count column (converted to an int) to the
                # preexisting entry with that item column.
                pass
            else:
                # Create a new entry in the dictionary.
                pass

print(itemindex + "," + countindex)

for key, value in sorted(dictionary.items()):
    print(key + "," + str(value))

The commented parts are where I'm stuck.


3 answers

Use collections.defaultdict, a specialized dictionary class that makes this simple:

import collections
import csv
import os
import sys

try:
    filename = sys.argv[1]
    itemindex = int(sys.argv[2])
    countindex = int(sys.argv[3])
except (IndexError, ValueError):  # missing argument or non-integer index
    print('Error:\n  Usage: {} <file name> <item index> <count index>'.format(
            os.path.basename(sys.argv[0])))
    sys.exit(-1)

with open(filename, 'r', newline='') as file:
    reader = csv.reader(file, skipinitialspace=True)
    next(reader)  # Skip first row.

    counter = collections.defaultdict(int)
    for row in reader:
        disease, deaths = row[itemindex], int(row[countindex])
        counter[disease] += deaths

for key, value in sorted(counter.items()):
    print('{}, {}'.format(key, value))

Example usage:

python3 script_name.py diseases.csv 0 1    

Sample output:

Diabetes Mellitus, 223
Influenza and Pneumonia, 325
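
The question passes column names rather than numeric indexes on the command line, so a variant keyed on the header names may be closer to what was asked. The following is only a sketch under that assumption: sum_by_name is a hypothetical helper name, and the sample data is the three rows from the question held in memory instead of a file.

```python
import collections
import csv
import io


def sum_by_name(lines, item_name, count_name):
    """Sum the count_name column for each distinct item_name value."""
    # skipinitialspace=True drops the blank after each comma, so the
    # second header is read as "Deaths" rather than " Deaths".
    reader = csv.DictReader(lines, skipinitialspace=True)
    totals = collections.defaultdict(int)
    for row in reader:
        totals[row[item_name]] += int(row[count_name])
    # Sort alphabetically by item name, as the question requires.
    return sorted(totals.items())


sample = io.StringIO(
    "Leading Cause, Deaths\n"
    "Diabetes Mellitus, 123\n"
    "Influenza and Pneumonia, 325\n"
    "Diabetes Mellitus, 100\n"
)
for key, value in sum_by_name(sample, "Leading Cause", "Deaths"):
    print("{}, {}".format(key, value))
```

To read from a real file, pass the object returned by open(filename, newline='') in place of the io.StringIO sample.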

You could also skip the provided library, loop over the data as plain text, and parse each line yourself.
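
A minimal sketch of that plain-text approach is shown here. It assumes no field contains a quoted comma (which the csv module would handle but a bare split does not), and the name sum_counts and its default column positions are illustrative, not from the original answer.

```python
def sum_counts(lines, item_index=0, count_index=1):
    """Sum the count column per item, parsing each line by hand."""
    totals = {}
    rows = iter(lines)
    next(rows, None)  # Skip the header row, if there is one.
    for line in rows:
        line = line.strip()
        if not line:
            continue  # Ignore blank lines.
        # Split on commas and trim the whitespace around every field.
        fields = [field.strip() for field in line.split(",")]
        item = fields[item_index]
        totals[item] = totals.get(item, 0) + int(fields[count_index])
    # Return (item, total) pairs sorted alphabetically by item.
    return sorted(totals.items())


text = """Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 100"""

for key, value in sum_counts(text.splitlines()):
    print("{}, {}".format(key, value))
```

An open file object can be passed instead of text.splitlines(), since the function only iterates over lines.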

If it is available to you, you can use the pandas package to process the CSV.

Given a text file named values.txt:

Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 1008

the desired DataFrame can be obtained as follows:

import pandas as pd

data = pd.read_csv('values.txt')
print(data)

sum_data = data.groupby(['Leading Cause']).sum()
print(sum_data)

print(sum_data.loc['Diabetes Mellitus'])

which outputs:

             Leading Cause   Deaths
0        Diabetes Mellitus      123
1  Influenza and Pneumonia      325
2        Diabetes Mellitus     1008

                          Deaths
Leading Cause                   
Diabetes Mellitus           1131
Influenza and Pneumonia      325

 Deaths    1131
Name: Diabetes Mellitus, dtype: int64
