Count duplicate values in a specific column of a CSV file and write the counts to another column (Python 2)

case_id_list_data = []
with open(file_path_1, "rU") as g:
    for line in g:
        case_id_list_data.append(line.split('\t'))
#print case_id_list_data[0][0]
#the result is dissatisfying
#I'm stuck here..

3 answers

User

Answer 1 · edited 2024-09-29 21:34:32

import pandas as pd
# Read your CSV into a DataFrame.
df = pd.read_csv('file_path_1')
# Build Total_GeneralID: count each value in the GeneralID column once,
# then map every row's GeneralID to its number of occurrences.
df['Total_GeneralID'] = df['GeneralID'].map(df['GeneralID'].value_counts())
df = df[['KeyID', 'Total_GeneralID']]
Out[442]: 
    KeyID  Total_GeneralID
0  145258                2
1  145259                1
2  145260                1
3  145261                2

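User

Answer 2 · edited 2024-09-29 21:34:32

You have to divide the task into three steps: 1. Read the CSV file. 2. Generate the new column's values. 3. Write the values back to the file.

import csv
import fileinput
import sys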

# 1. Read CSV file
# This is opening CSV and reading value from it.
with open("dev.csv") as filein:
    reader = csv.reader(filein, skipinitialspace = True)
    xs, ys = zip(*reader)

result=["Total_GeneralID"]

# 2. Generate new column's value
# This loop is for counting the "GeneralID" element.
for i in range(1,len(ys),1):
    result.append(ys.count(ys[i]))

# 3. Add value to the file back
# This loop is for writing new column
# With inplace=True, anything written to stdout replaces the file's contents.
for ind, line in enumerate(fileinput.input("dev.csv", inplace=True)):
    sys.stdout.write("{}, {}\n".format(line.rstrip(), result[ind]))

I did not use a temp file or any high-level module such as pandas.
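
The loop in step 2 calls `ys.count` once per row, so the work grows quadratically with the number of rows. A minimal sketch of a single-pass alternative using `collections.Counter` from the standard library follows. The helper name `count_per_row` and the sample IDs are made up for illustration and are not taken from the original data.

```python
from collections import Counter


def count_per_row(values):
    """Return, for each item in `values`, how many times it occurs overall."""
    totals = Counter(values)             # one pass to tally every distinct value
    return [totals[v] for v in values]   # one pass to look up each row's total


# Hypothetical GeneralID column (header already stripped).
general_ids = ["A1", "B2", "C3", "A1"]
print(count_per_row(general_ids))  # prints [2, 1, 1, 2]
```

The returned list lines up with the data rows, so it could take the place of the `result` list built in step 2, with the header label prepended.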

User

Answer 3 · edited 2024-09-29 21:34:32

If you would rather not use pandas and want to stay within the standard library:

Code:

import csv
from collections import Counter
with open('file1', 'rU') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    lines = [line for line in reader]
    counts = Counter([l[1] for l in lines])

new_lines = [l + [str(counts[l[1]])] for l in lines]
with open('file2', 'wb') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)

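
The file modes in the code above ('rU' for reading, 'wb' for writing) are Python 2 conventions. On Python 3 the `csv` module expects text-mode files opened with `newline=''`, and the 'U' flag is no longer accepted by recent releases. A rough sketch of the same approach for Python 3 follows. The function name `add_count_column`, its parameters, the file names, and the sample rows are all assumptions for illustration. It finds the column by header name rather than by position.

```python
import csv
import os
import tempfile
from collections import Counter


def add_count_column(src, dst, column="GeneralID",
                     new_column="Total_GeneralID", delimiter="\t"):
    """Copy `src` to `dst`, appending a column with per-value counts of `column`."""
    with open(src, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        rows = list(reader)

    idx = header.index(column)  # raises ValueError if the column is missing
    counts = Counter(row[idx] for row in rows)

    with open(dst, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(header + [new_column])
        writer.writerows(row + [str(counts[row[idx]])] for row in rows)


# Usage with made-up sample data written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "file1.tsv")
dst_path = os.path.join(tmp_dir, "file2.tsv")
with open(src_path, "w", newline="") as f:
    f.write("KeyID\tGeneralID\n1\tA\n2\tB\n3\tA\n")

add_count_column(src_path, dst_path)
with open(dst_path, newline="") as f:
    print(f.read())
```

Writing to a separate output file keeps the input untouched, which makes it easy to rerun the script if the column name or delimiter turns out to be different.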
