Using Python to replace blanks with NULL values by comparing two CSV files

Posted 2024-09-28 17:25:11


Suppose I have two CSV files:

csvfile1

name      Dept  City      
sree,     CSE,  Bengaluru,  
vatsasa,  ECE,  Hyd,      
          IT,   VJA,      
capini,   Mech, TPTY,   
DTP,      Civil,kandra
Bengaluru,ECM,  TVM,      
sre,      ECS,  MNGL,   
vatsas,         Kochi,    
          Nano, TVM,      
capmin,         Tech,       
DTP9,     CSS,  Kochi,    
          ESS,  TVM,    
sree0,    RSS,  MNGL,   

csvfile2

name, Dept, City, Address

I want to check whether all the columns listed in csvfile2 are present in csvfile1.

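  1. If they are present, check whether any column in csvfile1 contains blanks. If it does, replace the blanks with the value NULL and write all columns, column by column, to a new CSV file, csvfile3.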

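  2. If they are not present, write the missing columns to csvfile3 together with the existing columns. The missing columns should show NULL for every value in csvfile3, and blanks under the existing columns should also be replaced with NULL.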

The expected output is:

name      Dept  City      Address
sree,     CSE,  Bengaluru,NULL
vatsasa,  ECE,  Hyd,      NULL
NULL,     IT,   VJA,      NULL
capini,   NULL, Mech,     NULL
DTP,      Civil,NULL,     NULL
Bengaluru,ECM,  TVM,      NULL
sre,      ECS,  MNGL,     NULL
vatsas,   NULL, Kochi,    NULL
NULL,     Nano, TVM,      NULL
capmin,   NULL, Tech,     NULL
DTP9,     CSS,  Kochi,    NULL
NULL,     ESS,  TVM,      NULL
sree0,    RSS,  MNGL,     NULL

I wrote the following code:

f=open('csvfile2.csv', 'r')
g=csv.reader(f)
first=next(g, None)
print('lenght of first list', len(first))
f1=open('csvfile1.csv','r')
h=csv.reader(f1)
second=next(h,None)
print('lenght of first list', len(second))
f2=open('csvfile3', 'w')
writer=csv.writer(f2)
count=0
if len(second) < len(first):
    for i in first:
        if not i in second:
            for count in range:
                writer.writerows('Null')
                print('null')
        else:
            ind=second.index(i)
                for j in second:
                    if not j[ind]:
                        writer.writerows(j[ind].replace(' ','Null'))                            
                    else:
                        writer.writerows(j[ind])

The code above outputs:

name, Dept, City, Address
N
U
L
L
N
U
L
L
N
U
L
L
N
U
L
L
N
U
L
L

I wrote this code on an AWS EC2 instance and plan to use the same code on AWS Lambda as well.


2 Answers

Working with CSV files directly can be quite awkward. I suggest using pandas to handle the tabular data, since it is efficient and keeps the code short.

Sample code:

import pandas as pd

# READ BOTH FILES AS TABULAR DATA STRUCTURE
# PROVIDE FILE PATH
csv_file_one = pd.read_csv("csv_file_one.csv")
csv_file_two = pd.read_csv("csv_file_two.csv")

# REPLACE EMPTY VALUES WITH NULL IN CSV ONE
csv_file_one.fillna(value='NULL', inplace=True)

header_of_csv_two = list(csv_file_two.columns.values)

# IF CSV FILE ONE DOESN'T HAVE A COLUMN
# OF CSV TWO, THEN CREATE THAT COLUMN
# FILLED WITH NULL VALUES
for each_col in header_of_csv_two:
    if each_col not in csv_file_one.columns:
        csv_file_one[each_col] = 'NULL'

# WRITING TO CSV
# PROVIDE FILE PATH
csv_file_one.to_csv("csv_file_three.csv", index=False)

Sample output:

name      dept       city  address
ram        NULL  kathmandu    NULL
kiran  computer       NULL    NULL
kumar     civil      patan    NULL

If you don't want to use pandas, here is a solution that uses the csv module:

import csv

with open("csvfile1.csv", newline="") as csv_1,\
     open("csvfile2.csv", newline="") as csv_2,\
     open("csvfile3.csv", "w", newline="") as csv_3:

    reader_1 = csv.reader(csv_1)
    reader_2 = csv.reader(csv_2)
    writer = csv.writer(csv_3)

    headers_1 = next(reader_1)
    headers_2 = next(reader_2)

    insert_null_at = []
    for i, header in enumerate(headers_2):
        if header not in headers_1:
            insert_null_at.append(i)

    writer.writerow(headers_2)
    for row in reader_1:
        for i in insert_null_at:
            row.insert(i, "")

        writer.writerow([item if item != "" else "NULL" for item in row])

As written, this assumes that csvfile1 is actually formatted as:

name,Dept,City
sree,CSE,Bengaluru
vatsasa,ECE,Hyd
,IT,VJA
capini,Mech,TPTY
DTP,Civil,kandra
Bengaluru,ECM,TVM
sre,ECS,MNGL
vatsas,,Kochi
,Nano,TVM
capmin,,Tech
DTP9,CSS,Kochi
,ESS,TVM
sree0,RSS,MNGL
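If the file really does contain the padding shown in the question, a variant that strips each cell before testing it may be closer to what is needed. This is only a sketch under that assumption. The function name `fill_nulls` and the sample strings are hypothetical, and it cannot recover rows where the separating comma itself is missing.

```python
import csv
import io


def fill_nulls(source, wanted_headers, target):
    """Copy CSV rows from source to target, writing NULL for blank or missing cells."""
    reader = csv.reader(source)
    writer = csv.writer(target)

    headers = [h.strip() for h in next(reader)]
    wanted = [h.strip() for h in wanted_headers]
    # Keep the first file's columns, then append those only the second file has.
    out_headers = headers + [h for h in wanted if h not in headers]
    writer.writerow(out_headers)

    for row in reader:
        if not row:
            continue  # skip completely blank lines
        # zip() drops the extra empty field produced by a trailing comma.
        cells = dict(zip(headers, (c.strip() for c in row)))
        # One list per record; empty or absent cells become NULL.
        writer.writerow([cells.get(h) or "NULL" for h in out_headers])


# Hypothetical sample data standing in for csvfile1.csv and the csvfile2 header.
sample = "name,Dept,City\nsree,     CSE,  Bengaluru,  \n          ,IT,   VJA,\nvatsas,         ,Kochi\n"
out = io.StringIO()
fill_nulls(io.StringIO(sample), ["name", " Dept", " City", " Address"], out)
print(out.getvalue())
```

The sketch calls `writer.writerow` with one list per record. `writer.writerows` expects an iterable of rows, so passing it a bare string such as `'Null'` makes every character a row of its own, which is probably where the one-letter-per-line output in the question comes from.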
