Sorting the columns and rows of a CSV file with Python

strpath = 'C://Users//User//Desktop//compare//'
strFileNameA = 'File1'
strFileNameB = 'File2'

testfile1 = open(strpath + strFileNameA + '.csv', 'r')
testfile2 = open(strpath + strFileNameB + '.csv', 'r')
testresult1 = open(strpath + strFileNameA + '-Results' + '.csv', 'w')
testresult2 = open(strpath + strFileNameB + '-Results' + '.csv', 'w')

testlist1 = testfile1.readlines()
testlist2 = testfile2.readlines()

k = 1  # current row number
z = 0  # number of rows that differ
for i, j in zip(testlist1, testlist2):
    if k == 1:
        # copy the first row of file 1 to its result file
        testresult1.write(i.rstrip('\n') + '\n')
    if i != j:
        # write the differing rows to their respective result files
        testresult1.write(i.rstrip('\n') + '\n')
        testresult2.write(j.rstrip('\n') + '\n')
        z = z + 1
    k = k + 1
if z == 0:
    # no differences found: write the summary to both result files
    testresult1.write('Exact match for ' + str(k) + ' rows')
    testresult2.write('Exact match for ' + str(k) + ' rows')

testfile1.close()
testfile2.close()
testresult1.close()
testresult2.close()

2 Answers

User

Answer 1 · edited 2024-09-28 03:24:51

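This is a good exercise for getting started with Python programming. There are many string functions that make data-processing tasks like this simpler. You can find more of them in the documentation: https://docs.python.org/3/library/string.html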

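First, I suggest using os.path.join() to build the path strings. Second, I suggest using the built-in sorted() function to sort the lines of each file. Be careful when sorting, though, because sorting strings is not the same as sorting integers.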
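To make the string-versus-integer caveat concrete, here is a minimal sketch. The directory name `compare` and the sample values are made up for illustration; only os.path.join() and sorted() come from the answer.

```python
import os

# Hypothetical directory and file name, used only for illustration
path = os.path.join('compare', 'File1' + '.csv')

# Values read from a CSV file arrive as strings
keys = ['10', '9', '2']

# As strings, values are compared character by character
sorted_as_text = sorted(keys)              # ['10', '2', '9']

# Converting to int in the key function gives numeric order
sorted_as_numbers = sorted(keys, key=int)  # ['2', '9', '10']

print(path)
print(sorted_as_text)
print(sorted_as_numbers)
```

If the first column of your files holds numbers, a key such as `lambda x: int(x[0])` is probably what you want; for text keys the plain string comparison is fine.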

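Step 1 uses the built-in sorted() function to sort the rows of each file by column 1. This is done by passing a lambda function as the key argument. Because Python uses zero-based indexing, x[0] refers to the first column, so this particular lambda simply returns the first column of each row.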

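Step 2 walks through all the rows of both files. If two rows match on the first column, they are paired together. Otherwise, the row is paired with an empty row.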

import os

strpath = '.'
strFileNameA = 'file1'
strFileNameB = 'file2'

# Read each file and split every line into a list of column values;
# the with blocks close the files automatically
with open(os.path.join(strpath, '%s.csv' % strFileNameA), 'r') as testfile1:
    testlist1 = [eachLine.rstrip("\n").split(",") for eachLine in testfile1]
with open(os.path.join(strpath, '%s.csv' % strFileNameB), 'r') as testfile2:
    testlist2 = [eachLine.rstrip("\n").split(",") for eachLine in testfile2]

# Step 1: sort the rows of each file by the first column (compared as strings)
testlist1 = sorted(testlist1,key=lambda x: x[0])
testlist2 = sorted(testlist2,key=lambda x: x[0])

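# Step 2: walk both sorted lists in parallel and pair rows by the first column;
# a row with no counterpart in the other file is paired with an empty list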
def look_for_match(i,list1,j,list2):
    if i == len(list1):
        return i,j+1, ([],list2[j])
    elif j == len(list2):
        return i+1,j,(list1[i],[])
    elif list1[i][0] == list2[j][0]:
        return i+1, j+1,(list1[i],list2[j])
    elif list1[i][0] < list2[j][0]:
        return i+1,j,(list1[i],[])
    else:
        return i,j+1, ([],list2[j])

matched_rows = []
i=0
j=0
while i<len(testlist1) or j<len(testlist2):
    i, j, matched_row = look_for_match(i,testlist1,j,testlist2)
    # Keep only the rows that have no counterpart in the other file
    if matched_row[0] == [] or matched_row[1] == []:
        matched_rows.append(matched_row)


for row_file_1, row_file_2 in matched_rows:
    print(row_file_1, row_file_2)
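To see what step 2 produces without creating any files, here is a small self-contained sketch of the same walk over two in-memory lists. The function name `pair_rows` and the sample rows are invented for this illustration; the sketch keeps every pair and filters afterwards, which is a slight variation on the code above.

```python
def pair_rows(rows1, rows2):
    """Pair rows from two lists sorted by first column.

    A row with no counterpart in the other list is paired with [].
    """
    pairs = []
    i = j = 0
    while i < len(rows1) or j < len(rows2):
        if i == len(rows1):                  # list 1 exhausted
            pairs.append(([], rows2[j]))
            j += 1
        elif j == len(rows2):                # list 2 exhausted
            pairs.append((rows1[i], []))
            i += 1
        elif rows1[i][0] == rows2[j][0]:     # keys match
            pairs.append((rows1[i], rows2[j]))
            i += 1
            j += 1
        elif rows1[i][0] < rows2[j][0]:      # key only in list 1
            pairs.append((rows1[i], []))
            i += 1
        else:                                # key only in list 2
            pairs.append(([], rows2[j]))
            j += 1
    return pairs


# Made-up sample rows, sorted by the first column as in step 1
rows_a = sorted([['b', '2'], ['a', '1'], ['d', '4']], key=lambda row: row[0])
rows_b = sorted([['c', '3'], ['a', '1'], ['b', '2']], key=lambda row: row[0])

all_pairs = pair_rows(rows_a, rows_b)
unmatched = [pair for pair in all_pairs if pair[0] == [] or pair[1] == []]

for row_a, row_b in unmatched:
    print(row_a, row_b)
```

Note that rows are matched on the first column only, so two rows with the same key but different values in the other columns still count as a match here.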

User

Answer 2 · edited 2024-09-28 03:24:51

I suggest looking at either namedtuple: https://docs.python.org/3/library/collections.html#collections.namedtuple

or sqlite: https://docs.python.org/3/library/sqlite3.html#module-sqlite3

Both are available in Python 3.4.1.
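As a rough illustration of the namedtuple suggestion, the sketch below reads CSV rows into named tuples so that columns can be addressed by name when sorting. The column names and sample data are made up, and io.StringIO stands in for a real file.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for a CSV file on disk
sample = io.StringIO("id,name,amount\n2,beta,20\n1,alpha,10\n")

reader = csv.reader(sample)
Row = namedtuple('Row', next(reader))   # the header row supplies the field names
rows = [Row(*fields) for fields in reader]

# Sort by a named column instead of a numeric index
rows_by_id = sorted(rows, key=lambda row: int(row.id))

for row in rows_by_id:
    print(row.id, row.name, row.amount)
```

This only works when the header values are valid Python identifiers; otherwise namedtuple raises a ValueError unless you pass rename=True.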

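If those aren't a good fit (i.e. these are relatively small model point files), you can use the built-in set object to compare the two sets of data and use set operations to filter them: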

with open('csv1.csv','r') as csv_file1:
    header1 = next(csv_file1)   #skip header
    set1 = set(line for line in csv_file1)

with open('csv2.csv','r') as csv_file2:
    header2 = next(csv_file2)   #skip header
    set2 = set(line for line in csv_file2)

print((set1 - set2) |(set2 - set1))

Once you have the resulting set, you can convert it to a list, sort it, and write it out.
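A minimal sketch of that last step, assuming two small sets of lines like the ones built above. The sample lines are invented, and io.StringIO stands in for an output file such as a hypothetical `diff.csv`.

```python
import io

# Made-up lines standing in for the contents of set1 and set2
set1 = {"1,alpha\n", "2,beta\n", "3,gamma\n"}
set2 = {"1,alpha\n", "2,BETA\n", "3,gamma\n"}

# set1 ^ set2 is the symmetric difference, the same as
# (set1 - set2) | (set2 - set1); sorted() returns a sorted list
differences = sorted(set1 ^ set2)

# Stands in for: with open('diff.csv', 'w') as output: ...
output = io.StringIO()
output.writelines(differences)

print(output.getvalue())
```

The lines are sorted as plain strings here, so the earlier caveat about string versus numeric ordering applies.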
