Finding the entries common to several files

fileA = open("A.txt",'r')
fileB = open("B.txt",'r')
fileC = open("C.txt",'r')

listA1 = []
for line1 in fileA:
    listA = line1.split('\t')
    listA1.append(listA)

listB1 = []
for line1 in fileB:
    listB = line1.split('\t')
    listB1.append(listB)

listC1 = []
for line1 in fileC:
    listC = line1.split('\t')
    listC1.append(listC)

for key1 in listA1:
    for key2 in listB1:
        for key3 in listC1:
            if key1[1] == key2[1] and key2[1] == key3[1] and key3[1] == key1[1]:
                print "Common between three files:",key1[1]

print "Common between file1 and file2 files:"
for key1 in listA1:
    for key2 in listB1:
        if key1[1] == key2[1]:
            print key1[1]

print "Common between file1 and file3 files:"
for key1 in listA1:
    for key2 in listC1:
        if key1[1] == key2[1]:
            print key1[1]

1 answer

Answer #1 · posted 2024-09-28 17:28:41

If you just want to sort A1, B1, and C1 by the second column, that's easy:

import operator

listA1.sort(key=operator.itemgetter(1))

If you don't understand itemgetter, this is equivalent:

listA1.sort(key=lambda element: element[1])
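To make the equivalence concrete, here is a minimal sketch that sorts a small list of rows by the second column both ways. The `rows` data is made up for illustration and stands in for lines that have already been split on tabs; it is written for Python 3.

```python
import operator

# Hypothetical rows, standing in for lines already split on '\t'.
rows = [["r1", "pear"], ["r2", "apple"], ["r3", "fig"]]

# Sort by the second column using itemgetter...
by_itemgetter = sorted(rows, key=operator.itemgetter(1))

# ...and using an explicit lambda that does the same lookup.
by_lambda = sorted(rows, key=lambda element: element[1])

print(by_itemgetter == by_lambda)
```

`sorted` is used here instead of `list.sort` only so that both results can be compared side by side; the `key` argument behaves the same in either call.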

However, I think a better solution is to use a set:

setA1 = set(element[1] for element in listA1)
setB1 = set(element[1] for element in listB1)
setC1 = set(element[1] for element in listC1)

Or, more simply, don't build the lists in the first place; do this instead:

setA1 = set()
for line1 in fileA:
    listA = line1.split('\t')
    setA1.add(listA[1])

Either way:

print("Common between file1 and file2 files:")
for key in setA1 & setB1:
    print(key)
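The same idea covers all three comparisons from the question. The following is a small sketch using in-memory sets instead of files; the values in `set_a`, `set_b`, and `set_c` are invented for illustration, and the code is written for Python 3.

```python
# Hypothetical second-column values from three files.
set_a = {"apple", "pear", "fig"}
set_b = {"pear", "fig", "kiwi"}
set_c = {"fig", "plum", "apple"}

# Pairwise intersections replace the nested loops in the question.
common_ab = set_a & set_b
common_ac = set_a & set_c

# Chaining & gives the values present in all three sets.
common_all = set_a & set_b & set_c

# sorted() is only there to make the printed order stable.
print("Common between three files:", sorted(common_all))
print("Common between file1 and file2:", sorted(common_ab))
print("Common between file1 and file3:", sorted(common_ac))
```

Unlike the nested loops, each value is reported once even if it occurs on several lines of a file, because a set holds no duplicates.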

To simplify further, you may want to start by factoring the repeated code out into a function:

def read_file(path):
    with open(path) as f:
        result = set()
        for line in f:
            columns = line.split('\t')
            result.add(columns[1])
    return result

setA1 = read_file('A.txt')
setB1 = read_file('B.txt')
setC1 = read_file('C.txt')

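Then you'll spot further opportunities. For example:

import csv

def read_file(path):
    with open(path) as f:
        # The files are tab-separated, so tell csv.reader not to split on commas.
        return set(row[1] for row in csv.reader(f, delimiter='\t'))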
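One practical difference between splitting by hand and using the csv module is the trailing newline: if the second column is the last one on the line, `line.split('\t')` leaves the `'\n'` attached, while `csv.reader` strips it. A minimal sketch, using `io.StringIO` in place of a real file and a hypothetical helper named `second_column` (Python 3):

```python
import csv
import io

def second_column(lines):
    """Return the set of second-column values from tab-separated lines."""
    # Rows with fewer than two fields (e.g. blank lines) are skipped.
    return {row[1] for row in csv.reader(lines, delimiter="\t") if len(row) > 1}

# Made-up sample data standing in for the contents of A.txt.
sample_text = "1\tapple\n2\tpear\n3\tapple\n"

values = second_column(io.StringIO(sample_text))

# Manual splitting keeps the newline on the last field.
naive_field = sample_text.splitlines(keepends=True)[0].split("\t")[1]

print(sorted(values))
print(repr(naive_field))
```

If the manual `split` approach is kept, stripping each line first (for example with `line.rstrip('\n')`) avoids comparing values that differ only by a newline.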

As John Clements pointed out, you don't even need all three to be sets, only A1, so you could do this:

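def read_file(path):
    with open(path) as f:
        for row in csv.reader(f, delimiter='\t'):
            yield row[1]

setA1 = set(read_file('A.txt'))
iterB1 = read_file('B.txt')
iterC1 = read_file('C.txt')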

The only other change you need is to call intersection instead of using the & operator, so:

for key in setA1.intersection(iterB1):
    print(key)
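The reason for the method call is that `set.intersection` accepts any iterable, whereas the `&` operator requires both operands to be sets. A short sketch under that assumption, with a hypothetical generator `second_fields` and made-up lines in place of files (Python 3):

```python
def second_fields(lines):
    """Yield the second tab-separated field of each line, lazily."""
    for line in lines:
        yield line.rstrip("\n").split("\t")[1]

# Made-up file contents.
lines_a = ["1\tapple\n", "2\tpear\n", "3\tfig\n"]
lines_b = ["7\tpear\n", "8\tkiwi\n", "9\tfig\n"]

# Only the first file is materialised as a set.
set_a = set(second_fields(lines_a))

# intersection() consumes the generator directly.
common = set_a.intersection(second_fields(lines_b))

# The & operator rejects a generator operand.
try:
    set_a & second_fields(lines_b)
    operator_worked = True
except TypeError:
    operator_worked = False

print(sorted(common), operator_worked)
```

Note that a generator can be consumed only once, so an iterator such as `iterB1` would have to be recreated before being compared against a second set.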

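I'm not sure that last change is really an improvement. But in Python 3.3, where all you have to do is change return set(…) to yield from (…), I would probably do it that way. (Even if the files were so large and contained so many duplicates that this cost performance, I would just wrap the read_file calls in unique_everseen from the itertools recipes.)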
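As a rough sketch of that variant: `unique_everseen` is not importable from the `itertools` module itself; it is one of the recipes in the itertools documentation, so a simplified version without the recipe's `key` argument is defined inline here. The helper name `read_column` and the sample data are assumptions made for illustration (Python 3.3 or later).

```python
import csv
import io

def unique_everseen(iterable):
    """Yield each element once, in first-seen order (simplified recipe)."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

def read_column(lines):
    """Lazily yield the second column of tab-separated lines."""
    yield from (row[1] for row in csv.reader(lines, delimiter="\t"))

# Made-up contents with a duplicate in the second column.
sample = io.StringIO("1\tapple\n2\tpear\n3\tapple\n")

deduplicated = list(unique_everseen(read_column(sample)))
print(deduplicated)
```

Deduplicating the stream this way costs one extra set per file, so it only pays off when duplicates are numerous.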
