Comparing files line by line in Python

Posted 2024-10-01 11:23:39


What is the most elegant way to walk through a sorted list by the first field of each line? Input:

Meni22   xxxx xxxx
Meni32_2 xxxx xxxx
Meni32_2 xxxx xxxx
Meni45_1 xxxx xxxx
Meni45_1 xxxx xxxx
Meni45   xxxx xxxx

Do I have to go through it line by line?

[code block not preserved in the source]

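This example obviously doesn't work. It adds the first match for line[0] and moves on. I would rather have it walk the whole list, add the lines whose first field occurs only once to list1, and add the rest to list2.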

After the script:

List1:

Meni22   xxxx xxxx
Meni45   xxxx xxxx

List2: 

Meni45_1 xxxx xxxx
Meni45_1 xxxx xxxx
Meni32_2 xxxx xxxx
Meni32_2 xxxx xxxx

3 answers

Since the file is already sorted (lines with the same first field are adjacent), you can use itertools.groupby:

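from itertools import groupby

list1, list2 = res = [], []
# Open in text mode so that partition(' ') works on str lines.
with open('file1.txt') as fin:
    # Group consecutive lines by their first space-separated field.
    for k, g in groupby(fin, key=lambda x: x.partition(' ')[0]):
        g = list(g)
        # res is a tuple, so extend in place; "res[...] += g" would raise TypeError.
        res[len(g) > 1].extend(g)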

Or perhaps you prefer this slightly longer version:

[code block not preserved in the source]
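
For illustration, the groupby idea can also be spelled out as an explicit function. This is only a sketch under assumptions, not necessarily the longer version the answer refers to: the names first_field and split_by_run_length are hypothetical, and an in-memory list stands in for the file so the example is self-contained.

```python
from itertools import groupby


def first_field(line):
    # Key function: everything before the first space.
    return line.partition(' ')[0]


def split_by_run_length(lines):
    """Split lines into (singles, repeats) based on runs of equal first fields."""
    singles, repeats = [], []
    for _key, group in groupby(lines, key=first_field):
        group = list(group)
        if len(group) == 1:
            singles.extend(group)
        else:
            repeats.extend(group)
    return singles, repeats


# Sample data copied from the question.
sample = [
    "Meni22   xxxx xxxx",
    "Meni32_2 xxxx xxxx",
    "Meni32_2 xxxx xxxx",
    "Meni45_1 xxxx xxxx",
    "Meni45_1 xxxx xxxx",
    "Meni45   xxxx xxxx",
]
list1, list2 = split_by_run_length(sample)
print(list1)
print(list2)
```

Note that groupby only merges adjacent lines, so this relies on equal keys sitting next to each other, as they do in the sample input.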

Consider using difflib:

import difflib

d = difflib.Differ()
with open('a.txt') as fa, open('b.txt') as fb:
    # Differ.compare expects two sequences of lines, not two joined strings.
    diff = d.compare(fa.readlines(), fb.readlines())

print(''.join(diff))
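
To see what a line-by-line comparison with difflib produces, here is a small sketch that uses in-memory lists instead of files. The contents of a_lines and b_lines are made up for illustration and are not taken from the question. Differ prefixes each output line with "  " (present in both), "- " (only in the first sequence) or "+ " (only in the second).

```python
import difflib

# Hypothetical stand-ins for the lines of a.txt and b.txt.
a_lines = ["Meni22 xxxx\n", "Meni32_2 xxxx\n", "Meni45 xxxx\n"]
b_lines = ["Meni22 xxxx\n", "Meni45 xxxx\n", "Meni50 xxxx\n"]

# compare() returns a generator of prefixed lines; materialize it once.
diff = list(difflib.Differ().compare(a_lines, b_lines))

# Strip the two-character prefix to recover the original lines.
only_in_a = [line[2:] for line in diff if line.startswith("- ")]
only_in_b = [line[2:] for line in diff if line.startswith("+ ")]

print("".join(diff), end="")
```

This compares two files against each other, which is a different task from splitting one file into unique and repeated keys.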

You can use collections.Counter:

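from collections import Counter

lis1 = []
lis2 = []
with open("abc") as f:
    # Count how many non-blank lines start with each first field.
    c = Counter(line.split()[0] for line in f if line.strip())

for key, val in c.items():
    if val == 1:
        lis1.append(key)               # key occurs exactly once
    else:
        lis2.extend([key] * val)       # repeat the key once per occurrence
print(lis1)
print(lis2)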

Output:

[output not preserved in the source]

Edit:

from collections import defaultdict

lis1 = []
lis2 = []

with open("abc") as f:
    # Map each first field to the list of remaining columns of its lines.
    dic = defaultdict(list)
    for line in f:
        spl = line.split()
        if spl:  # skip blank lines
            dic[spl[0]].append(spl[1:])

for key, val in dic.items():
    if len(val) == 1:
        lis1.append(key)
    else:
        lis2.append(key)
print(lis1)
print(lis2)

print(dic["Meni32_2"])  # access the columns related to any key from the dict

Output (Python 3.7+, where dicts keep insertion order):

['Meni22', 'Meni45']
['Meni32_2', 'Meni45_1']
[['xxxx', 'xxxx'], ['xxxx', 'xxxx']]
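
The question asks for whole lines rather than keys, so a counting approach can be adapted with a second pass over the data. The following is a minimal sketch under that reading, using the sample input from the question as an in-memory list; with a real file you would first read the lines into a list (or rewind the file) before the second pass.

```python
from collections import Counter

# Sample data copied from the question.
sample = [
    "Meni22   xxxx xxxx",
    "Meni32_2 xxxx xxxx",
    "Meni32_2 xxxx xxxx",
    "Meni45_1 xxxx xxxx",
    "Meni45_1 xxxx xxxx",
    "Meni45   xxxx xxxx",
]

# First pass: count how often each first field occurs.
counts = Counter(line.split()[0] for line in sample)

# Second pass: route each full line according to the count of its key.
lis1 = [line for line in sample if counts[line.split()[0]] == 1]
lis2 = [line for line in sample if counts[line.split()[0]] > 1]

print(lis1)
print(lis2)
```

Unlike groupby, this does not depend on equal keys being adjacent, at the cost of iterating over the data twice.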
