Finding common lines in two different files

Posted 2024-09-21 03:21:46


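I'm trying to find the lines that two different files have in common and list them in a new text file. I wrote the script below, but it doesn't find the common lines; it just writes out whatever file I pass as arg2. Please help me troubleshoot it.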

#!/usr/bin/python

import sys


def find_common_lines(arg1, arg2, arg3):
    fh1 = open(arg1, 'r+')
    fh2 = open(arg2, 'r+')
    with open(arg3, 'w+') as f:
        for line in fh1 and fh2:
            if line:
                f.write(line)

    fh1.close()
    fh2.close()


number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)

So basically, this is what I want the script to do:

File A

AAB
BBC
DDE
GGC

File B

123
AAB
DDE
345
GHY
GJK

File C (the expected output)

AAB
DDE

Thanks


3 answers

Try using a dictionary:

import sys


def find_common_lines(arg1, arg2, arg3):
    # Store every stripped line of the first file as a dictionary key.
    alllines_dict = {}
    with open(arg1, 'r') as f:
        for line in f:
            alllines_dict[line.strip()] = 1
    # Write each line of the second file that also appears in the first.
    with open(arg3, 'w') as out:
        with open(arg2, 'r') as f:
            for line2 in f:
                line2 = line2.strip()
                if line2 in alllines_dict:
                    out.write(line2 + '\n')


number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
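
Since the dictionary is only used for membership tests, a set would do the same job. Here is a minimal sketch of that idea, not a drop-in replacement: the names `common_lines` and `demo` are made up for illustration, and the sample data is copied from the question. Note that it keeps the order of file A, whereas the dictionary version follows the order of file B.

```python
import os
import tempfile


def common_lines(path_a, path_b):
    # Load file B into a set for fast membership tests.
    with open(path_b) as fb:
        lines_b = {line.strip() for line in fb}
    # Keep file A's order; skip lines that do not occur in B.
    with open(path_a) as fa:
        stripped = (line.strip() for line in fa)
        return [line for line in stripped if line in lines_b]


def demo():
    # Recreate the question's sample files A and B in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "A.txt")
        path_b = os.path.join(tmp, "B.txt")
        with open(path_a, "w") as f:
            f.write("AAB\nBBC\nDDE\nGGC\n")
        with open(path_b, "w") as f:
            f.write("123\nAAB\nDDE\n345\nGHY\nGJK\n")
        return common_lines(path_a, path_b)


print(demo())  # ['AAB', 'DDE']
```

If both files contain blank lines, the empty string would also count as a common line, so you may want to filter it out.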

You can do this with Python's pandas library:

Create a DataFrame for each .txt file (with pandas imported as pd), as follows:

In [2017]: df_A = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/A.txt', header=None)

In [2018]: df_A
Out[2018]: 
     0
0  AAB
1  BBC
2  DDE
3  GGC

In [2019]: df_B = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/B.txt', header=None)

In [2020]: df_B
Out[2020]: 
     0
0  123
1  AAB
2  DDE
3  345
4  GHY
5  GJK

Now merge the two DataFrames (as an inner join) to keep only the rows common to both:

In [2021]: df_C = pd.merge(df_A, df_B, on=0, how='inner')

In [2022]: df_C
Out[2022]: 
     0
0  AAB
1  DDE

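You can then write this output to a file as follows (header=False keeps the column label 0 out of the file):

In [2023]: df_C.to_csv('out.csv', index=False, header=False)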

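This is efficient because you don't need explicit loops or complicated regular expressions. The code ends up cleaner and simpler.

Let me know if this helps.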

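First, the "and" operator needs two logical expressions. Here `fh1 and fh2` is a single expression that evaluates to fh2 (both file objects are truthy), so the for loop is fed fh2 directly and fh1 is never read. Try changing this part of your code from:

for line in fh1 and fh2:
    if line:
        f.write(line)

to something that checks each line against both files:

lines_in_fh2 = set(fh2)  # read every line of fh2 once
for line in fh1:
    if line in lines_in_fh2:
        f.write(line)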
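
To see why the original loop copies the second file, here is a small sketch of how `and` behaves with file-like objects. The `io.StringIO` objects are stand-ins for the real open files, used only so that the demo needs no files on disk.

```python
import io

# In-memory stand-ins for the two open files (an assumption for this demo).
fh1 = io.StringIO("AAB\nBBC\nDDE\nGGC\n")
fh2 = io.StringIO("123\nAAB\nDDE\n")

# `x and y` returns y whenever x is truthy, and file-like objects are truthy.
chosen = fh1 and fh2
print(chosen is fh2)  # True

# So the question's loop reads fh2 only, and every line passes `if line`.
written = [line for line in (fh1 and fh2) if line]
print(written)  # ['123\n', 'AAB\n', 'DDE\n']
```

Nothing in that loop ever compares a line against fh1, which is why the output is simply a copy of the second file.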
