How can I check whether substrings from one file are contained in another file?

Published 2024-06-28 14:35:29


I have the two files shown below, and I am trying to find which lines of file2 occur as substrings in file1:

file1.txt:

NP_001106283
MRIISRQIVLLFSGFWGLAMGAFPSSVQIGGLFIRNTDQEYTAFRLAIFLHNTSP
NP_001106697
MYLSRFLSIHALWVTVSSVMQPYPLVWGHYDLCKTQIYTEEGKVWD

file2.txt

RIISRQIVLL
AABBCCDD
SRFLSIHAL
BBBBCCEE

Expected output:

RIISRQIVLL
SRFLSIHAL

The code I tried, which does not work:

with open("file1.txt", mode="r") as file1, open("file2.txt", mode="r") as file2:
    data=file1.read()
    for line in file2:
        if line in data:
            print(line)

Any suggestions or help? Thanks.


3 Answers

Try it this way:

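with open('file1.txt') as f1, open('file2.txt') as f2:
    # split() drops all whitespace, including newlines; rejoin file1 with a separator
    lines_f1 = '-'.join(f1.read().split())
    # split() also removes the trailing newline from every search string
    lines_f2 = f2.read().split()
    for line in lines_f2:
        if line in lines_f1:
            print(line)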

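The "-" is used as a separator between the lines of file1. If any of the strings you are searching for contains "-", use some other separator instead.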
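To illustrate why a separator is needed at all, here is a small sketch that works on in-memory strings instead of files. The sample strings and the helper name `find_matches` are made up for this example; the idea is only that joining with a character that cannot occur in a query prevents a match from spanning two lines.

```python
def find_matches(haystack_lines, queries, sep="-"):
    """Return the queries that occur inside a single haystack line."""
    # The trick only works if the separator cannot appear in a query.
    if any(sep in q for q in queries):
        raise ValueError(f"separator {sep!r} occurs in a query; pick another one")
    joined = sep.join(haystack_lines)
    return [q for q in queries if q in joined]


lines = ["ABCD", "EFGH"]
queries = ["BC", "DE"]

# With a separator, "DE" is not reported because it would span two lines.
print(find_matches(lines, queries))                  # ['BC']

# Joined without a separator, "DE" is matched across the line boundary.
print([q for q in queries if q in "".join(lines)])   # ['BC', 'DE']
```

If matches across line boundaries are actually wanted, joining with an empty string instead would be the thing to try.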

Just add line.strip() and your code will work:

with open("xyz.txt", 'r') as file1, open("second.txt", 'r') as file2, open('output.txt', 'w') as output:
    data = file1.read()
    for line in file2:
        # Each line read from file2 still ends with "\n"; strip it before searching.
        if line.strip() in data:
            output.write(line)

I tested it with:

xyz.txt

NP_001106283
MRIISRQIVLLFSGFWGLAMGAFPSSVQIGGLFIRNTDQEYTAFRLAIFLHNTSP
NP_001106697
MYLSRFLSIHALWVTVSSVMQPYPLVWGHYDLCKTQIYTEEGKVWD

second.txt

RIISRQIVLL
AABBCCDD
SRFLSIHAL
BBBBCCEE

Output:

RIISRQIVLL
SRFLSIHAL
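The same idea could also be wrapped in a reusable function. This is only a sketch: the function name `substrings_found` and its parameters are invented here, and it assumes both files are small enough to read into memory. It additionally skips blank lines, because an empty string counts as a substring of every string and would otherwise be reported as a match.

```python
from pathlib import Path


def substrings_found(haystack_path, needles_path):
    """Return the lines of needles_path that occur as substrings of haystack_path's text."""
    data = Path(haystack_path).read_text()
    found = []
    for line in Path(needles_path).read_text().splitlines():
        needle = line.strip()
        # Skip blank lines: "" in data is always True.
        if needle and needle in data:
            found.append(needle)
    return found
```

Calling `substrings_found("file1.txt", "file2.txt")` on the files from the question should then give the two expected strings as a list.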

Your lines probably end with \n:

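with open("file1.txt", mode="r") as file1, open("file2.txt", mode="r") as file2:
    data = file1.read()
    for line in file2:
        # Remove the trailing newline before searching, and print the cleaned text
        # so that print() does not emit an extra blank line.
        text = line.replace("\n", "")
        if text in data:
            print(text)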

You can print each line with repr() to see the text it actually contains:

print(repr(line))
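As a quick illustration of what repr() reveals, the sketch below uses io.StringIO as a stand-in for the real files (the shortened sample data is an assumption made for the example). It shows the hidden "\n" at the end of each line and how the substring test changes once it is removed.

```python
import io

# Stand-in for file2.txt; io.StringIO iterates line by line like a real file.
file2 = io.StringIO("RIISRQIVLL\nAABBCCDD\n")
# Stand-in for the contents of file1.txt.
data = "NP_001106283\nMRIISRQIVLLFSGFWGLAMGAF\n"

results = []
for line in file2:
    # (raw line as Python sees it, match with newline, match without newline)
    results.append((repr(line), line in data, line.rstrip("\n") in data))

for row in results:
    print(*row)
# 'RIISRQIVLL\n' False True
# 'AABBCCDD\n' False False
```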
