Searching and recording from a text file in Python

Posted 2024-06-16 10:27:04

I'm looking for some advice on the search script below. Any help would be great.

The following line is from my input (query) file ("out.list.txt"):

IVVTGPHKFNRCPLKKLAQSFTMPTSTFVDI*GLNFDITEQHFVKEKP**SSEEAQFFAK

I can find this line, and 50,000 others, in the alignment file ("out.test.txt") and print it out. This is an extract from the alignment file:

Query_13               388   IVVQADGSQVVEDRKADVMNAAYNALQAGLRTIKVGNTNT*VTEVMNKAIEPFECNMLEG  567
c18644_g2_i1_3         122   LVVGASAETPITGNKADVVLAAYNAIQAALRLIKPGNSNLEVTEVFNKIATDYQCNVLEG  181
c18644_g1_i1_2         121   LVVGATAEAPIAGNKADVTLAAYNAIQAALRLIKPGSTNTEVTQVFNKIAADYHCNVLEG  180
c11476_g1_i1_2         119   VVVQ-DPSAKVTGEKADLLLAALNAMQAALRLVRPGNTNTQVTEAMSKIAEAYGCTMLEG  177
c7710_g1_i1_1          147   IVVSEKADAVVEGRKADVVHAAYNALQVALRLLKPGQKNNDVTEHIAKVVESYKCNPVEG  206
c37_g1_i1_3            145   VVVGKDKSTGAEGRKAEVILAAYNALQASLRHLRPGSKNYDVTETVEKISETFGCNPVEG  204
c2897_g1_i1_3          144   FILGATAENPASGKKADVILAAKQAIDAAVRKIRVGETNLTLTETIARVAAAYGVNSVEG  203
c4999_g1_i1_2          167   VVI---GKEKVDDKRADVVKCAWDAAEAALRLVQVGNTNTQVTEAFTKIADEYGCKPMQG  223

If the query line contains "*", is it possible to record what is at that position in the other lines of the output? I.e. E, E, Q, D, D, T, Q

All my attempts so far have failed, and I'd like to know whether what I'm trying to do is feasible.

# Read the query sequences, one per line
seq_list = open("out.list.txt")
query_sequences = []
for sequence in seq_list:
    query_sequences.append(sequence.strip())
seq_list.close()

# Keep every alignment line that contains one of the query sequences
hits = []
alignments = open("out.test.txt")
for line in alignments:
    alignment_hit = line.split()
    for query_sequence in query_sequences:
        if query_sequence in alignment_hit:
            hits.append(line)
            break
alignments.close()

2 Answers

If you only need the aligned sequence characters, try the following (it also handles multiple * per line):

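with open('out.test.txt') as alignments:
    lines = [line.rstrip() for line in alignments]

star_indices = []
for line in lines:
    data = line.split()
    if len(data) < 3:
        continue  # skip blank or malformed lines
    sequence = data[2]
    if data[0].startswith("Query"):
        # remember where the query row has '*'
        star_indices = [i for i, c in enumerate(sequence) if c == '*']
    else:
        # print the characters aligned with those positions
        print([sequence[star_index] for star_index in star_indices])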

Output for the sample input:

['E']
['E']
['Q']
['D']
['D']
['T']
['Q']
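
If you would rather keep the residues than print them, and the file holds more than one Query block, the same idea can be wrapped in a function. This is only a sketch: `record_star_columns` is a name made up here, and it assumes every data line has the layout name, start, sequence, end, with each Query row followed by its hit rows.

```python
def record_star_columns(lines):
    """Map each query ID to {star column index: [residues from the hit rows]}."""
    records = {}
    current = None  # per-column lists for the block currently being read
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, sequence = fields[0], fields[2]
        if name.startswith("Query"):
            # A new block starts: note every column where the query has '*'
            current = {i: [] for i, c in enumerate(sequence) if c == "*"}
            records[name] = current
        elif current is not None:
            # A hit row: record the residue aligned with each star column
            for i, residues in current.items():
                if i < len(sequence):  # guard against rows shorter than the query
                    residues.append(sequence[i])
    return records
```

Passing it the open alignment file (or any list of lines) should give `{'Query_13': {40: ['E', 'E', 'Q', 'D', 'D', 'T', 'Q']}}` for the extract in the question.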

sequence = open("out.list.txt").read().strip()  # reads in the file as a string, without the trailing newline

alignment_rows = open("out.test.txt").readlines()  # reads in the file as a list of lines

# split each row on whitespace and extract sequences only - third column
# I assume the columns in your alignment are separated by tabs or spaces
alignment_sequences = [row.split()[2] for row in alignment_rows if row.strip()]

output = {}  # this is a dict, where keys are indices of positions with * and values are lists e.g. {1: ['A', 'C'], 2: ['D', 'E']}
for index, char in enumerate(sequence):
    if char == "*":
        output[index] = []
        for alignment_sequence in alignment_sequences:
            output[index].append(alignment_sequence[index])
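
On the search half of the question: with around 50,000 query lines, comparing every alignment line against every query in a nested loop gets slow. A possible sketch is to put the queries in a set and test the sequence column against it. `find_hits` is a hypothetical helper name, and the code assumes the sequence is the third whitespace-separated column, as in the extract above.

```python
def find_hits(query_sequences, alignment_lines):
    """Return the alignment lines whose sequence column equals one of the queries."""
    # Set membership is a constant-time check on average, unlike scanning a list
    wanted = {q.strip() for q in query_sequences if q.strip()}
    hits = []
    for line in alignment_lines:
        fields = line.split()
        # Assumed layout: name, start, sequence, end
        if len(fields) >= 3 and fields[2] in wanted:
            hits.append(line.rstrip("\n"))
    return hits
```

Both arguments can be open file objects or plain lists of strings, so it can be tried on a few lines before running it on the full files.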
