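I'm trying to write a function that reads a FASTA file and produces every list of DNA sequences whose combined molecular weight lies between two chosen limits. I generated the file from random sequences and worked out a function that prints lists of sequences whose combined molecular weight falls within the given limits: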
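import random
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Bio.Alphabet (and the alphabet argument of Seq) was removed in Biopython 1.78,
# so the alphabet is passed here as a plain string of letters.
DNA_LETTERS = 'GATC'

def generate_sequence2(min_len, max_len, alphabet):
    # Pick a random length, then draw that many letters from the alphabet.
    length = random.randint(min_len, max_len)
    return Seq(''.join(random.choices(alphabet, k=length)))

def generate_seq_records2(nr_seqs, min_len, max_len, alphabet):
    records = list()
    for i in range(1, nr_seqs + 1):
        seq_id = f'seq_{i:03d}'
        seq = generate_sequence2(min_len, max_len, alphabet)
        records.append(SeqRecord(seq, id=seq_id, description=f'fragment {i}'))
    return records

records2 = generate_seq_records2(20, 3, 8, DNA_LETTERS)
SeqIO.write(records2, 'dna_fragments.fasta', 'fasta')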
from Bio.SeqUtils import molecular_weight

def fasta_file(file_name, min_weight, max_weight):
    with open(file_name) as file:
        sum_weight = 0
        end_list = []
        for line in file:
            # Skip the FASTA header lines.
            if line.startswith('>'):
                continue
            line = line.rstrip('\n')
            # Add this sequence's weight to the running total.
            weight = molecular_weight(line)
            sum_weight = sum_weight + weight
            end_list.append(line)
            if min_weight < sum_weight < max_weight:
                print(f'{end_list} | {sum_weight:10.3f}')

fasta_file('dna_fragments.fasta', 5000, 8000)
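This only gives me the lists in file order. It starts with sequence 1, then adds sequence 2, then sequence 3, and so on until the upper limit is reached, like this: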
['TAGG', 'ATGTC', 'AAA', 'CTTTG'] | 5358.440
['TAGG', 'ATGTC', 'AAA', 'CTTTG', 'CTCC'] | 6548.194
['TAGG', 'ATGTC', 'AAA', 'CTTTG', 'CTCC', 'ATG'] | 7512.815
But I would like to get all possible lists, including "random" ones, for example sequence1+sequence3+sequence4+sequence6.
Does anyone know how to do this?
Thanks in advance.
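One possible way to get every combination rather than only the running prefixes is to enumerate subsets recursively and drop a branch as soon as it reaches the upper limit. The following is a minimal sketch, not a definitive solution. The names `subsets_in_range` and `print_fasta_subsets` are hypothetical. It assumes every weight is positive, uses the same strict limits as the question, and takes the "combined" weight to be the plain sum of the individual fragment weights, as the code in the question does.

```python
def subsets_in_range(items, min_weight, max_weight):
    """Yield (labels, total) for every non-empty subset of items whose
    summed weight lies strictly between min_weight and max_weight.

    items is a list of (label, weight) pairs; weights are assumed positive.
    """
    def extend(start, chosen, total):
        if chosen and min_weight < total < max_weight:
            yield list(chosen), total
        for i in range(start, len(items)):
            label, weight = items[i]
            # Weights are positive, so once a subset reaches the upper
            # limit, every larger subset containing it does too.
            if total + weight >= max_weight:
                continue
            chosen.append(label)
            yield from extend(i + 1, chosen, total + weight)
            chosen.pop()

    yield from extend(0, [], 0.0)


def print_fasta_subsets(file_name, min_weight, max_weight):
    # Biopython is imported here so the helper above can be used without it.
    from Bio import SeqIO
    from Bio.SeqUtils import molecular_weight

    # Pair each sequence string with its single-stranded DNA weight.
    items = [(str(record.seq), molecular_weight(record.seq))
             for record in SeqIO.parse(file_name, 'fasta')]
    for chosen, total in subsets_in_range(items, min_weight, max_weight):
        print(f'{chosen} | {total:10.3f}')
```

Calling `print_fasta_subsets('dna_fragments.fasta', 5000, 8000)` would then print one line per qualifying combination, with the sequences inside each list kept in file order. The number of subsets grows exponentially with the number of sequences, so the pruning step matters even for the 20 fragments generated above.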