How to put multiple strings into a list in a for loop?

Posted 2024-10-01 04:55:45

I'm using a for loop to look up a list of protein IDs in the NCBI protein database and convert those IDs into descriptions. For example:

import pandas as pd
from Bio import Entrez
from Bio import SeqIO

Entrez.email = "your.name@example.com"  # NCBI asks for a contact email; placeholder address
df = pd.read_csv('ID.txt', header=None)
df.columns = ['protein_ID']  # put a header 'protein_ID' on the dataframe
lists = df.protein_ID.tolist()  # convert the column into a list of protein IDs

description = ''
for num, line in enumerate(lists):
    handle = Entrez.efetch(db="protein", id=line, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    description += record.description

description

It returns one huge string:

'hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]'

What I want is a list of strings, one per line, like this:

[
'hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]',
'ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]',
'hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]',
'phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]'
]

How can I do this? Thanks a lot.

1 Answer
Anonymous user
#1 · Posted 2024-10-01 04:55:45

> What I want is a list of strings

description = []
for num, line in enumerate(lists):
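    handle = Entrez.efetch(db="protein", id=line, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")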
    description.append(record.description)
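
To see why the question's loop produced one long string while the list version does not, here is a minimal offline sketch. It does not contact NCBI: the four sample descriptions from the question stand in for record.description, and the helper names concatenate and collect are hypothetical, chosen only for this illustration.

```python
# Sample descriptions copied from the question; they stand in for
# record.description so the example runs without contacting NCBI.
SAMPLE_DESCRIPTIONS = [
    "hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]",
    "ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]",
    "hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]",
    "phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]",
]


def concatenate(descriptions):
    """Mimics the question's loop: str += glues the pieces together with no separator."""
    result = ""
    for text in descriptions:
        result += text
    return result


def collect(descriptions):
    """Mimics the answer's loop: list.append keeps each piece as its own element."""
    result = []
    for text in descriptions:
        result.append(text)
    return result


joined = concatenate(SAMPLE_DESCRIPTIONS)
as_list = collect(SAMPLE_DESCRIPTIONS)

print(type(joined).__name__, len(joined))    # a single str
print(type(as_list).__name__, len(as_list))  # a list with 4 elements
```

str += text appends the characters with no separator, so neighbouring descriptions run together (e.g. "...GW2011_GWE1_34_7]ATPase..."), whereas list.append keeps each description as a separate element.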

> with new line breaks

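Python does not print a list one element per line by default. Use pprint, which spreads a list over several lines when it does not fit within the line width (80 characters by default):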

import pprint

# your original code here

pprint.pprint(description)
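
If the goal is simply to display one description per line, two options are sketched below, again using the question's sample descriptions in place of live NCBI results. The width=120 value is an assumption picked so that each description fits on one line; nothing in Biopython or NCBI requires it.

```python
import pprint

# Sample descriptions copied from the question (stand-ins for fetched records).
descriptions = [
    "hypothetical protein UR61_C0009G0014 [candidate division WS6 bacterium GW2011_GWE1_34_7]",
    "ATPase [candidate division WS6 bacterium GW2011_GWE2_33_157]",
    "hypothetical protein UR96_C0034G0007 [candidate division WS6 bacterium GW2011_GWC1_36_11]",
    "phosphoenolpyruvate synthase [Candidatus Komeilibacteria bacterium RIFOXYC1_FULL_37_11]",
]

# Option 1: plain text, one description per line, no quotes or brackets.
print("\n".join(descriptions))

# Option 2: keep the list repr. The whole list is wider than 120 characters,
# so pprint puts each element on its own line; each element fits within the
# width, so no single string gets wrapped.
pprint.pprint(descriptions, width=120)
```

With pprint's default width of 80, descriptions longer than the line (several here are close to 90 characters) may themselves be split into adjacent quoted fragments, which is why the sketch raises the width.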