<p>I think something like this would do it (assuming the output file should be tab-delimited):</p>
<pre><code>import csv

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

# Python 3: open the csv output in text mode with newline=''
# (on Python 2, use mode 'wb' instead).
with open('potentiation.txt', 'rt') as experiment, \
     open('output.csv', 'w', newline='') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    for ligand in (line.rstrip() for line in experiment):
        row = [ligand]
        for protein in receptors:
            # Every receptor file is re-read for every ligand (slow).
            with open(protein + '.txt', 'rt') as file1:
                # Plain substring search of the whole file.
                found = 'Found' if ligand in file1.read() else 'Not Found'
            row.append(found)
        csv_writer.writerow(row)

print('output.csv file written')
</code></pre>
<p><strong>Update</strong></p>
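<p>As I said in a comment, this can be made much faster by reading each protein file only once. To do that and still format the output the way you want, the result of checking each ligand against each file has to be stored in a data structure that is built up step by step as each file is read and checked, and then written out in one go after everything has been processed. A simple list of lists is enough for this purpose and is what the implementation below uses.</p>
<p>The trade-off is using more memory in exchange for not reading the protein files over and over. Since disk I/O is usually one of the slowest operations on a computer, a small increase in code complexity can buy a large performance gain.</p>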
<p>Here is code showing this alternative version:</p>
^{pr2}$
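<p>For reference, the read-once approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the original listing: the helper names <code>build_results</code> and <code>write_table</code> and the <code>directory</code> parameter are hypothetical, and the code assumes Python 3 and receptor files small enough to hold in memory one at a time.</p>

```python
import csv
import os


def build_results(ligands, receptors, directory='.'):
    """Return a list of lists: one row per ligand, one column per receptor.

    Hypothetical helper: each receptor file is opened and read exactly once.
    """
    rows = [[ligand] for ligand in ligands]  # each row starts with its ligand name
    for protein in receptors:
        path = os.path.join(directory, protein + '.txt')
        with open(path, 'rt') as receptor_file:
            contents = receptor_file.read()  # single read per receptor file
        for row in rows:
            # Plain substring search, same test as the first version.
            row.append('Found' if row[0] in contents else 'Not Found')
    return rows


def write_table(rows, receptors, out_path):
    """Write the header and all accumulated rows as tab-delimited text."""
    with open(out_path, 'w', newline='') as outfile:
        csv_writer = csv.writer(outfile, delimiter='\t')
        csv_writer.writerow(['Ligand'] + receptors)  # header row
        csv_writer.writerows(rows)
```

<p>With the <code>receptors</code> list from the first version and the ligands read from <code>potentiation.txt</code> (one per line, blank lines skipped), calling <code>write_table(build_results(ligands, receptors), receptors, 'output.csv')</code> should produce the same table while opening each receptor file only once.</p>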