<p>The answer given by Adirmola is good, but you can improve the code quality by applying a few modifications:</p>
<pre><code># Use "with" context managers to open files.
# The closing will be automatic, even in case of problems.
with open("NA12878.vcf", "r") as vcf_file, \
        open("NA12878_SNV.txt", "w") as snv_file:
    for line in vcf_file:
        # Since you have specific knowledge of the format
        # you can be more specific when identifying header lines
        if line[:2] == "##":
            snv_file.write(line)
        else:
            # You can avoid doing the splitting twice
            # with list unpacking (using _ for ignored fields)
            # https://www.python.org/dev/peps/pep-3132/
            [_, _, _, ref, alt, *_] = line.split("\t")  # "\t" represents a tab
            if len(ref) == 1 and len(alt) == 1:
                snv_file.write(line)
</code></pre>
<p>I tested this on your file with Python 3.6 and ended up with 554 SNVs.
Some of the syntax used here (especially the list unpacking) may not work with older Python versions.</p>
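<p>If extended unpacking is not available in your Python version, the same check can be written with plain indexing. The following is only a minimal sketch: the helper name <code>is_snv</code> is hypothetical, and unlike the code above it returns <code>False</code> for lines with fewer than five tab-separated fields instead of raising an error.</p>

```python
def is_snv(line):
    """Return True if a tab-separated VCF data line has a 1-base REF and a 1-base ALT."""
    # Strip the trailing newline so it cannot end up inside the last field.
    fields = line.rstrip("\n").split("\t")
    # Assumption: lines with fewer than 5 columns are not valid data lines.
    if len(fields) < 5:
        return False
    # REF is the 4th column and ALT the 5th (indices 3 and 4).
    ref, alt = fields[3], fields[4]
    return len(ref) == 1 and len(alt) == 1
```

<p>Inside the loop, the <code>else</code> branch would then reduce to <code>if is_snv(line): snv_file.write(line)</code>.</p>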