<pre><code># Extract SNVs while omitting indels.
# Indels have more than one base in the REF or ALT column.
# REF and ALT are columns 4 and 5 of a VCF record (list indexes 3 and 4).
with open('NA12878.vcf', 'r') as original_file, \
        open('NA12878_SNV.txt', 'w') as extracted_file:
    for line in original_file:
        if line.startswith('#'):
            # Copy the meta-information and header lines unchanged
            extracted_file.write(line)
        else:
            # VCF records are tab-delimited
            fields = line.split('\t')
            ref = fields[3]
            alt = fields[4]
            if len(ref) == 1 and len(alt) == 1:
                extracted_file.write(line)
</code></pre>
<p>There were two problems:</p>
<ol>
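<li>The second loop never executes, because the first loop has already read to the end of the VCF file. See <a href="https://stackoverflow.com/questions/40295650/python-reset-line-in-for-loop">here</a> for how to start a new loop over the same text file.</li>
<li>You were not splitting the lines correctly, because the file is tab-delimited.</li>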
</ol>
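<p>A minimal sketch of the first problem, assuming an <code>io.StringIO</code> object as a stand-in for the open VCF file (the two sample lines are made up for illustration). A second pass over the same handle yields nothing until the handle is rewound with <code>seek(0)</code>:</p>

```python
import io

# io.StringIO stands in for an open text file; the sample content is invented.
handle = io.StringIO("##fileformat=VCFv4.2\nchr1\t100\t.\tA\tG\n")

first_pass = [line for line in handle]    # reads every line
second_pass = [line for line in handle]   # handle is already at end of file

handle.seek(0)                            # rewind to the beginning
third_pass = [line for line in handle]    # reads every line again

print(len(first_pass), len(second_pass), len(third_pass))
```

<p>Rewinding works, but reading the file once and doing all the work in that single loop is usually simpler.</p>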
<p>So I changed the code to do everything in a single loop and to pass a tab as the argument to <code>split</code>.</p>
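<p>To illustrate why the delimiter matters, here is a small sketch. The helper name <code>is_snv</code> and the two sample records are hypothetical and are not part of the original script:</p>

```python
def is_snv(record):
    """Return True if a tab-delimited VCF data line has single-base REF and ALT."""
    fields = record.rstrip('\n').split('\t')
    ref, alt = fields[3], fields[4]   # columns 4 and 5 of the record
    return len(ref) == 1 and len(alt) == 1


# Invented sample records: one substitution and one deletion.
snv = "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
deletion = "chr1\t200\t.\tATG\tA\t50\tPASS\t.\n"

print(is_snv(snv))
print(is_snv(deletion))

# Splitting on a space finds no delimiter, so the whole line stays in one piece
# and indexing element 3 would raise IndexError.
print(len(snv.split(' ')), len(snv.split('\t')))
```

<p>Note that a length check like this also skips multi-allelic sites, since their ALT field lists several alleles separated by commas.</p>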