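<p><strong>Using string processing:</strong></p>
<ol>
<li>Iterate over every line of the content with a <code>for</code> loop.</li>
<li>Find the <code>sp|</code> marker and set the start index just after it.</li>
<li>Find the first <code>.</code> and the first <code>|</code> after the marker and compare their indexes.</li>
<li>Take the smaller index from step 3 as the end index.</li>
<li>Append the slice between the start and end indexes to the result.</li>
</ol>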
<p>Demo:</p>
<pre><code>content = """gi|1168222|sp|P46098.1|5HT3A_HUMAN
gi|1168223|sp|P35563.2|5HT3A_RAT
gi|112809|sp|P23979.1|5HT3A_MOUSE
gi|24211440|sp|O70212.1|5HT3A_CAVPO
gi|113067|sp|P22770|ACHA7_CHICK"""
result = []
for line in content.split("\n"):
    start_index = line.find("sp|")
    if start_index == -1:
        # No "sp|" marker on this line, skip it
        continue
    # +3 because the length of "sp|" is 3
    end_index1 = line.find(".", start_index + 3)
    end_index2 = line.find("|", start_index + 3)
    if end_index1 == -1 and end_index2 == -1:
        # Neither terminator found, skip the line
        continue
    elif end_index2 == -1:
        end_index = end_index1
    elif end_index1 == -1:
        end_index = end_index2
    elif end_index1 < end_index2:
        end_index = end_index1
    else:
        end_index = end_index2
    result.append(line[start_index + 3:end_index])
print(result)
</code></pre>
<p>Output:</p>
<pre><code>['P46098', 'P35563', 'P23979', 'O70212', 'P22770']
</code></pre>
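<p>The steps above can also be wrapped in a small reusable function. This is only a sketch: the name <code>extract_accession</code> and its <code>marker</code> parameter are hypothetical, introduced here for illustration, and it replaces the <code>if</code>/<code>elif</code> chain with <code>min()</code> over whichever terminators were found.</p>

```python
def extract_accession(line, marker="sp|"):
    """Return the text after `marker` up to the first "." or "|", or None.

    Hypothetical helper: the name and the `marker` parameter are
    illustrative assumptions, not part of the original answer.
    """
    start = line.find(marker)
    if start == -1:
        return None
    start += len(marker)
    # Candidate end positions: first "." and first "|" after the marker.
    ends = [i for i in (line.find(".", start), line.find("|", start)) if i != -1]
    if not ends:
        return None
    return line[start:min(ends)]


lines = [
    "gi|1168222|sp|P46098.1|5HT3A_HUMAN",
    "gi|113067|sp|P22770|ACHA7_CHICK",
    "no marker here",
]
accessions = [extract_accession(line) for line in lines]
print(accessions)  # ['P46098', 'P22770', None]
```

<p>Returning <code>None</code> for lines that do not match lets the caller decide whether to skip or report them.</p>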
<hr/>
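<p>Using the <strong>CSV</strong> module:</p>
<ol>
<li>Because the input is well structured, the <code>csv</code> module can parse it.</li>
<li>Read the input file with <code>csv.reader</code>, using <code>|</code> as the delimiter.</li>
<li>Use a list comprehension and <code>split</code> to build the final result.</li>
</ol>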
<p>Demo:</p>
<pre><code>import csv

input_file = "dp-input1.csv"
with open(input_file, newline="") as fp:
    root = csv.reader(fp, delimiter='|')
    # The fourth field holds the accession; drop the ".N" version suffix
    result = [row[3].split(".")[0] for row in root]
    # Equivalent explicit loop:
    # result = []
    # for row in root:
    #     tmp = row[3].split(".")[0]
    #     result.append(tmp)
print("Final result:-", result)
</code></pre>
<p>Output:</p>
<pre><code>Final result:- ['P46098', 'P35563', 'P23979', 'O70212', 'P22770']
</code></pre>
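<p>The CSV demo assumes every row has at least four fields. A hedged variant that skips blank or short rows might look like the sketch below. It reads from an in-memory <code>io.StringIO</code> stand-in so that it runs without <code>dp-input1.csv</code>. The function name <code>read_accessions</code> and its <code>column</code> parameter are hypothetical.</p>

```python
import csv
import io

# In-memory stand-in for the input file (assumption: same "|"-delimited layout).
sample = io.StringIO(
    "gi|1168222|sp|P46098.1|5HT3A_HUMAN\n"
    "gi|113067|sp|P22770|ACHA7_CHICK\n"
    "\n"
    "gi|short\n"
)


def read_accessions(fp, column=3):
    """Collect the accession from `column` of each pipe-delimited row.

    Hypothetical helper; rows with too few fields are skipped.
    """
    accessions = []
    for row in csv.reader(fp, delimiter="|"):
        if len(row) <= column:
            continue  # blank or malformed row
        accessions.append(row[column].split(".")[0])
    return accessions


csv_result = read_accessions(sample)
print(csv_result)  # ['P46098', 'P22770']
```

<p>For a real file, pass the handle from <code>open(input_file, newline="")</code> instead of the <code>StringIO</code> object.</p>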