<pre><code>In [1]: import pandas as pd
In [2]: !cat data/question1/file.fasta
headerA
AAAGGCCT
headerB
ATCCTTTG
headerC
GGGGTCCCAAT
In [3]: xls=pd.read_excel('file.xls')
In [4]: xls
Out[4]:
   Hcolumn  Hsequence  Kcolumn  Ksequence
0  headerA        NaN  headerB        NaN
1  headerC        NaN  headerE        NaN
2  headerD        NaN  headerF        NaN
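In [5]: fasta_dic = {}
   ...: with open('file.fasta') as fh:
   ...:     for line in fh:
   ...:         if line.startswith('h'):
   ...:             # header line: start a new, empty record
   ...:             seq_header = line.strip('\n')
   ...:             fasta_dic[seq_header] = ''
   ...:         else:
   ...:             # sequence line: append, so multi-line sequences stay whole
   ...:             fasta_dic[seq_header] += line.strip('\n')
   ...: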
In [6]: def fill_seq(x):
   ...:     if x in fasta_dic:
   ...:         return fasta_dic[x]
   ...:     else:
   ...:         return ''
   ...:
In [7]: xls['Hsequence'] = xls['Hcolumn'].apply(fill_seq)
   ...: xls['Ksequence'] = xls['Kcolumn'].apply(fill_seq)
   ...:
In [8]: xls
Out[8]:
   Hcolumn    Hsequence  Kcolumn Ksequence
0  headerA     AAAGGCCT  headerB  ATCCTTTG
1  headerC  GGGGTCCCAAT  headerE
2  headerD               headerF
</code></pre>
<ol>
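<li><p>Build a dictionary <code>fasta_dic</code> with the sequence names as keys and the sequences as values.</p></li>
<li><p>The function <code>fill_seq</code> checks whether its input <code>x</code> is a key in that dictionary. If it is, the function returns the matching sequence; otherwise it returns an empty string.</p></li>
<li><p>Apply <code>fill_seq</code> to the values in <code>Hcolumn</code> and <code>Kcolumn</code>, and store the results in <code>Hsequence</code> and <code>Ksequence</code>.</p></li>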
</ol>
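<p>The first two steps can also be sketched without pandas. This is a minimal illustration, not the exact code from the session above: the helper names <code>parse_fasta</code> and <code>lookup</code> are hypothetical, and the sketch assumes, as in the file shown, that every header line starts with <code>h</code> (standard FASTA headers start with <code>&gt;</code>, so the prefix is a parameter).</p>

```python
import io


def parse_fasta(handle, header_prefix="h"):
    """Map each header line to its sequence, joining multi-line sequences."""
    records = {}
    header = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(header_prefix):
            header = line
            records[header] = ""
        elif header is not None:
            records[header] += line  # append so wrapped sequences stay whole
    return records


def lookup(records, name):
    """Return the sequence stored for name, or an empty string if missing."""
    return records.get(name, "")


# Same content as the file.fasta shown above, held in memory for the demo.
demo = io.StringIO("headerA\nAAAGGCCT\nheaderB\nATCCTTTG\nheaderC\nGGGGTCCCAAT\n")
fasta_dic = parse_fasta(demo)
print(lookup(fasta_dic, "headerA"))        # AAAGGCCT
print(repr(lookup(fasta_dic, "headerE")))  # ''
```

<p>Using <code>dict.get</code> with a default replaces the explicit <code>if</code>/<code>else</code> membership test and behaves the same way for missing headers.</p>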
<p>After this, you can keep working with the DataFrame or export it to an xls file.</p>
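<p>As a hedged sketch of that last step, <code>Series.map</code> with a dictionary can do the lookup in one call, and <code>DataFrame.to_excel</code> can write the result. The function name <code>fill_sequences</code> and the output filename are hypothetical. Writing <code>.xlsx</code> assumes an Excel writer such as openpyxl is installed, and your pandas version may not support writing the older <code>.xls</code> format.</p>

```python
import pandas as pd


def fill_sequences(df, fasta_dic,
                   pairs=(("Hcolumn", "Hsequence"), ("Kcolumn", "Ksequence"))):
    """Return a copy of df with each sequence column filled from fasta_dic."""
    out = df.copy()
    for name_col, seq_col in pairs:
        # map() looks each header up in the dict; headers with no match become
        # NaN, which fillna("") turns into the empty string used above.
        out[seq_col] = out[name_col].map(fasta_dic).fillna("")
    return out


# Usage (the output filename is an assumption):
# filled = fill_sequences(xls, fasta_dic)
# filled.to_excel("file_filled.xlsx", index=False)
```

<p>Returning a copy leaves the original DataFrame untouched, which makes it easy to compare the filled and unfilled tables before exporting.</p>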