<p>Does this help? It works in Python 2.7.3 and 3.2.3, the two versions I happen to have installed.</p>
<pre><code>import itertools
import sys


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    # zip is lazy on Python 3; Python 2 needs itertools.izip for that.
    if sys.version_info[0] > 2:
        return zip(a, b)
    return itertools.izip(a, b)


class DnaSequence():
    Names = {
        'A' : 'adenine',
        'C' : 'cytosine',
        'G' : 'guanine',
        'T' : 'thymine'
    }
    Bases = Names.keys()

    def __init__(self, seq):
        self._string = seq
        # Start every base and every ordered pair of bases at zero.
        self.bases = { x:0 for x in DnaSequence.Bases }
        self.pairs = { x+y:0 for x in DnaSequence.Bases
                             for y in DnaSequence.Bases }
        # Count the individual bases, ignoring unknown characters.
        for base in seq:
            if base in self.bases:
                self.bases[base] += 1
        # Count each pair of adjacent bases.
        for x, y in pairwise(seq):
            pair = x + y
            if pair in self.pairs:
                self.pairs[pair] += 1

    def printCount(self, base):
        if base in DnaSequence.Names:
            print(DnaSequence.Names[base].capitalize() +
                  " base content: " + str(self.bases[base]))
        else:
            sys.stderr.write('No such base ("%s")\n' % base)

    def __repr__(self):
        return self._string


d = DnaSequence("CCTAGTGTTAGCTAGTCTAGGGAT")
for base in DnaSequence.Bases:
    d.printCount(base)
# Further:
print(d)
print(d.bases)
print(d.pairs)
</code></pre>
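<p>This is a complete example that counts the bases (A, C, G, T) and the occurrences of every adjacent pair. For example, in ACCGTA the pairs AC, CC, CG, GT and TA each occur once, and the other 11 possible combinations in the Cartesian product ACGT x ACGT are all 0.</p>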
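<p>The ACCGTA case can be checked with a small standalone sketch. The helper names <code>adjacent_pairs</code> and <code>count_pairs</code> are made up for this illustration and are not part of the class above; the sketch assumes Python 3, where <code>zip</code> is lazy.</p>

```python
import itertools


def adjacent_pairs(seq):
    """Yield each overlapping two-character window of seq as a string."""
    a, b = itertools.tee(seq)
    next(b, None)  # advance the second iterator by one position
    return (x + y for x, y in zip(a, b))


def count_pairs(seq, alphabet="ACGT"):
    """Count adjacent pairs, starting every possible pair at zero."""
    counts = {x + y: 0 for x in alphabet for y in alphabet}
    for pair in adjacent_pairs(seq):
        if pair in counts:
            counts[pair] += 1
    return counts


pair_counts = count_pairs("ACCGTA")
present = sorted(p for p, n in pair_counts.items() if n)
absent = [p for p, n in pair_counts.items() if n == 0]
print(present)      # ['AC', 'CC', 'CG', 'GT', 'TA']
print(len(absent))  # 11
```

<p>Pre-filling the dictionary with all 16 pairs is what makes the zero counts show up explicitly instead of being missing keys.</p>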
<p>The counting approach used here scans the string once, in the constructor, instead of scanning it four times every time <code>getATGC()</code> is called.</p>
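<p>To make the difference concrete, here is a rough sketch of the two strategies side by side. <code>getATGC()</code> is not shown in this answer, so <code>count_bases_four_scans</code> is only a hypothetical stand-in for a version that calls <code>str.count</code> once per base; <code>count_bases_one_scan</code> uses <code>collections.Counter</code> in place of the manual loop in the constructor above.</p>

```python
from collections import Counter


def count_bases_four_scans(seq):
    # Hypothetical stand-in: str.count walks the whole string once per base,
    # so this makes four full passes on every call.
    return {base: seq.count(base) for base in "ACGT"}


def count_bases_one_scan(seq):
    # A single pass: Counter tallies every character it sees,
    # then only the four bases of interest are picked out.
    tally = Counter(seq)
    return {base: tally[base] for base in "ACGT"}


seq = "CCTAGTGTTAGCTAGTCTAGGGAT"
print(count_bases_four_scans(seq) == count_bases_one_scan(seq))  # True
print(count_bases_one_scan(seq))  # {'A': 5, 'C': 4, 'G': 7, 'T': 8}
```

<p>Both give the same totals; doing the work once and storing the result simply means later lookups such as <code>printCount</code> do not have to touch the string again.</p>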