<p>NumPy has a <code>char.replace</code> function (see the <a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.core.defchararray.replace.html#numpy.core.defchararray.replace" rel="nofollow noreferrer">docs</a>). All you need to do is:</p>
<pre><code>genes = np.char.replace(genes, 'A', '1')
genes = np.char.replace(genes, 'C', '2')
genes = np.char.replace(genes, 'G', '4')
genes = np.char.replace(genes, 'T', '8')
</code></pre>
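<p>The four calls above can also be written as a loop over a mapping. This is only a minimal sketch: the sample <code>genes</code> array and the <code>codes</code> dict are illustrative assumptions, not part of the original question.</p>

```python
import numpy as np

# Sample input (assumed): a 2-D array of equal-length DNA strings.
genes = np.array([['ACGTACGT'], ['ACGTACGT']])

# One digit per base; each digit is a distinct power of two.
codes = {'A': '1', 'C': '2', 'G': '4', 'T': '8'}

# Apply the same element-wise replacement once per base.
for base, digit in codes.items():
    genes = np.char.replace(genes, base, digit)
```

<p>Because the replacement digits never coincide with a base letter, the order of the replacements does not matter here.</p>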
<p>Then convert the result to an <code>int</code> array.</p>
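<p>A minimal sketch of this conversion, assuming <code>genes</code> already holds digit strings and that <code>astype</code> is an acceptable way to do it (the sample data and the name <code>genes_int</code> are illustrative). It also shows why the approach only works for short sequences: each whole string becomes a single integer.</p>

```python
import numpy as np

# Digit strings as produced by the replacements (sample data, assumed).
genes = np.array([['12481248'], ['12481248']])

# Parse each whole string as one 64-bit integer.
genes_int = genes.astype(np.int64)

# Each sequence is stored as a single number, so its length is bounded
# by the number of decimal digits an int64 can hold.
max_digits = len(str(np.iinfo(np.int64).max))
```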
<p>You can then use <a href="https://wiki.python.org/moin/BitwiseOperators" rel="nofollow noreferrer">bitwise operations</a> on the array.</p>
<hr/>
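<p>As pointed out in the comments, this limits the length of the sequence, because each string is converted to a single integer. A workaround that yields one integer per base starts from the same replacements:</p>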
<pre><code>genes = np.char.replace(genes, 'A', '1')
genes = np.char.replace(genes, 'C', '2')
genes = np.char.replace(genes, 'G', '4')
genes = np.char.replace(genes, 'T', '8')
>>> genes
array([['12481248'],
       ['12481248']], dtype='|S8')
</code></pre>
<p>Insert commas between the digits:</p>
<pre><code>genes = np.char.join(',', genes)
>>> genes
array([['1,2,4,8,1,2,4,8'],
       ['1,2,4,8,1,2,4,8']], dtype='|S15')
</code></pre>
<p>Split on the commas and convert back to a plain <code>np.char.array</code>:</p>
<pre><code>genes = np.char.array(np.char.split(genes, ','))
>>> genes
chararray([[['1', '2', '4', '8', '1', '2', '4', '8']],
           [['1', '2', '4', '8', '1', '2', '4', '8']]], dtype='|S1')
</code></pre>
<p>Convert to an <code>int</code> array:</p>
<pre><code>genes = np.array(genes, dtype=int)
>>> genes
array([[[1, 2, 4, 8, 1, 2, 4, 8]],
       [[1, 2, 4, 8, 1, 2, 4, 8]]])
</code></pre>
<p>Remove the intermediate dimension of size <code>1</code>:</p>
<pre><code>genes = genes.reshape(list(genes.shape[:-2]) + [genes.shape[-1]])
>>> genes
array([[1, 2, 4, 8, 1, 2, 4, 8],
       [1, 2, 4, 8, 1, 2, 4, 8]])
</code></pre>
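<p>With one integer per base, bitwise operations become meaningful position by position. The sketch below is illustrative only: it uses <code>np.squeeze</code> as an alternative to the reshape, and the names <code>purine</code>, <code>is_purine</code> and <code>combined</code> are assumptions made for the example, not something the question asked for.</p>

```python
import numpy as np

# Per-base codes used in this answer: one bit per base.
A, C, G, T = 1, 2, 4, 8

# The 3-D int array from the conversion step (sample data).
genes_3d = np.array([[[1, 2, 4, 8, 1, 2, 4, 8]],
                     [[1, 2, 4, 8, 1, 2, 4, 8]]])

# Alternative to the reshape: drop the size-1 axis directly.
genes = np.squeeze(genes_3d, axis=-2)

# Hypothetical mask: positions holding a purine (A or G).
purine = A | G
is_purine = (genes & purine) != 0

# OR-ing two rows gives, per position, the set of bases seen in either row.
combined = genes[0] | genes[1]
```

<p>Since each base occupies its own bit, a value such as <code>A | G</code> can represent "A or G" at a position without ambiguity.</p>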