Answers to the question: converting a NumPy array of strings to a NumPy array of characters


Category: Python Q&A


1 answer

Anonymous, 1 day ago


Here is an approach that uses a lookup table:

<pre><code>>>> import numpy as np
>>> alphabet = np.array(list('ACGT'))
>>> alphabet
array(['A', 'C', 'G', 'T'], dtype='<U1')
</code></pre>

To use a lookup table, we need to reinterpret the letters as indices. This is done by view casting: <code>alph_as_num = alphabet.view(np.int32)</code> exposes each letter's code point as an integer.

We can now build the lookup table. It needs <code>85</code> slots (the largest code point plus one), although only four of them are actually used: <code>65</code>, <code>67</code>, <code>71</code> and <code>84</code>. As for the output format, we are free to choose whatever suits our needs best.

Example 1, output as bytestrings:

<pre><code>>>> lookup_1 = np.zeros((alph_as_num.max()+1), dtype='S4')
>>> lookup_1[alph_as_num] = [b'0001000'[i:i+4] for i in range(4)]
</code></pre>

Example 2, output as <code>uint8</code>:

<pre><code>>>> lookup_2 = np.zeros((alph_as_num.max()+1), dtype=np.uint8)
>>> lookup_2[alph_as_num] = 1 << np.arange(4)
</code></pre>

Example 3, output as four <code>uint8</code> per letter:

<pre><code>>>> lookup_3 = np.zeros((alph_as_num.max()+1, 4), dtype=np.uint8)
>>> lookup_3[alph_as_num[::-1]] = np.identity(4)
</code></pre>

Now let's apply this to a sequence of <code>100</code> letters:

<pre><code>>>> seq
array(['CATTTCTCCACCATTTTGGTTTTTCATTGATCCGTTAGGTGGAGCCGGACTATGTCTACCGAAAGATGCACCTGCGCCGGGTCTGGTCTATCTCTTAATG'], dtype='<U100')
</code></pre>

The lookup is fast because it relies only on

<ul>
<li>advanced indexing, which is built into numpy and makes lookups very fast (for example, much faster than a Python dictionary)</li>
<li>view casting, which is essentially free because all it does is reinterpret the data buffer (no copying or conversion)</li>
</ul>

Example 1, bytestrings:

<pre><code>>>> lookup_1[seq.view(np.int32)]
array([b'0010', b'0001', b'1000', b'1000', b'1000', b'0010', b'1000', b'0010', b'0010', b'0001',
       b'0010', b'0010', b'0001', b'1000', b'1000', b'1000', b'1000', b'0100', b'0100', b'1000',
       b'1000', b'1000', b'1000', b'1000', b'0010', b'0001', b'1000', b'1000', b'0100', b'0001',
       b'1000', b'0010', b'0010', b'0100', b'1000', b'1000', b'0001', b'0100', b'0100', b'1000',
       b'0100', b'0100', b'0001', b'0100', b'0010', b'0010', b'0100', b'0100', b'0001', b'0010',
       b'1000', b'0001', b'1000', b'0100', b'1000', b'0010', b'1000', b'0001', b'0010', b'0010',
       b'0100', b'0001', b'0001', b'0001', b'0100', b'0001', b'1000', b'0100', b'0010', b'0001',
       b'0010', b'0010', b'1000', b'0100', b'0010', b'0100', b'0010', b'0010', b'0100', b'0100',
       b'0100', b'1000', b'0010', b'1000', b'0100', b'0100', b'1000', b'0010', b'1000', b'0001',
       b'1000', b'0010', b'1000', b'0010', b'1000', b'1000', b'0001', b'0001', b'1000', b'0100'],
      dtype='|S4')
</code></pre>

If preferred, these can also be viewed as one long string:

<pre><code>>>> lookup_1[seq.view(np.int32)].view('S400')
array([b'0010000110001000100000101000001000100001001000100001100010001000100001000100100010001000100010000010000110001000010000011000001000100100100010000001010001001000010001000001010000100010010001000001001010000001100001001000001010000001001000100100000100010001010000011000010000100001001000101000010000100100001000100100010001001000001010000100010010000010100000011000001010000010100010000001000110000100'],
      dtype='|S400')
</code></pre>

Example 2, <code>uint8</code>:

<pre><code>>>> lookup_2[seq.view(np.int32)]
array([2, 1, 8, 8, 8, 2, 8, 2, 2, 1, 2, 2, 1, 8, 8, 8, 8, 4, 4, 8,
       8, 8, 8, 8, 2, 1, 8, 8, 4, 1, 8, 2, 2, 4, 8, 8, 1, 4, 4, 8,
       4, 4, 1, 4, 2, 2, 4, 4, 1, 2, 8, 1, 8, 4, 8, 2, 8, 1, 2, 2,
       4, 1, 1, 1, 4, 1, 8, 4, 2, 1, 2, 2, 8, 4, 2, 4, 2, 2, 4, 4,
       4, 8, 2, 8, 4, 4, 8, 2, 8, 1, 8, 2, 8, 2, 8, 8, 1, 1, 8, 4],
      dtype=uint8)
</code></pre>

Example 3, four <code>uint8</code> per letter. This time let's use a different <code>seq</code> with multiple rows:

<pre><code>>>> seq
array([['CCCT'],
       ['GCGA']], dtype='<U4')
>>> lookup_3[seq.view(np.int32)].reshape(len(seq), -1)
array([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]], dtype=uint8)
</code></pre>
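For anyone who wants to try the lookup-table idea end to end, here is a minimal, self-contained sketch. The helper name `one_hot_lookup` and its parameters are my own and not part of the answer above. It assumes all input strings have the same length and contain only letters from the alphabet. Unlike Example 3, it does not reverse the alphabet, so `A` maps to `[1, 0, 0, 0]`.

```python
import numpy as np


def one_hot_lookup(seqs, alphabet="ACGT"):
    """Hypothetical helper: one-hot encode equal-length strings via a lookup table."""
    # View-cast the letters to their integer code points (no data is copied).
    alph_as_num = np.array(list(alphabet)).view(np.int32)

    # One row per code point up to the largest letter; only len(alphabet) rows are used.
    lookup = np.zeros((alph_as_num.max() + 1, len(alphabet)), dtype=np.uint8)
    lookup[alph_as_num] = np.identity(len(alphabet), dtype=np.uint8)

    # Build a fixed-width unicode array, then reinterpret it as code points.
    arr = np.asarray(seqs, dtype=str)
    codes = arr.view(np.int32).reshape(len(arr), -1)

    # Advanced indexing performs the lookup for every letter at once.
    return lookup[codes]


if __name__ == "__main__":
    encoded = one_hot_lookup(["CCCT", "GCGA"])
    print(encoded.shape)                        # (sequences, letters, alphabet size)
    print(encoded.reshape(len(encoded), -1))    # flattened, one row per sequence
```

A letter whose code point is above the largest one in the alphabet raises an `IndexError`. Any other character outside the alphabet silently maps to an all-zero row, so validate the input first if that matters.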
