<p>The earlier answers are out of date, so here is a solution for mapping strings to numbers that works with pandas 0.18.1.</p>
<p>For a Series:</p>
<pre><code>In [1]: import pandas as pd
In [2]: s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
'touching', 'single', 'nuclei'])
In [3]: s_enc = pd.factorize(s)
In [4]: s_enc[0]
Out[4]: array([0, 1, 2, 3, 1, 0, 2])
In [5]: s_enc[1]
Out[5]: Index([u'single', u'touching', u'nuclei', u'dusts'], dtype='object')
</code></pre>
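<p>Because <code>pd.factorize</code> returns a <code>(codes, uniques)</code> pair, the mapping can also be reversed. A minimal sketch, assuming a current pandas/NumPy install: indexing <code>uniques</code> with <code>codes</code> recovers the original strings. The names <code>codes</code>, <code>uniques</code> and <code>decoded</code> are illustrative.</p>

```python
import pandas as pd

s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
               'touching', 'single', 'nuclei'])

# factorize returns integer codes plus the distinct values,
# numbered in order of first appearance
codes, uniques = pd.factorize(s)

# Indexing the uniques with the codes maps the numbers back to strings
decoded = uniques[codes]

print(codes.tolist())    # [0, 1, 2, 3, 1, 0, 2]
print(decoded.tolist())  # ['single', 'touching', 'nuclei', 'dusts', 'touching', 'single', 'nuclei']
```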
<p>For a DataFrame:</p>
<pre><code>In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'labels': ['single', 'touching', 'nuclei',
'dusts', 'touching', 'single', 'nuclei']})
In [3]: catenc = pd.factorize(df['labels'])
In [4]: catenc
Out[4]: (array([0, 1, 2, 3, 1, 0, 2]),
Index([u'single', u'touching', u'nuclei', u'dusts'],
dtype='object'))
In [5]: df['labels_enc'] = catenc[0]
In [6]: df
Out[6]:
labels labels_enc
0 single 0
1 touching 1
2 nuclei 2
3 dusts 3
4 touching 1
5 single 0
6 nuclei 2
</code></pre>
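<p>The codes above follow the order in which each label first appears. If you would rather number the labels alphabetically, a sketch using two standard pandas features (checked against recent pandas, not 0.18.1 specifically) is <code>pd.factorize(..., sort=True)</code> or the <code>category</code> dtype's <code>cat.codes</code>. The extra column names <code>labels_sorted</code> and <code>labels_cat</code> are hypothetical.</p>

```python
import pandas as pd

df = pd.DataFrame({'labels': ['single', 'touching', 'nuclei',
                              'dusts', 'touching', 'single', 'nuclei']})

# Codes in order of first appearance, as in the example above
df['labels_enc'] = pd.factorize(df['labels'])[0]

# sort=True numbers the distinct labels alphabetically instead
df['labels_sorted'] = pd.factorize(df['labels'], sort=True)[0]

# Converting to the category dtype also sorts the categories,
# so cat.codes gives the same alphabetical numbering (as int8)
df['labels_cat'] = df['labels'].astype('category').cat.codes

print(df)
```

<p>With alphabetical ordering, <code>dusts</code> becomes 0 and <code>touching</code> becomes 3, so the codes no longer depend on row order.</p>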