<p><code>pandas.qcut</code> will give you the quantiles, but a histogram-like (equal-width) binning needs a bit of <code>numpy</code> trickery, which comes in handy here:</p>
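<pre><code>import numpy as np
_, breaks = np.histogram(df.MAT, bins=5)  # 6 edges of 5 equal-width bins
# Class = number of bin edges strictly below each MAT value
df['Class'] = (df.MAT.values > breaks[..., np.newaxis]).sum(0)
ax = df.boxplot(column='N0_YLDF', by='Class')
ax.xaxis.set_ticklabels(['%s' % val for i, val in enumerate(breaks) if i in df.Class.values])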
</code></pre>
<p><img src="https://i.stack.imgur.com/WEH6I.png" alt="enter image description here"/></p>
<p>The DataFrame now looks like this:</p>
<pre><code>   N0_YLDF    MAT  Class
0     1.29  13.67      1
1     2.32  10.67      0
2     6.24  11.29      1
3     5.34  21.29      1
4     6.35  41.67      2
5     5.35  91.67      5
6     9.32  21.52      1
7     6.32  31.52      2
8     3.33  13.52      1
9     4.56  44.52      3

[10 rows x 3 columns]
</code></pre>
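<p>To see why the <code>Class</code> values come out this way, here is a minimal, self-contained sketch of the broadcasting step. It assumes <code>Class</code> is the number of histogram edges strictly below each <code>MAT</code> value; the data is copied from the table above, and the intermediate name <code>above</code> is only illustrative.</p>

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "N0_YLDF": [1.29, 2.32, 6.24, 5.34, 6.35, 5.35, 9.32, 6.32, 3.33, 4.56],
    "MAT": [13.67, 10.67, 11.29, 21.29, 41.67, 91.67, 21.52, 31.52, 13.52, 44.52],
})

# Six edges of five equal-width bins spanning min(MAT)..max(MAT).
_, breaks = np.histogram(df["MAT"], bins=5)

# breaks[:, np.newaxis] has shape (6, 1); comparing it with the (10,) array
# of MAT values broadcasts to a (6, 10) boolean matrix, one row per edge.
above = df["MAT"].to_numpy() > breaks[:, np.newaxis]

# Summing over axis 0 counts, for each row of df, the edges strictly below it.
df["Class"] = above.sum(axis=0)

print(df)
```

<p>Because the comparison is strict, the minimum of <code>MAT</code> lands in class 0 and the maximum in class 5, so the classes in between may be sparse or empty (class 4 is empty for this data).</p>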
<p>The same trick also works for a boxplot grouped by quartile:</p>
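<pre><code># Upper edges of the four quartile bins (25th, 50th, 75th, 100th percentiles)
breaks = np.asarray(np.percentile(df.MAT, [25,50,75,100]))
# Class 0-3 = number of edges strictly below each MAT value
df['Class'] = (df.MAT.values > breaks[..., np.newaxis]).sum(0)
ax = df.boxplot(column='N0_YLDF', by='Class')
ax.xaxis.set_ticklabels(['%s'%val for val in breaks])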
</code></pre>
<p><img src="https://i.stack.imgur.com/pSqvB.png" alt="enter image description here"/></p>
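<p>For the quartile case, <code>pandas.qcut</code> can likely produce the same classes directly. The sketch below is a rough comparison, not part of the original answer: it reuses the <code>MAT</code> values from the table above, and the names <code>numpy_class</code> and <code>qcut_class</code> are made up for illustration.</p>

```python
import numpy as np
import pandas as pd

# MAT values copied from the table above.
mat = pd.Series(
    [13.67, 10.67, 11.29, 21.29, 41.67, 91.67, 21.52, 31.52, 13.52, 44.52],
    name="MAT",
)

# NumPy route: count the percentile edges strictly below each value.
breaks = np.percentile(mat, [25, 50, 75, 100])
numpy_class = (mat.to_numpy() > breaks[:, np.newaxis]).sum(axis=0)

# pandas route: labels=False returns the integer bin code (0..3) per value.
qcut_class = pd.qcut(mat, q=4, labels=False)

print(numpy_class.tolist())
print(qcut_class.tolist())
```

<p>For this data the two routes agree. With heavily tied values <code>qcut</code> may raise an error about duplicate bin edges unless <code>duplicates='drop'</code> is passed, so the NumPy version can be the more forgiving of the two.</p>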