<p>Here is my example data:</p>
<pre><code>{'Rhesus': {('count', u'augCGP,transMap'): 6.0, ('count', u'augTM,transMap'): 11563.0, ('count', u'transMap'): 39930.0, ('count', u'augTM'): 5114.0, ('count', u'augCGP,augTM,augTMR,transMap'): 27.0, ('count', u'augCGP,augTMR'): 1.0, ('count', u'augTMR,transMap'): 145.0, ('count', u'augTMR'): 4217.0, ('count', u'augCGP,augTMR,transMap'): nan, ('count', u'augCGP,augTM,augTMR'): nan, ('count', u'augCGP'): 4239.0, ('count', u'augCGP,augTM,transMap'): 3.0, ('count', u'augTM,augTMR,transMap'): 6296.0, ('count', u'augTM,augTMR'): 3357.0}, 'Susie': {('count', u'augCGP,transMap'): 11.0, ('count', u'augTM,transMap'): 10821.0, ('count', u'transMap'): 41300.0, ('count', u'augTM'): 2894.0, ('count', u'augCGP,augTM,augTMR,transMap'): 43.0, ('count', u'augCGP,augTMR'): nan, ('count', u'augTMR,transMap'): 353.0, ('count', u'augTMR'): 5399.0, ('count', u'augCGP,augTMR,transMap'): 1.0, ('count', u'augCGP,augTM,augTMR'): 1.0, ('count', u'augCGP'): 2740.0, ('count', u'augCGP,augTM,transMap'): 2.0, ('count', u'augTM,augTMR,transMap'): 10196.0, ('count', u'augTM,augTMR'): 2789.0}, 'Clint': {('count', u'augCGP,transMap'): 16.0, ('count', u'augTM,transMap'): 17341.0, ('count', u'transMap'): 39284.0, ('count', u'augTM'): 2888.0, ('count', u'augCGP,augTM,augTMR,transMap'): 80.0, ('count', u'augCGP,augTMR'): 1.0, ('count', u'augTMR,transMap'): 144.0, ('count', u'augTMR'): 2881.0, ('count', u'augCGP,augTMR,transMap'): nan, ('count', u'augCGP,augTM,augTMR'): 1.0, ('count', u'augCGP'): 2338.0, ('count', u'augCGP,augTM,transMap'): 8.0, ('count', u'augTM,augTMR,transMap'): 8725.0, ('count', u'augTM,augTMR'): 1441.0}, 'Orangutan': {('count', u'augCGP,transMap'): 7.0, ('count', u'augTM,transMap'): 6568.0, ('count', u'transMap'): 46113.0, ('count', u'augTM'): 3656.0, ('count', u'augCGP,augTM,augTMR,transMap'): 17.0, ('count', u'augCGP,augTMR'): nan, ('count', u'augTMR,transMap'): 284.0, ('count', u'augTMR'): 5952.0, ('count', u'augCGP,augTMR,transMap'): 1.0, ('count', 
u'augCGP,augTM,augTMR'): 1.0, ('count', u'augCGP'): 5753.0, ('count', u'augCGP,augTM,transMap'): 3.0, ('count', u'augTM,augTMR,transMap'): 6567.0, ('count', u'augTM,augTMR'): 3520.0}, 'Gibbon': {('count', u'augCGP,transMap'): 5.0, ('count', u'augTM,transMap'): 6828.0, ('count', u'transMap'): 44285.0, ('count', u'augTM'): 4313.0, ('count', u'augCGP,augTM,augTMR,transMap'): 16.0, ('count', u'augCGP,augTMR'): nan, ('count', u'augTMR,transMap'): 187.0, ('count', u'augTMR'): 6550.0, ('count', u'augCGP,augTMR,transMap'): nan, ('count', u'augCGP,augTM,augTMR'): 1.0, ('count', u'augCGP'): 4178.0, ('count', u'augCGP,augTM,transMap'): nan, ('count', u'augTM,augTMR,transMap'): 5839.0, ('count', u'augTM,augTMR'): 3882.0}}
</code></pre>
<p>It is a DataFrame that looks like this:</p>
<pre><code>>>> df
genome Clint Gibbon Orangutan Rhesus \
Transcript Modes
count augCGP 2338.0 4178.0 5753.0 4239.0
augCGP,augTM,augTMR 1.0 1.0 1.0 NaN
augCGP,augTM,augTMR,transMap 80.0 16.0 17.0 27.0
augCGP,augTM,transMap 8.0 NaN 3.0 3.0
augCGP,augTMR 1.0 NaN NaN 1.0
augCGP,augTMR,transMap NaN NaN 1.0 NaN
augCGP,transMap 16.0 5.0 7.0 6.0
augTM 2888.0 4313.0 3656.0 5114.0
augTM,augTMR 1441.0 3882.0 3520.0 3357.0
augTM,augTMR,transMap 8725.0 5839.0 6567.0 6296.0
augTM,transMap 17341.0 6828.0 6568.0 11563.0
augTMR 2881.0 6550.0 5952.0 4217.0
augTMR,transMap 144.0 187.0 284.0 145.0
transMap 39284.0 44285.0 46113.0 39930.0
genome Susie
Transcript Modes
count augCGP 2740.0
augCGP,augTM,augTMR 1.0
augCGP,augTM,augTMR,transMap 43.0
augCGP,augTM,transMap 2.0
augCGP,augTMR NaN
augCGP,augTMR,transMap 1.0
augCGP,transMap 11.0
augTM 2894.0
augTM,augTMR 2789.0
augTM,augTMR,transMap 10196.0
augTM,transMap 10821.0
augTMR 5399.0
augTMR,transMap 353.0
transMap 41300.0
</code></pre>
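<p>As you can see, some of these categories have very few entries. I want to filter the rows (<code>Transcript Modes</code>) so that a row is dropped if it accounts for less than 1% of the column total in every column. The resulting DataFrame should look like this:</p>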
<pre><code>>>> df
genome Clint Gibbon Orangutan Rhesus \
Transcript Modes
count augCGP 2338.0 4178.0 5753.0 4239.0
augTM 2888.0 4313.0 3656.0 5114.0
augTM,augTMR 1441.0 3882.0 3520.0 3357.0
augTM,augTMR,transMap 8725.0 5839.0 6567.0 6296.0
augTM,transMap 17341.0 6828.0 6568.0 11563.0
augTMR 2881.0 6550.0 5952.0 4217.0
transMap 39284.0 44285.0 46113.0 39930.0
genome Susie
Transcript Modes
count augCGP 2740.0
augTM 2894.0
augTM,augTMR 2789.0
augTM,augTMR,transMap 10196.0
augTM,transMap 10821.0
augTMR 5399.0
transMap 41300.0
</code></pre>
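<p>For reference, the filter described above could be sketched roughly as follows. This is only one possible reading of the requirement: a row is kept if it reaches 1% of the column total in at least one column, and <code>NaN</code> entries are treated as falling below the threshold. The names <code>filter_rare_rows</code> and <code>min_fraction</code> are hypothetical, and the small frame <code>toy</code> uses only a handful of the counts shown above (without the <code>count</code> index level), so its column totals differ from those of the full data.</p>

```python
import numpy as np
import pandas as pd


def filter_rare_rows(df, min_fraction=0.01):
    """Drop rows that fall below min_fraction of the column total in every column.

    Hypothetical helper, not part of pandas.
    """
    # Column totals; DataFrame.sum skips NaN by default.
    totals = df.sum(axis=0)
    # Share of each cell in its column total.
    fractions = df.div(totals, axis=1)
    # NaN >= threshold evaluates to False, so missing counts never keep a row.
    keep = (fractions >= min_fraction).any(axis=1)
    return df[keep]


# Toy data: a subset of the counts from the question (assumption: single-level index).
toy = pd.DataFrame(
    {
        "Clint": [2338.0, 1.0, 144.0, 39284.0],
        "Susie": [2740.0, np.nan, 353.0, 41300.0],
    },
    index=["augCGP", "augCGP,augTMR", "augTMR,transMap", "transMap"],
)

filtered = filter_rare_rows(toy)
print(filtered)
```

<p>If a row should instead be dropped as soon as it is below 1% in any single column, replacing <code>.any(axis=1)</code> with <code>.all(axis=1)</code> would give that stricter behaviour.</p>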