回答此问题可获得 20 贡献值,回答如果被采纳可获得 50 分。
<p>我有一个下面提到的dataframe,它是使用<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html" rel="nofollow noreferrer">^{<cd1>}</a>传递参数<code>agg_func</code>作为<code>sum</code>旋转dataframe的结果:</p>
<pre class="lang-py prettyprint-override"><code> counts
RACE BLACK OR AFRICAN AMERICAN WHITE All
ETHNIC
HISPANIC OR LATINO 11 41 52
NOT HISPANIC OR LATINO 15 71 86
All 26 112 138
</code></pre>
<p>您可以运行下面的代码将上面的数据帧加载到变量<code>df</code>:</p>
<pre class="lang-py prettyprint-override"><code>df = pd.DataFrame.from_dict({('counts', 'BLACK OR AFRICAN AMERICAN'): {'HISPANIC OR LATINO': 11, 'NOT HISPANIC OR LATINO': 15, 'All': 26}, ('counts', 'WHITE'): {'HISPANIC OR LATINO': 41, 'NOT HISPANIC OR LATINO': 71, 'All': 112}, ('counts', 'All'): {'HISPANIC OR LATINO': 52, 'NOT HISPANIC OR LATINO': 86, 'All': 138}}).rename_axis((None, 'RACE'), axis=1).rename_axis((['ETHNIC']), axis=0)
</code></pre>
<p>我正在尝试转换此数据帧,如下所述:</p>
<p><strong>预期输出:</strong></p>
<pre class="lang-py prettyprint-override"><code> level varName value counts
0 2 ETHNIC HISPANIC OR LATINO 52
1 1 RACE BLACK OR AFRICAN AMERICAN 11
2 1 RACE WHITE 41
3 2 ETHNIC NOT HISPANIC OR LATINO 86
4 1 RACE BLACK OR AFRICAN AMERICAN 15
5 1 RACE WHITE 71
</code></pre>
<p>上面的<code>varName</code>字段表示列和行的轴名称,<code>level</code>是1表示列轴,2表示行轴,<em>具有索引值的行<code>All</code>是可选的,因此不管它是否在结果数据帧中(后来我认为如果它在那里会更好)</em></p>
<p>我已经研究了以下这么多线程,但我没有发现它们中的任何一个与我的问题有多大关系</p>
<ul>
<li><a href="https://stackoverflow.com/questions/28654047/pandas-convert-some-columns-into-rows">pandas convert some columns into rows</a></li>
<li><a href="https://stackoverflow.com/questions/38951345/how-to-get-rid-of-multilevel-index-after-using-pivot-table-pandas">How to get rid of multilevel index after using pivot table
pandas?</a></li>
<li><a href="https://stackoverflow.com/questions/54355131/converting-pandas-multiindex-column-to-row">Converting Pandas MultiIndex column to row</a></li>
<li><a href="https://stackoverflow.com/questions/39273441/flatten-pandas-pivot-table">Flatten Pandas Pivot Table</a></li>
</ul>
<p>我能够转换的一种方法是手动拾取值并手动创建所需的数据帧,如下所示:</p>
<pre class="lang-py prettyprint-override"><code>df.index.names
# output: FrozenList(['ETHNIC'])
df.columns.names
#output: FrozenList([None, 'RACE'])
[y for x,y in df][:-1]
#output: ['BLACK OR AFRICAN AMERICAN', 'WHITE']
[x for x in df.index][:-1]
#output: ['HISPANIC OR LATINO', 'NOT HISPANIC OR LATINO']
</code></pre>
<p><strong>编辑</strong>:</p>
<p>以下是创建透视表之前的数据:</p>
<pre class="lang-py prettyprint-override"><code>data = pd.DataFrame.from_dict({'ETHNIC': {0: 'NOT HISPANIC OR LATINO', 1: 'NOT HISPANIC OR LATINO', 2: 'NOT HISPANIC OR LATINO', 3: 'NOT HISPANIC OR LATINO', 4: 'NOT HISPANIC OR LATINO', 5: 'NOT HISPANIC OR LATINO', 6: 'NOT HISPANIC OR LATINO', 7: 'NOT HISPANIC OR LATINO', 8: 'NOT HISPANIC OR LATINO', 9: 'HISPANIC OR LATINO', 10: 'HISPANIC OR LATINO', 11: 'HISPANIC OR LATINO', 12: 'HISPANIC OR LATINO', 13: 'HISPANIC OR LATINO', 14: 'NOT HISPANIC OR LATINO', 15: 'NOT HISPANIC OR LATINO', 16: 'NOT HISPANIC OR LATINO', 17: 'NOT HISPANIC OR LATINO', 18: 'NOT HISPANIC OR LATINO', 19: 'NOT HISPANIC OR LATINO', 20: 'NOT HISPANIC OR LATINO', 21: 'HISPANIC OR LATINO', 22: 'HISPANIC OR LATINO', 23: 'NOT HISPANIC OR LATINO', 24: 'NOT HISPANIC OR LATINO', 25: 'NOT HISPANIC OR LATINO', 26: 'HISPANIC OR LATINO', 27: 'HISPANIC OR LATINO', 28: 'HISPANIC OR LATINO', 29: 'HISPANIC OR LATINO', 30: 'HISPANIC OR LATINO', 31: 'HISPANIC OR LATINO', 32: 'NOT HISPANIC OR LATINO', 33: 'HISPANIC OR LATINO', 34: 'NOT HISPANIC OR LATINO', 35: 'NOT HISPANIC OR LATINO', 36: 'NOT HISPANIC OR LATINO', 37: 'NOT HISPANIC OR LATINO', 38: 'NOT HISPANIC OR LATINO', 39: 'NOT HISPANIC OR LATINO', 40: 'NOT HISPANIC OR LATINO', 41: 'NOT HISPANIC OR LATINO', 42: 'HISPANIC OR LATINO', 43: 'NOT HISPANIC OR LATINO', 44: 'NOT HISPANIC OR LATINO', 45: 'NOT HISPANIC OR LATINO', 46: 'HISPANIC OR LATINO', 47: 'HISPANIC OR LATINO', 48: 'HISPANIC OR LATINO', 49: 'HISPANIC OR LATINO', 50: 'NOT HISPANIC OR LATINO', 51: 'NOT HISPANIC OR LATINO', 52: 'NOT HISPANIC OR LATINO', 53: 'HISPANIC OR LATINO', 54: 'HISPANIC OR LATINO', 55: 'HISPANIC OR LATINO', 56: 'NOT HISPANIC OR LATINO', 57: 'HISPANIC OR LATINO', 58: 'HISPANIC OR LATINO', 59: 'NOT HISPANIC OR LATINO', 60: 'NOT HISPANIC OR LATINO', 61: 'HISPANIC OR LATINO', 62: 'HISPANIC OR LATINO', 63: 'HISPANIC OR LATINO', 64: 'HISPANIC OR LATINO', 65: 'NOT HISPANIC OR LATINO', 66: 'NOT HISPANIC OR LATINO', 67: 'NOT HISPANIC OR LATINO', 68: 'NOT HISPANIC OR LATINO', 69: 'HISPANIC OR LATINO', 70: 'NOT HISPANIC OR LATINO', 71: 'NOT HISPANIC OR LATINO', 72: 'HISPANIC OR LATINO', 73: 'HISPANIC OR LATINO', 74: 'HISPANIC OR LATINO', 75: 'NOT HISPANIC OR LATINO', 76: 'NOT HISPANIC OR LATINO', 77: 'NOT HISPANIC OR LATINO', 78: 'NOT HISPANIC OR LATINO', 79: 'NOT HISPANIC OR LATINO', 80: 'NOT HISPANIC OR LATINO', 81: 'NOT HISPANIC OR LATINO', 82: 'HISPANIC OR LATINO', 83: 'HISPANIC OR LATINO', 84: 'HISPANIC OR LATINO', 85: 'NOT HISPANIC OR LATINO', 86: 'HISPANIC OR LATINO', 87: 'HISPANIC OR LATINO', 88: 'HISPANIC OR LATINO', 89: 'NOT HISPANIC OR LATINO', 90: 'NOT HISPANIC OR LATINO', 91: 'NOT HISPANIC OR LATINO', 92: 'NOT HISPANIC OR LATINO', 93: 'NOT HISPANIC OR LATINO', 94: 'NOT HISPANIC OR LATINO', 95: 'HISPANIC OR LATINO', 96: 'HISPANIC OR LATINO', 97: 'HISPANIC OR LATINO', 98: 'NOT HISPANIC OR LATINO', 99: 'NOT HISPANIC OR LATINO', 100: 'NOT HISPANIC OR LATINO', 101: 'NOT HISPANIC OR LATINO', 102: 'NOT HISPANIC OR LATINO', 103: 'NOT HISPANIC OR LATINO', 104: 'NOT HISPANIC OR LATINO', 105: 'NOT HISPANIC OR LATINO', 106: 'NOT HISPANIC OR LATINO', 107: 'NOT HISPANIC OR LATINO', 108: 'NOT HISPANIC OR LATINO', 109: 'HISPANIC OR LATINO', 110: 'HISPANIC OR LATINO', 111: 'NOT HISPANIC OR LATINO', 112: 'NOT HISPANIC OR LATINO', 113: 'NOT HISPANIC OR LATINO', 114: 'NOT HISPANIC OR LATINO', 115: 'HISPANIC OR LATINO', 116: 'HISPANIC OR LATINO', 117: 'NOT HISPANIC OR LATINO', 118: 'HISPANIC OR LATINO', 119: 'HISPANIC OR LATINO', 120: 'NOT HISPANIC OR LATINO', 121: 'HISPANIC OR LATINO', 122: 'HISPANIC OR LATINO', 123: 'HISPANIC OR LATINO', 124: 'HISPANIC OR LATINO', 125: 'HISPANIC OR LATINO', 126: 'NOT HISPANIC OR LATINO', 127: 'NOT HISPANIC OR LATINO', 128: 'NOT HISPANIC OR LATINO', 129: 'NOT HISPANIC OR LATINO', 130: 'NOT HISPANIC OR LATINO', 131: 'NOT HISPANIC OR LATINO', 132: 'NOT HISPANIC OR LATINO', 133: 'NOT HISPANIC OR LATINO', 134: 'NOT HISPANIC OR LATINO', 135: 'NOT HISPANIC OR LATINO', 136: 'NOT HISPANIC OR LATINO', 137: 'NOT HISPANIC OR LATINO'}, 'RACE': {0: 'WHITE', 1: 'WHITE', 2: 'WHITE', 3: 'WHITE', 4: 'WHITE', 5: 'WHITE', 6: 'WHITE', 7: 'WHITE', 8: 'WHITE', 9: 'BLACK OR AFRICAN AMERICAN', 10: 'BLACK OR AFRICAN AMERICAN', 11: 'BLACK OR AFRICAN AMERICAN', 12: 'BLACK OR AFRICAN AMERICAN', 13: 'BLACK OR AFRICAN AMERICAN', 14: 'WHITE', 15: 'WHITE', 16: 'WHITE', 17: 'WHITE', 18: 'WHITE', 19: 'WHITE', 20: 'BLACK OR AFRICAN AMERICAN', 21: 'WHITE', 22: 'WHITE', 23: 'WHITE', 24: 'BLACK OR AFRICAN AMERICAN', 25: 'BLACK OR AFRICAN AMERICAN', 26: 'WHITE', 27: 'WHITE', 28: 'WHITE', 29: 'WHITE', 30: 'WHITE', 31: 'WHITE', 32: 'WHITE', 33: 'WHITE', 34: 'WHITE', 35: 'WHITE', 36: 'WHITE', 37: 'WHITE', 38: 'WHITE', 39: 'WHITE', 40: 'BLACK OR AFRICAN AMERICAN', 41: 'BLACK OR AFRICAN AMERICAN', 42: 'WHITE', 43: 'WHITE', 44: 'WHITE', 45: 'WHITE', 46: 'WHITE', 47: 'WHITE', 48: 'WHITE', 49: 'WHITE', 50: 'WHITE', 51: 'BLACK OR AFRICAN AMERICAN', 52: 'BLACK OR AFRICAN AMERICAN', 53: 'WHITE', 54: 'WHITE', 55: 'WHITE', 56: 'WHITE', 57: 'WHITE', 58: 'WHITE', 59: 'WHITE', 60: 'WHITE', 61: 'WHITE', 62: 'WHITE', 63: 'WHITE', 64: 'WHITE', 65: 'BLACK OR AFRICAN AMERICAN', 66: 'BLACK OR AFRICAN AMERICAN', 67: 'BLACK OR AFRICAN AMERICAN', 68: 'BLACK OR AFRICAN AMERICAN', 69: 'WHITE', 70: 'WHITE', 71: 'WHITE', 72: 'WHITE', 73: 'WHITE', 74: 'BLACK OR AFRICAN AMERICAN', 75: 'WHITE', 76: 'WHITE', 77: 'WHITE', 78: 'WHITE', 79: 'WHITE', 80: 'BLACK OR AFRICAN AMERICAN', 81: 'BLACK OR AFRICAN AMERICAN', 82: 'BLACK OR AFRICAN AMERICAN', 83: 'BLACK OR AFRICAN AMERICAN', 84: 'BLACK OR AFRICAN AMERICAN', 85: 'BLACK OR AFRICAN AMERICAN', 86: 'WHITE', 87: 'WHITE', 88: 'WHITE', 89: 'WHITE', 90: 'WHITE', 91: 'WHITE', 92: 'WHITE', 93: 'WHITE', 94: 'WHITE', 95: 'WHITE', 96: 'WHITE', 97: 'WHITE', 98: 'WHITE', 99: 'WHITE', 100: 'WHITE', 101: 'WHITE', 102: 'WHITE', 103: 'WHITE', 104: 'WHITE', 105: 'WHITE', 106: 'WHITE', 107: 'WHITE', 108: 'BLACK OR AFRICAN AMERICAN', 109: 'WHITE', 110: 'WHITE', 111: 'WHITE', 112: 'WHITE', 113: 'WHITE', 114: 'WHITE', 115: 'BLACK OR AFRICAN AMERICAN', 116: 'BLACK OR AFRICAN AMERICAN', 117: 'WHITE', 118: 'WHITE', 119: 'WHITE', 120: 'WHITE', 121: 'WHITE', 122: 'WHITE', 123: 'WHITE', 124: 'WHITE', 125: 'WHITE', 126: 'WHITE', 127: 'WHITE', 128: 'WHITE', 129: 'WHITE', 130: 'WHITE', 131: 'WHITE', 132: 'WHITE', 133: 'WHITE', 134: 'WHITE', 135: 'WHITE', 136: 'WHITE', 137: 'WHITE'}})
</code></pre>
<p>以下是旋转的代码:</p>
<pre class="lang-py prettyprint-override"><code>df = (data.groupby(['ETHNIC', 'RACE'])
.size()
.to_frame('counts')
.reset_index(level=['ETHNIC', 'RACE'])
.pivot_table(index='ETHNIC', columns='RACE', aggfunc='sum', margins=True, dropna=False)
)
</code></pre>
<p><strong>PS</strong>:<em>请注意,预期数据帧中的行顺序很重要</em></p>
<p><strong>更新</strong>:</p>
<p>正如在评论中所建议的,我尝试了<code>pd.crosstab</code>,发现它是
创建与我使用<code>pd.pivot_table</code>创建的聚合<code>df</code>相同的聚合<code>df</code>(在一个有200K行的数据帧上测试)的速度几乎慢了2倍</p>