Merging two columns to eliminate duplicate rows


1 answer

Anonymous, 1 day ago


Use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html" rel="nofollow noreferrer"><code>set_index</code></a> with the joined columns, and <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html" rel="nofollow noreferrer"><code>unstack</code></a> to reshape:

<pre><code>df = df.set_index(['id', df['tp'] + df['dt'].astype(str)])['amt'].unstack().reset_index()
print(df)
   id    CR2017   CR2018   DR2017   DR2018
0   1   94678.0  13508.0  78671.0  13797.0
1   2  111417.0  21479.0  95266.0   1864.0
</code></pre>

Or create a new column first:

<pre><code>df['new'] = df['tp'] + df['dt'].astype(str)
df = df.set_index(['id', 'new'])['amt'].unstack().rename_axis(None, axis=1).reset_index()
print(df)
   id    CR2017   CR2018   DR2017   DR2018
0   1   94678.0  13508.0  78671.0  13797.0
1   2  111417.0  21479.0  95266.0   1864.0
</code></pre>

But if you get:

<blockquote> ValueError: Index contains duplicate entries, cannot reshape </blockquote>

it means that some <code>id</code> value appears more than once with the same joined <code>tp</code> + <code>dt</code> pair, like this:

<pre><code>print(df)
   id  tp    dt       amt
0   1  CR  2017   94678.0 <-dupe 1 CR 2017
0   1  CR  2017   10000.0 <-dupe 1 CR 2017
1   1  CR  2018   13508.0
2   1  DR  2017   78671.0
3   1  DR  2018   13797.0
4   2  CR  2017  111417.0
5   2  CR  2018   21479.0
6   2  DR  2017   95266.0
7   2  DR  2018    1864.0
</code></pre>

The solution is <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>groupby</code></a> plus an aggregate function such as <code>mean</code> or <code>sum</code>, followed by <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html" rel="nofollow noreferrer"><code>unstack</code></a>:

<pre><code>df = df.groupby(['id', df['tp'] + df['dt'].astype(str)])['amt'].mean().unstack().reset_index()
</code></pre>

Or use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html" rel="nofollow noreferrer"><code>pivot_table</code></a>, which aggregates with <code>aggfunc='mean'</code> by default:

<pre><code>df = df.pivot_table(index='id', columns=df['tp'] + df['dt'].astype(str), values='amt').reset_index()
</code></pre>
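To make the duplicate case concrete, here is a minimal self-contained sketch. The sample frame is made up for illustration (a subset of the rows shown above for `id` 1), and the names `key`, `dupes`, `reshape_failed`, and `wide` are hypothetical. It shows one way to spot the offending rows before reshaping, confirms that a plain `unstack` fails on them, and then aggregates with `sum` instead of `mean`; which aggregate is appropriate depends on what the duplicated amounts mean in your data.

```python
import pandas as pd

# Hypothetical sample data: the first two rows share the same (id, tp, dt).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1],
    "tp": ["CR", "CR", "CR", "DR", "DR"],
    "dt": [2017, 2017, 2018, 2017, 2018],
    "amt": [94678.0, 10000.0, 13508.0, 78671.0, 13797.0],
})

# Joined column label, e.g. "CR2017".
key = df["tp"] + df["dt"].astype(str)

# Rows whose (id, key) pair occurs more than once; these block the reshape.
dupes = df[df.assign(key=key).duplicated(["id", "key"], keep=False)]

# A plain set_index + unstack cannot place two values in one cell.
try:
    df.set_index(["id", key])["amt"].unstack()
    reshape_failed = False
except ValueError:
    reshape_failed = True

# Aggregating first collapses each (id, key) pair to a single value.
wide = df.groupby(["id", key])["amt"].sum().unstack().reset_index()
```

Inspecting `dupes` first is worthwhile: if the repeated rows are data-entry errors rather than amounts that should be combined, dropping them may be a better fix than aggregating.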
