<p>Use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html" rel="nofollow noreferrer"><code>set_index</code></a> with the joined columns and <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html" rel="nofollow noreferrer"><code>unstack</code></a> to reshape:</p>
<pre><code># Join tp and dt into one key, index by id and that key, then unstack the key into columns
df = df.set_index(['id', df['tp'] + df['dt'].astype(str)])['amt'].unstack().reset_index()
print(df)
   id    CR2017   CR2018   DR2017   DR2018
0   1   94678.0  13508.0  78671.0  13797.0
1   2  111417.0  21479.0  95266.0   1864.0
</code></pre>
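<p>For a self-contained run, the input frame can be rebuilt from the values printed in this answer. This is only a sketch: the exact layout of the original data is an assumption, and the names <code>key</code> and <code>wide</code> are introduced here for illustration.</p>
<pre><code>import pandas as pd

# Sample input reconstructed from the values shown in this answer (assumed layout).
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2],
    'tp': ['CR', 'CR', 'DR', 'DR', 'CR', 'CR', 'DR', 'DR'],
    'dt': [2017, 2018, 2017, 2018, 2017, 2018, 2017, 2018],
    'amt': [94678.0, 13508.0, 78671.0, 13797.0,
            111417.0, 21479.0, 95266.0, 1864.0],
})

# Join tp and dt into one string key, e.g. 'CR' + '2017' gives 'CR2017'.
key = df['tp'] + df['dt'].astype(str)

# Index by id and the joined key, then move the key level into the columns.
wide = df.set_index(['id', key])['amt'].unstack().reset_index()
print(wide)
</code></pre>
<p>Keeping the result in a separate variable leaves the long-format <code>df</code> available for further checks.</p>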
<p>Or create a new column first:</p>
<pre><code># Build the joined key as a regular column
df['new'] = df['tp'] + df['dt'].astype(str)
# rename_axis(None, axis=1) drops the leftover 'new' label from the column axis
df = df.set_index(['id', 'new'])['amt'].unstack().rename_axis(None, axis=1).reset_index()
print(df)
   id    CR2017   CR2018   DR2017   DR2018
0   1   94678.0  13508.0  78671.0  13797.0
1   2  111417.0  21479.0  95266.0   1864.0
</code></pre>
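<p>The <code>rename_axis(None, axis=1)</code> call is only cosmetic: unstacking a named index level carries that name over to the column axis. A small sketch on a made-up four-row frame (the names <code>named</code> and <code>clean</code> are illustrative) shows the difference:</p>
<pre><code>import pandas as pd

# Tiny illustrative frame that already contains the joined key column.
df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'new': ['CR2017', 'DR2017', 'CR2017', 'DR2017'],
    'amt': [94678.0, 78671.0, 111417.0, 95266.0],
})

# After unstack, the column axis inherits the name of the unstacked level.
named = df.set_index(['id', 'new'])['amt'].unstack()
print(named.columns.name)  # new

# rename_axis(None, axis=1) clears that name so it does not show in the header.
clean = named.rename_axis(None, axis=1)
print(clean.columns.name)  # None
</code></pre>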
<p>But if you get:</p>
<blockquote>
<p>ValueError: Index contains duplicate entries, cannot reshape</p>
</blockquote>
<p>it means some combination of <code>id</code> and the joined column occurs more than once, like this:</p>
<pre><code>print(df)
   id  tp    dt       amt
0   1  CR  2017   94678.0 &lt;-dupe 1 CR 2017
0   1  CR  2017   10000.0 &lt;-dupe 1 CR 2017
1   1  CR  2018   13508.0
2   1  DR  2017   78671.0
3   1  DR  2018   13797.0
4   2  CR  2017  111417.0
5   2  CR  2018   21479.0
6   2  DR  2017   95266.0
7   2  DR  2018    1864.0
</code></pre>
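<p>One way to locate such rows before reshaping is <code>DataFrame.duplicated</code> on the key columns. A minimal sketch, assuming a shortened version of the frame above (the variable <code>dupes</code> is illustrative):</p>
<pre><code>import pandas as pd

# Shortened sample with one duplicated (id, tp, dt) combination.
df = pd.DataFrame({
    'id': [1, 1, 1, 2],
    'tp': ['CR', 'CR', 'CR', 'CR'],
    'dt': [2017, 2017, 2018, 2017],
    'amt': [94678.0, 10000.0, 13508.0, 111417.0],
})

# keep=False flags every row of a duplicated group, not only the repeats.
dupes = df[df.duplicated(subset=['id', 'tp', 'dt'], keep=False)]
print(dupes)
</code></pre>
<p>If <code>dupes</code> is empty, the plain <code>set_index</code> plus <code>unstack</code> approach should not raise this error.</p>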
<p>The solution is to use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>groupby</code></a> with an aggregate function such as <code>mean</code> or <code>sum</code>, followed by <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html" rel="nofollow noreferrer"><code>unstack</code></a>:</p>
<pre><code>df = df.groupby(['id', df['tp'] + df['dt'].astype(str)])['amt'].mean().unstack().reset_index()
</code></pre>
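<p>The choice of aggregate decides how the duplicated rows are combined. A sketch on a reduced sample with one duplicated pair (the names <code>key</code>, <code>grouped</code>, <code>wide_mean</code> and <code>wide_sum</code> are illustrative):</p>
<pre><code>import pandas as pd

# Reduced sample: (id=1, CR, 2017) appears twice.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'tp': ['CR', 'CR', 'DR', 'CR', 'DR'],
    'dt': [2017, 2017, 2017, 2017, 2017],
    'amt': [94678.0, 10000.0, 78671.0, 111417.0, 95266.0],
})

key = df['tp'] + df['dt'].astype(str)
grouped = df.groupby(['id', key])['amt']

# mean averages the two duplicated amounts; sum adds them up.
wide_mean = grouped.mean().unstack().reset_index()
wide_sum = grouped.sum().unstack().reset_index()

print(wide_mean.loc[0, 'CR2017'])  # 52339.0
print(wide_sum.loc[0, 'CR2017'])   # 104678.0
</code></pre>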
<p>Or use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html" rel="nofollow noreferrer"><code>pivot_table</code></a>, whose default is <code>aggfunc='mean'</code>:</p>
<pre><code>df = df.pivot_table(index='id', columns=df['tp'] + df['dt'].astype(str), values='amt').reset_index()
</code></pre>
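<p>If averaging is not the right way to combine duplicates, <code>aggfunc</code> can be set explicitly. A minimal sketch, assuming the joined key is stored in a helper column <code>new</code> as in the second example, on a reduced sample with one duplicated pair:</p>
<pre><code>import pandas as pd

# Reduced sample: (id=1, CR, 2017) appears twice.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'tp': ['CR', 'CR', 'DR', 'CR', 'DR'],
    'dt': [2017, 2017, 2017, 2017, 2017],
    'amt': [94678.0, 10000.0, 78671.0, 111417.0, 95266.0],
})
df['new'] = df['tp'] + df['dt'].astype(str)

# aggfunc='sum' adds duplicated amounts instead of averaging them.
wide = (df.pivot_table(index='id', columns='new', values='amt', aggfunc='sum')
          .rename_axis(None, axis=1)
          .reset_index())
print(wide)
</code></pre>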