<pre><code>import pandas as pd

# Wide table: one column per car make, the string 'NA' marks "no car"
df = pd.DataFrame({'Audi': ['R8', 'NA', 'RS7', 'NA', 'NA', 'S6', 'A8'],
                   'Country': ['FR', 'US', 'UK', 'RU', 'US', 'US', 'UK'],
                   'Cust-id': ['Cu1', 'Cu2', 'Cu3', 'Cu4', 'Cu5', 'Cu6', 'Cu7'],
                   'Ferrari': ['FF', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA'],
                   'Jaguar': ['NA', 'XF', 'NA', 'NA', 'Ford', 'F-type', 'XE'],
                   'Porsche': ['NA', 'NA', 'NA', '911', '918', 'NA', 'MacanS'],
                   'Sex': ['F', 'M', 'M', 'F', 'M', 'F', 'M']})

# Coalesce the car columns into 'variable' (make) and 'value' (model)
result = pd.melt(df, id_vars=['Cust-id', 'Sex', 'Country'])
# Keep only the rows that hold an actual car
mask = result['value'] != 'NA'
result = result.loc[mask]
# Use the row label as a unique key so pivot keeps one row per car
result['index'] = result.index
result = pd.concat([result[['Cust-id', 'Sex', 'Country']],
                    result.pivot(index='index', columns='variable', values='value')], axis=1)
print(result)
</code></pre>
<p>yields</p>
<pre><code>   Cust-id Sex Country  Audi Ferrari  Jaguar Porsche
0      Cu1   F      FR    R8    None    None    None
2      Cu3   M      UK   RS7    None    None    None
5      Cu6   F      US    S6    None    None    None
6      Cu7   M      UK    A8    None    None    None
7      Cu1   F      FR  None      FF    None    None
15     Cu2   M      US  None    None      XF    None
18     Cu5   M      US  None    None    Ford    None
19     Cu6   F      US  None    None  F-type    None
20     Cu7   M      UK  None    None      XE    None
24     Cu4   F      RU  None    None    None     911
25     Cu5   M      US  None    None    None     918
27     Cu7   M      UK  None    None    None  MacanS
</code></pre>
<hr/>
<p>You can use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html" rel="nofollow"><code>pd.melt</code></a> to coalesce the car columns into a single column:</p>
<pre><code>In [232]: result = pd.melt(df, id_vars=['Cust-id', 'Sex', 'Country']); result.head()
Out[232]:
  Cust-id Sex Country variable value
0     Cu1   F      FR     Audi    R8
1     Cu2   M      US     Audi    NA
2     Cu3   M      UK     Audi   RS7
3     Cu4   F      RU     Audi    NA
4     Cu5   M      US     Audi    NA
...
</code></pre>
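<p>As a smaller illustration of what <code>pd.melt</code> does, here is a minimal sketch on a two-customer frame. The frame and the column names <code>make</code> and <code>model</code> are made up for this example; <code>var_name</code> and <code>value_name</code> are optional parameters of <code>pd.melt</code> that rename the default <code>variable</code> and <code>value</code> columns.</p>

```python
import pandas as pd

# Hypothetical two-customer frame, only for illustration
wide = pd.DataFrame({
    'Cust-id': ['Cu1', 'Cu2'],
    'Audi': ['R8', 'NA'],
    'Jaguar': ['NA', 'XF'],
})

# Every non-id column becomes a (make, model) pair, one row per pair
long = pd.melt(wide, id_vars=['Cust-id'], var_name='make', value_name='model')
print(long)
```

<p>Each customer now appears once per car column, which is why the <code>'NA'</code> placeholder rows have to be filtered out afterwards.</p>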
<p>Remove the rows whose value is the string <code>'NA'</code>:</p>
<pre><code>mask = result['value'] != 'NA'
result = result.loc[mask]
</code></pre>
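<p>Depending on the pandas version, assigning a new column to a frame obtained by boolean filtering may emit a <code>SettingWithCopyWarning</code>. One possible way to avoid it, sketched here on a made-up long-format frame, is to take an explicit <code>.copy()</code> of the filtered rows. This is an optional variation, not part of the original code.</p>

```python
import pandas as pd

# Hypothetical long-format frame, shaped like the output of pd.melt
long = pd.DataFrame({
    'Cust-id': ['Cu1', 'Cu2', 'Cu1', 'Cu2'],
    'variable': ['Audi', 'Audi', 'Jaguar', 'Jaguar'],
    'value': ['R8', 'NA', 'NA', 'XF'],
})

# Boolean mask: True where the row holds a real car
mask = long['value'] != 'NA'

# .copy() makes the filtered frame independent of the original
kept = long.loc[mask].copy()

# Adding a column to the copy is now unambiguous
kept['index'] = kept.index
print(kept)
```

<p>The surviving rows keep their original labels, so the new <code>'index'</code> column still identifies each row uniquely.</p>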
<p>Then use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html" rel="nofollow"><code>pivot</code></a> to reshape the result. <code>pivot</code> is <a href="http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-melt" rel="nofollow">roughly the inverse of <code>melt</code></a>: it spreads the values in one column (e.g. <code>'variable'</code>) across multiple columns, thus un-coalescing the car columns.</p>
<pre><code>result['index'] = result.index
result = pd.concat([result[['Cust-id', 'Sex', 'Country']],
                    result.pivot(index='index', columns='variable', values='value')], axis=1)
</code></pre>
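<p><code>result['index'] = result.index</code> gives <code>pivot</code> a unique key per row, which ensures that the pivot preserves the rows as is instead of merging rows that belong to the same customer.</p>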
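<p>To see why the unique key matters, the following minimal sketch pivots a made-up three-row frame twice: once on the row label and once on <code>'Cust-id'</code>. The frame and the variable names are assumptions chosen for illustration.</p>

```python
import pandas as pd

# Hypothetical filtered long-format frame; labels mimic the melted index
long = pd.DataFrame({
    'Cust-id': ['Cu1', 'Cu1', 'Cu2'],
    'variable': ['Audi', 'Ferrari', 'Jaguar'],
    'value': ['R8', 'FF', 'XF'],
}, index=[0, 7, 15])

# Pivot on the row label: every car stays on its own row
long['index'] = long.index
wide = long.pivot(index='index', columns='variable', values='value')

# Pivot on the customer: both of Cu1's cars end up on a single row
by_customer = long.pivot(index='Cust-id', columns='variable', values='value')

# Re-attach the id column; concat aligns on the shared row labels
combined = pd.concat([long[['Cust-id']], wide], axis=1)
print(combined)
print(by_customer)
```

<p>Pivoting on the row label gives three rows with one car each, while pivoting on <code>'Cust-id'</code> gives two rows, which is a different layout from the output shown at the top.</p>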