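<p>If the order of the split values (with <code>NaN</code>s dropped) does not matter, convert them to a set and <code>join</code> them in a custom function passed to <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.agg.html" rel="nofollow noreferrer"><code>GroupBy.agg</code></a>:</p>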
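<pre><code>import numpy as np

def f(x):
    # Split every non-NaN cell on commas and keep the distinct tokens (set order is arbitrary)
    out = set([z for y in x.dropna() for z in y.split(',')])
    # An all-NaN group gives an empty set, so return NaN instead of an empty string
    return ','.join(out) if bool(out) else np.nan

df = df.groupby(['col B','col D']).agg(f).reset_index().reindex(columns=df.columns)
print (df)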
   col a  col B  col c col D  col e
0  c,a,b  ABC-1  a,c,b  ABCD  c,b,d
1      a  ABC-2    NaN  ABCD    aaa
2    NaN  ABC-3      c  AACE    c,b
</code></pre>
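<p>To see what the custom function does to one group of cells, here is a minimal sketch on a standalone <code>Series</code>. The name <code>merge_unique</code> and the sample values are illustrative assumptions, not the original data, and <code>sorted</code> is added only to make the otherwise arbitrary set order reproducible.</p>

```python
import numpy as np
import pandas as pd

def merge_unique(values):
    """Join the distinct comma-separated tokens of a Series; NaN if none remain."""
    tokens = {tok for cell in values.dropna() for tok in cell.split(',')}
    # sorted() is an addition for this sketch: it fixes the arbitrary set order
    return ','.join(sorted(tokens)) if tokens else np.nan

# Hypothetical cells belonging to one group
sample = pd.Series(['a,b', np.nan, 'b,c'])
# A group whose cells are all missing
empty = pd.Series([np.nan, np.nan], dtype=object)

merged = merge_unique(sample)
merged_empty = merge_unique(empty)
print(merged)        # a,b,c
print(merged_empty)  # nan
```

<p>The duplicate <code>b</code> collapses to a single token, and the all-<code>NaN</code> group stays <code>NaN</code> rather than becoming an empty string.</p>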
<p>If the order matters, use <code>OrderedDict</code>:</p>
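<pre><code>from collections import OrderedDict
import numpy as np

def f(x):
    # fromkeys() drops duplicates while keeping the order of first appearance
    out = OrderedDict.fromkeys([z for y in x.dropna() for z in y.split(',')]).keys()
    # An all-NaN group gives no keys, so return NaN instead of an empty string
    return ','.join(out) if bool(out) else np.nan

df = df.groupby(['col B','col D']).agg(f).reset_index().reindex(columns=df.columns)
print (df)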
   col a  col B  col c col D  col e
0  a,b,c  ABC-1  c,a,b  ABCD  b,c,d
1      a  ABC-2    NaN  ABCD    aaa
2    NaN  ABC-3      c  AACE    b,c
</code></pre>
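<p>On Python 3.7+, a plain <code>dict</code> also preserves insertion order, so <code>dict.fromkeys</code> can stand in for <code>OrderedDict</code>. The sketch below is self-contained. The input frame is a hypothetical reconstruction (the question's data is not shown here), and <code>join_unique_ordered</code> is an illustrative name.</p>

```python
import numpy as np
import pandas as pd

def join_unique_ordered(values):
    """Join distinct comma-separated tokens in order of first appearance."""
    tokens = dict.fromkeys(tok for cell in values.dropna() for tok in cell.split(','))
    # An empty dict means every cell in the group was NaN
    return ','.join(tokens) if tokens else np.nan

# Hypothetical input: the first two rows share the same ('col B', 'col D') key
df = pd.DataFrame({
    'col a': ['a,b', 'b,c', 'a', np.nan],
    'col B': ['ABC-1', 'ABC-1', 'ABC-2', 'ABC-3'],
    'col c': ['c,a', 'a,b', np.nan, 'c'],
    'col D': ['ABCD', 'ABCD', 'ABCD', 'AACE'],
    'col e': ['b,c', 'c,d', 'aaa', 'b,c'],
})

result = (
    df.groupby(['col B', 'col D'])
      .agg(join_unique_ordered)
      .reset_index()
      .reindex(columns=df.columns)  # restore the original column order
)
print(result)
```

<p>With this sample, the two <code>ABC-1</code> rows merge into one, and each merged cell lists its tokens in the order they first appeared, for example <code>c,a,b</code> in <code>col c</code>.</p>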