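<p>Of course, DSM and CTZhu have given very concise answers that make use of many built-in features of Python, and of dataframes in particular. Here is something a bit... [cough] ...verbose.</p>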
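<pre><code>import pandas
from io import StringIO  # Python 3.x
# from StringIO import StringIO  # Python 2.x


def myJoiner(row):
    # Join the non-null values of one row into a ';'-separated string.
    newrow = []
    for r in row:
        if not pandas.isnull(r):
            newrow.append(str(r))
    return ';'.join(newrow)


def groupCols(df, key):
    # Pick every column whose name contains `key`. read_table renames the
    # duplicate headers to a, a.1, a.2, ..., so a substring match finds them all.
    # (DataFrame.select was removed in pandas 1.0, hence the boolean mask.)
    columns = df.loc[:, [key in col for col in df.columns]]
    joined = columns.apply(myJoiner, axis=1)
    joined.name = key
    return pandas.DataFrame(joined)


data = StringIO("""\
ID Name a a a b b
1 test1 1 NaN NaN "a" NaN
2 test2 NaN 2 NaN "a" NaN
3 test3 2 3 NaN NaN "b"
4 test4 NaN NaN 4 NaN "b"
""")
df = pandas.read_table(data, sep=r'\s+')
# Move ID and Name into the index so only the a/b columns get grouped.
df.set_index(['ID', 'Name'], inplace=True)
AB = groupCols(df, 'a').join(groupCols(df, 'b'))
print(AB)
</code></pre>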
<p>This gives me:</p>
<pre><code>                a  b
ID Name
1  test1      1.0  a
2  test2      2.0  a
3  test3  2.0;3.0  b
4  test4      4.0  b
</code></pre>
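<p>For comparison, here is a more compact sketch of the same idea. It is not necessarily what DSM or CTZhu wrote. It assumes the duplicate headers have been renamed by pandas to <code>a</code>, <code>a.1</code>, <code>a.2</code>, <code>b</code>, <code>b.1</code>, and the names <code>join_non_null</code>, <code>prefixes</code> and <code>compact</code> are my own, chosen for illustration:</p>

```python
import pandas as pd
from io import StringIO

data = StringIO("""\
ID Name a a a b b
1 test1 1 NaN NaN "a" NaN
2 test2 NaN 2 NaN "a" NaN
3 test3 2 3 NaN NaN "b"
4 test4 NaN NaN 4 NaN "b"
""")
df = pd.read_csv(data, sep=r"\s+").set_index(["ID", "Name"])


def join_non_null(row):
    # Hypothetical helper: same job as myJoiner, written as one expression.
    return ";".join(str(v) for v in row.dropna())


# Assumption: duplicates were renamed to a, a.1, a.2, b, b.1 on read,
# so the text before the first '.' is the original header.
prefixes = df.columns.str.split(".").str[0]

# Build one joined column per original header.
compact = pd.DataFrame({
    key: df.loc[:, prefixes == key].apply(join_non_null, axis=1)
    for key in prefixes.unique()
})
print(compact)
```

<p>Matching on the prefix rather than on a substring avoids accidentally picking up unrelated columns whose names merely contain the key, but it would break if a real column name contained a dot.</p>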