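I feel like there must be a very straightforward way to do this, but I can't find it.
So, I have this data (note that the description
column has a shared part across multiple rows):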
import pandas as pd
data = {"description": ["AAAA:A", "AAAA:B", "AAAA:C", "AAAA:D", "BBBB:A", "BBBB:B"],
"sequence": ["AAAAAAAAAAA", "AAAAAAABBBBBB", "AAAAAAAACCCCCCC", "AAAAAAAADDDDDDD",
"BBBBBBAAAAA", "BBBBBBBBBBBBB"]}
df = pd.DataFrame(data)
print(df)
# description sequence
#0 AAAA:A AAAAAAAAAAA
#1 AAAA:B AAAAAAABBBBBB
#2 AAAA:C AAAAAAAACCCCCCC
#3 AAAA:D AAAAAAAADDDDDDD
#4 BBBB:A BBBBBBAAAAA
#5 BBBB:B BBBBBBBBBBBBB
My end goal is to combine all the sequences that share a 4-letter description into a single row.
So far, I have gotten to this point:
df = df.apply(lambda row: pd.Series({"description": row["description"].split(":")[0],
"sequence_{}".format(row["description"].split(":")[1]): row["sequence"]}),
axis=1)
print(df)
# description sequence_A sequence_B sequence_C sequence_D
#0 AAAA AAAAAAAAAAA NaN NaN NaN
#1 AAAA NaN AAAAAAABBBBBB NaN NaN
#2 AAAA NaN NaN AAAAAAAACCCCCCC NaN
#3 AAAA NaN NaN NaN AAAAAAAADDDDDDD
#4 BBBB BBBBBBAAAAA NaN NaN NaN
#5 BBBB NaN BBBBBBBBBBBBB NaN NaN
I guess I need df.groupby("description")
followed by one more step, but I'm missing that last bit.
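One way the missing groupby step could look, as a minimal sketch rather than a definitive answer: GroupBy.first() returns the first non-null value of each column within a group, which collapses the NaN-padded rows into one row per description. The names wide and combined are introduced here for illustration only.

```python
import pandas as pd

data = {"description": ["AAAA:A", "AAAA:B", "AAAA:C", "AAAA:D", "BBBB:A", "BBBB:B"],
        "sequence": ["AAAAAAAAAAA", "AAAAAAABBBBBB", "AAAAAAAACCCCCCC", "AAAAAAAADDDDDDD",
                     "BBBBBBAAAAA", "BBBBBBBBBBBBB"]}
df = pd.DataFrame(data)

# Same row-wise expansion as in the question: one sequence_<suffix> column per row,
# with NaN everywhere else.
wide = df.apply(
    lambda row: pd.Series({
        "description": row["description"].split(":")[0],
        "sequence_{}".format(row["description"].split(":")[1]): row["sequence"],
    }),
    axis=1,
)

# first() skips nulls, so each group keeps its single non-null value per column.
combined = wide.groupby("description").first().reset_index()
print(combined)
```

Groups that have no value for a given suffix (here BBBB for C and D) stay null in the result.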
Use split,
then pivot,
using the axis.
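The split-then-pivot idea could be sketched roughly like this, starting from the original frame. This is one possible reading of the suggestion, not necessarily the exact code the answer had in mind. The column name suffix and the variable names parts, tidy and wide are assumptions made for this example.

```python
import pandas as pd

data = {"description": ["AAAA:A", "AAAA:B", "AAAA:C", "AAAA:D", "BBBB:A", "BBBB:B"],
        "sequence": ["AAAAAAAAAAA", "AAAAAAABBBBBB", "AAAAAAAACCCCCCC", "AAAAAAAADDDDDDD",
                     "BBBBBBAAAAA", "BBBBBBBBBBBBB"]}
df = pd.DataFrame(data)

# Split "AAAA:A" into the shared 4-letter part and the suffix.
parts = df["description"].str.split(":", expand=True)
tidy = df.assign(description=parts[0], suffix=parts[1])

# Pivot: one row per description, one column per suffix.
wide = (
    tidy.pivot(index="description", columns="suffix", values="sequence")
        .add_prefix("sequence_")
        .reset_index()
)
wide.columns.name = None  # drop the leftover "suffix" label on the column axis
print(wide)
```

Note that pivot raises an error if the same description and suffix pair appears more than once. In that case pivot_table with an explicit aggregation function would be needed instead.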