I have a small sample dataset:
import pandas as pd
df = {'ID': ['H576','H577','H577','H578','H600', 'H700', 'H700'],
'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC','DDDDDD', 'EEEEEEE','FFFFFFF','GGGGGGG']}
df = pd.DataFrame(df)
It looks like this:
df
Out[9]:
CD ID
0 AAAAAAA H576
1 BBBBB H577
2 CCCCCC H577
3 DDDDDD H578
4 EEEEEEE H600
5 FFFFFFF H700
6 GGGGGGG H700
For each ID that has more than one CD value, I want to save its rows to a separate file.
My desired output files:
H577.txt:
CD ID
BBBBB H577
CCCCCC H577
H700.txt:
CD ID
FFFFFFF H700
GGGGGGG H700
My attempt:
import pandas as pd
df = {'ID': ['H576','H577','H577','H578','H600', 'H700', 'H700'],
'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC','DDDDDD', 'EEEEEEE','FFFFFFF','GGGGGGG']}
df = pd.DataFrame(df)
df1 = (df.groupby('ID').filter(lambda x: ('if CD has more than one value for the same ID'.any())))
df1.groupby('ID').apply(lambda gp: gp.to_csv('ID{}.txt'.format(gp.name), sep='\t', index=False))
I don't know how to code the 'if CD has more than one value for the same ID' part.
Try this:
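The answer's original code did not survive on this page, so what follows is only a minimal sketch of one way to fill in the missing condition, not necessarily what the answerer posted. It assumes that "more than one CD value" means more than one distinct CD per ID, which `nunique()` checks inside `groupby(...).filter(...)`. The helper name `save_multi_cd_groups` and its `out_dir` parameter are made up for illustration; the `ID{}.txt` file name pattern is taken from the attempt in the question.

```python
import os
import pandas as pd

df = pd.DataFrame({
    'ID': ['H576', 'H577', 'H577', 'H578', 'H600', 'H700', 'H700'],
    'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC', 'DDDDDD', 'EEEEEEE', 'FFFFFFF', 'GGGGGGG'],
})


def save_multi_cd_groups(frame, out_dir='.'):
    """Hypothetical helper: write one tab-separated file per ID with several CDs."""
    # Keep only the rows whose ID group holds more than one distinct CD value.
    multi = frame.groupby('ID').filter(lambda g: g['CD'].nunique() > 1)
    written = []
    # Write each remaining group to its own file, columns ordered as CD, ID.
    for id_value, group in multi.groupby('ID'):
        path = os.path.join(out_dir, 'ID{}.txt'.format(id_value))
        group[['CD', 'ID']].to_csv(path, sep='\t', index=False)
        written.append(path)
    return written


if __name__ == '__main__':
    print(save_multi_cd_groups(df))
```

With the sample data, only H577 and H700 pass the filter, so the helper writes two files, each holding a header row and two data rows. An explicit loop is used instead of `apply` so that the file writes are not hidden inside a call whose return value is discarded.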
Output:
IDH577.txt:
IDH700.txt:
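If only the row selection is needed, the same condition can also be written as a boolean mask rather than a `filter` call. This is an alternative sketch under the same assumption (distinct CD values per ID); the variable names `cd_counts` and `multi` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H576', 'H577', 'H577', 'H578', 'H600', 'H700', 'H700'],
    'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC', 'DDDDDD', 'EEEEEEE', 'FFFFFFF', 'GGGGGGG'],
})

# Count the distinct CD values per ID and broadcast that count onto every row.
cd_counts = df.groupby('ID')['CD'].transform('nunique')

# Keep the rows that belong to an ID with more than one distinct CD value.
multi = df[cd_counts > 1]
print(multi)
```

The mask keeps the original row order and index, and on larger frames it tends to be faster than calling a Python lambda once per group.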