I have a multi-index DataFrame that looks like this (shown in dictionary form):
df = {'Modality': {('0020413', '1', '6/21/2017', 'DTI'): 1,
('0020413', '1', '6/21/2017', 'FLAIR'): 1,
('0020413', '1', '6/21/2017', 'T1'): 1,
('0020413', '3', '8/27/2019', 'DTI'): 1,
('0020413', '3', '8/27/2019', 'FLAIR'): 1,
('0020413', '3', '8/27/2019', 'T1'): 1,
('0021261', '1', '3/15/2017', 'DTI'): 1,
('0021261', '1', '3/15/2017', 'FLAIR'): 1,
('0021261', '1', '3/15/2017', 'T1'): 1,
('0021261', '2', '4/24/2018', 'DTI'): 1,
('0021261', '2', '4/24/2018', 'FLAIR'): 1,
('0021261', '2', '4/24/2018', 'T1'): 1,
('0021261', '3', '5/01/2019', 'DTI'): 1,
('0021261', '3', '5/01/2019', 'FLAIR'): 1,
('0021261', '3', '5/01/2019', 'T1'): 1},
'Phase': {('0020413', '1', '6/21/2017', 'DTI'): 1,
('0020413', '1', '6/21/2017', 'FLAIR'): 1,
('0020413', '1', '6/21/2017', 'T1'): 1,
('0020413', '3', '8/27/2019', 'DTI'): 1,
('0020413', '3', '8/27/2019', 'FLAIR'): 1,
('0020413', '3', '8/27/2019', 'T1'): 1,
('0021261', '1', '3/15/2017', 'DTI'): 1,
('0021261', '1', '3/15/2017', 'FLAIR'): 1,
('0021261', '1', '3/15/2017', 'T1'): 1,
('0021261', '2', '4/24/2018', 'DTI'): 1,
('0021261', '2', '4/24/2018', 'FLAIR'): 1,
('0021261', '2', '4/24/2018', 'T1'): 1,
('0021261', '3', '5/01/2019', 'DTI'): 1,
('0021261', '3', '5/01/2019', 'FLAIR'): 1,
('0021261', '3', '5/01/2019', 'T1'): 1}}
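I have been trying to remove some duplicated values in the level_3 column. They do not show up in my sample data because the real DataFrame is very large and I could not isolate the specific rows that contain duplicates, but sometimes there are more than three values in "level_3" for a single "level_0". The extra values are duplicates; for example, for a single "level_0" you may find "DTI, FLAIR, FLAIR, T1, T1".
I have been trying: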
df = df.drop_duplicates(subset = 'Description', keep = "first")
But I get an error:
KeyError: Index(['Description'], dtype='object')
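I believe this is because the DataFrame is multi-indexed, but I could not find any information on removing duplicates from a multi-indexed DataFrame.
Could you help me?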
IIUC
Try:
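The code that originally followed did not survive on this page, so the snippet below is only a minimal sketch of one common approach, not necessarily what the answer contained. It assumes that a "duplicate" means a row whose full index tuple repeats, and it builds a small hypothetical sample (the repeated FLAIR and T1 rows are made up to mimic the case described in the question). Since drop_duplicates(subset=...) looks up column names rather than index levels, the sketch filters on the index with Index.duplicated instead.

```python
import pandas as pd

# Hypothetical sample: FLAIR and T1 each appear twice for the same
# (level_0, level_1, level_2) combination, as described in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("0020413", "1", "6/21/2017", "DTI"),
        ("0020413", "1", "6/21/2017", "FLAIR"),
        ("0020413", "1", "6/21/2017", "FLAIR"),
        ("0020413", "1", "6/21/2017", "T1"),
        ("0020413", "1", "6/21/2017", "T1"),
    ]
)
df = pd.DataFrame({"Modality": [1] * 5, "Phase": [1] * 5}, index=index)

# Index.duplicated marks every repeat of an index tuple after its first
# occurrence; negating the mask keeps only the first row for each tuple.
out = df[~df.index.duplicated(keep="first")]
```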
Now, if you print the variable out, you should get the expected output.
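If you would rather keep using drop_duplicates, another possible route is to move the index levels into columns first. The sketch below assumes the index levels are unnamed, in which case reset_index names the new columns level_0 through level_3; the sample rows are again hypothetical. It also suggests why the original call raised a KeyError: the subset argument refers to columns, and no column called 'Description' exists.

```python
import pandas as pd

# Hypothetical sample with one repeated index tuple (FLAIR appears twice).
index = pd.MultiIndex.from_tuples(
    [
        ("0021261", "2", "4/24/2018", "DTI"),
        ("0021261", "2", "4/24/2018", "FLAIR"),
        ("0021261", "2", "4/24/2018", "FLAIR"),
        ("0021261", "2", "4/24/2018", "T1"),
    ]
)
df = pd.DataFrame({"Modality": [1] * 4, "Phase": [1] * 4}, index=index)

# Unnamed index levels become columns named level_0 ... level_3.
flat = df.reset_index()
levels = ["level_0", "level_1", "level_2", "level_3"]

# Now subset= can refer to the former index levels because they are columns.
deduped = flat.drop_duplicates(subset=levels, keep="first")

# Restore the multi-index (its levels are now named level_0 ... level_3).
out = deduped.set_index(levels)
```

To treat rows as duplicates based on fewer levels, for example only level_0 and level_3, the subset list can be shortened accordingly; whether that matches the intended meaning of "duplicate" depends on the data.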