I have a pandas DataFrame:
    RTYPE  PERIOD_ID  STORE_ID    MKT                       MTYPE  RGROUP  RZF  RXF
0   MKT    20171411   3102300001  PM KA+PM PROV+SMKT+PETRO  CELL   NaN     NaN  NaN
1   MKT    20171411   3102300002  PM KA+PM PROV+SMKT+PETRO  CELL   NaN     NaN  NaN
2   MKT    20171411   3104001193  PM Provision              CELL   NaN     NaN  NaN
3   MKT    20171411   3104001193  PM KA+PM PROV+SMKT+PETRO  CELL   NaN     NaN  NaN
4   MKT    20171411   3104001193  Provision including MM    CELL   NaN     NaN  NaN
5   MKT    20171411   3104001641  PM Provision              CELL   NaN     NaN  NaN
6   MKT    20171411   3104001641  PM KA+PM PROV+SMKT+PETRO  CELL   NaN     NaN  NaN
7   MKT    20171411   3104001641  Provision including MM    CELL   NaN     NaN  NaN
8   MKT    20171411   3104001682  PM Provision              CELL   NaN     NaN  NaN
9   MKT    20171411   3104001682  PM KA+PM PROV+SMKT+PETRO  CELL   NaN     NaN  NaN
10  MKT    20171411   3104001682  Provision including MM    CELL   NaN     NaN  NaN
11  MKT    20171412   3104001682  Alcohol                   CELL   NaN     NaN  NaN
12  MKT    20171412   3104001682  Fish                      CELL   NaN     NaN  NaN
13  MKT    20171412   3104001684  Alcohol                   CELL   NaN     NaN  NaN
14  MKT    20171412   3104001684  Fish                      CELL   NaN     NaN  NaN
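I need to find the duplicate MKTs under this condition: within a particular PERIOD_ID, MKTs are duplicates if their sets of store IDs are exactly the same. So in this case, for period 20171411 the duplicates are PM Provision and Provision including MM, and for period 20171412 the duplicates are Alcohol and Fish.
This is what I have tried so far: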
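# Keep every row whose (PERIOD_ID, STORE_ID) pair occurs more than once
df1 = newdf[newdf.duplicated(['PERIOD_ID','STORE_ID'], keep=False)]
# Collect the distinct MKT values of those rows for each period
d1 = {k:tuple(set(v)) for k, v in df1.groupby('PERIOD_ID')['MKT']}
print (d1)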
which returns:
{20171411L: ('Provision including MM', 'PM Provision', 'PM KA+PM PROV+SMKT+PETRO'), 20171412L: ('Fish', 'Alcohol')}
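The output above does not return the duplicates; it only returns the unique set of MKTs for each period.
What I need is something like this, with the period as the key and the duplicate MKTs of that period as the value. The condition for being duplicates is the one described above: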
{20171411L: ('Provision including MM', 'PM Provision'), 20171412L: ('Fish', 'Alcohol')}
I am really new to pandas and have only a basic understanding of Python. Any help would be great.
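This should work for your case. I just removed the unique MKTs from the duplicated MKTs you found.
I hope I understood you correctly; feel free to comment if I forgot something or misunderstood.
I was able to solve this with the following code: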
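One possible reading of "removing the unique MKTs" is sketched below; it is not necessarily the code this answer used. For each period it builds an MKT-by-store membership table and keeps only the MKTs whose row is identical to another MKT's row. The function name duplicate_mkts_by_membership is made up for illustration, and the column names PERIOD_ID, STORE_ID and MKT are taken from the question.

```python
import pandas as pd

def duplicate_mkts_by_membership(df):
    """Hypothetical helper: {PERIOD_ID: tuple of MKTs sharing a store set}."""
    result = {}
    for period, grp in df.groupby("PERIOD_ID"):
        # Rows are MKTs, columns are STORE_IDs; True where the store
        # appears under that MKT in this period.
        membership = pd.crosstab(grp["MKT"], grp["STORE_ID"]).astype(bool)
        # Two MKTs with identical rows cover exactly the same stores.
        shared = membership.duplicated(keep=False)
        mkts = tuple(shared[shared].index)
        if mkts:  # skip periods where every MKT has a unique store set
            result[int(period)] = mkts
    return result
```

An MKT whose store set is unique within its period, such as PM KA+PM PROV+SMKT+PETRO in 20171411, drops out because no other row of the membership table matches it.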
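As a minimal sketch of one way to do this (not necessarily the code this answer refers to): collect the store IDs of each (PERIOD_ID, MKT) pair into a frozenset, then flag the MKTs whose frozenset occurs more than once within the same period. The helper name find_duplicate_mkts is an assumption for illustration; the sample rows are copied from the question, keeping only the three columns the logic needs.

```python
import pandas as pd

# Sample rows from the question: (PERIOD_ID, STORE_ID, MKT).
rows = [
    (20171411, 3102300001, "PM KA+PM PROV+SMKT+PETRO"),
    (20171411, 3102300002, "PM KA+PM PROV+SMKT+PETRO"),
    (20171411, 3104001193, "PM Provision"),
    (20171411, 3104001193, "PM KA+PM PROV+SMKT+PETRO"),
    (20171411, 3104001193, "Provision including MM"),
    (20171411, 3104001641, "PM Provision"),
    (20171411, 3104001641, "PM KA+PM PROV+SMKT+PETRO"),
    (20171411, 3104001641, "Provision including MM"),
    (20171411, 3104001682, "PM Provision"),
    (20171411, 3104001682, "PM KA+PM PROV+SMKT+PETRO"),
    (20171411, 3104001682, "Provision including MM"),
    (20171412, 3104001682, "Alcohol"),
    (20171412, 3104001682, "Fish"),
    (20171412, 3104001684, "Alcohol"),
    (20171412, 3104001684, "Fish"),
]
newdf = pd.DataFrame(rows, columns=["PERIOD_ID", "STORE_ID", "MKT"])

def find_duplicate_mkts(df):
    """Hypothetical helper: {PERIOD_ID: tuple of MKTs sharing a store set}."""
    # One frozenset of store IDs per (PERIOD_ID, MKT) pair.
    stores = (
        df.groupby(["PERIOD_ID", "MKT"])["STORE_ID"]
        .apply(frozenset)
        .reset_index(name="STORES")
    )
    # Within a period, an MKT is a duplicate when another MKT
    # has exactly the same store set.
    dupes = stores[stores.duplicated(["PERIOD_ID", "STORES"], keep=False)]
    return {int(p): tuple(m) for p, m in dupes.groupby("PERIOD_ID")["MKT"]}

result = find_duplicate_mkts(newdf)
print(result)
```

For the rows above, the result maps 20171411 to PM Provision and Provision including MM, and 20171412 to Alcohol and Fish. Comparing frozensets makes the match exact: an MKT that covers only a subset or superset of another MKT's stores is not reported.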