Merge DataFrames and divide one DataFrame by another

Posted 2024-06-26 13:40:55


    concatted  score       date status  apple  banana  orange
0  apple_bana  0.500 2010-02-20   high   True   False   False
1       apple  0.400 2010-02-10   high   True   False   False
2      banana  0.530 2010-01-12   high  False    True   False
3        kiwi  0.532 2010-03-03    low  False   False   False
4        cake  0.634 2010-03-05    low  False   False   False 


fruits = ['apple', 'banana', 'orange']
for fruit in fruits:
    df[fruit] = df['concatted'].str.contains(fruit, regex=True)
df1=df.groupby('date')['status'].apply(lambda x: (x=='high').sum()).reset_index(name='count')
df2 = df['date'].value_counts().sort_index().reset_index(name='total')

How can I add df1 and df2 back to the original df and compute df1/df2?


2 Answers

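The sample data does not really exercise the calculation, because every date appears only once. Assuming you want everything back in the original DataFrame, transform() is your friend. I have also changed how the list is used: joining it into a single pattern for the regex contains() check removes the loop.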

import pandas as pd

df = pd.DataFrame({
    "concatted": ["apple_bana", "apple", "banana", "kiwi", "cake"],
    "score": [0.5, 0.4, 0.53, 0.532, 0.634],
    "date": ["2010-02-19T16:00:00.000Z", "2010-02-09T16:00:00.000Z", "2010-01-11T16:00:00.000Z", "2010-03-02T16:00:00.000Z", "2010-03-04T16:00:00.000Z"],
    "status": ["high", "high", "high", "low", "low"],
})

fruits = ['apple', 'banana', 'orange']
# a single regex alternation replaces the per-fruit loop
df["fruit"] = df["concatted"].str.contains("|".join(fruits))
# transform() broadcasts each per-date aggregate back onto every row of that date
df["highcalc"] = df.groupby('date')['status'].transform(lambda x: (x=='high').sum())
df["datecount"] = df.groupby('date')["date"].transform("count")
df["finalcalc"] = df["highcalc"] / df["datecount"]
print(df.to_string(index=False))

Output

  concatted  score                      date status  fruit  highcalc  datecount  finalcalc
 apple_bana  0.500  2010-02-19T16:00:00.000Z   high   True         1          1        1.0
      apple  0.400  2010-02-09T16:00:00.000Z   high   True         1          1        1.0
     banana  0.530  2010-01-11T16:00:00.000Z   high   True         1          1        1.0
       kiwi  0.532  2010-03-02T16:00:00.000Z    low  False         0          1        0.0
       cake  0.634  2010-03-04T16:00:00.000Z    low  False         0          1        0.0
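Because every date in the sample occurs exactly once, finalcalc can only come out as 0.0 or 1.0 here. A minimal sketch with repeated dates shows the same transform() steps producing fractional proportions. The data below is hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data: unlike the question's sample, dates repeat here.
df = pd.DataFrame({
    "date": ["2010-02-20", "2010-02-20", "2010-02-20", "2010-03-03", "2010-03-03"],
    "status": ["high", "low", "high", "low", "low"],
})

# Per-date count of "high" rows, broadcast back to every row of that date
df["highcalc"] = df.groupby("date")["status"].transform(lambda x: (x == "high").sum())
# Per-date row count, broadcast the same way
df["datecount"] = df.groupby("date")["date"].transform("count")
# Both columns are aligned with df, so a plain column division is enough
df["finalcalc"] = df["highcalc"] / df["datecount"]

print(df.to_string(index=False))
```

The first date has two "high" rows out of three, so its rows get 2/3. The second date has none, so its rows get 0.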

Additional columns to categorize the rows

import pandas as pd

df = pd.DataFrame({
    "concatted": ["apple_bana", "apple", "banana", "kiwi", "cake"],
    "score": [0.5, 0.4, 0.53, 0.532, 0.634],
    "date": ["2010-02-19T16:00:00.000Z", "2010-02-09T16:00:00.000Z", "2010-01-11T16:00:00.000Z", "2010-03-02T16:00:00.000Z", "2010-03-04T16:00:00.000Z"],
    "status": ["high", "high", "high", "low", "low"],
})

df["highcalc"] = df.groupby('date')['status'].transform(lambda x: (x=='high').sum())
df["datecount"] = df.groupby('date')["date"].transform("count")
df["finalcalc"] = df["highcalc"] / df["datecount"]

# one row per distinct name; "cat" starts empty (None gives an object column that can hold strings)
dfcat = pd.DataFrame({"concatted": df["concatted"].unique(), "cat": None, "truth": True})
fruits = ['apple', 'banana', 'orange']
bakery = ["cake"]
# assign a category only where none has been assigned yet
dfcat.loc[dfcat["cat"].isna() & dfcat["concatted"].str.contains("|".join(fruits)), "cat"] = "fruit"
dfcat.loc[dfcat["cat"].isna() & dfcat["concatted"].str.contains("|".join(bakery)), "cat"] = "bakery"
dfcat = dfcat.fillna("tbd")
# split "<category>_<name>" into its parts and give each part its own row
dfcat["internal"] = dfcat["cat"] + "_" + dfcat["concatted"]
dfcat["col"] = dfcat.internal.str.split("_")
dfcat = dfcat.explode("col").drop(columns="internal")

# each part becomes a True/False column, then everything is joined back onto df
dfcat = dfcat.pivot(index="concatted", columns="col", values="truth").reset_index().fillna(False)
print(df.merge(dfcat, on=["concatted"]))
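One caveat about building the pattern with "|".join(...): str.contains() treats the result as a regular expression, so a keyword containing a character such as "." would match more than intended. A minimal sketch of escaping each keyword first follows. The names and keywords in it are hypothetical and chosen only to show the difference.

```python
import re

import pandas as pd

# Hypothetical names and keyword; "a.b" contains a regex metacharacter.
names = pd.Series(["a.b_cake", "aXb_cake", "kiwi"])
keywords = ["a.b"]

# Unescaped, "." matches any character, so "aXb_cake" also matches
raw_mask = names.str.contains("|".join(keywords))

# re.escape makes every keyword match literally
pattern = "|".join(re.escape(k) for k in keywords)
escaped_mask = names.str.contains(pattern)

print(raw_mask.tolist())
print(escaped_mask.tolist())
```

For plain alphabetic keywords such as the fruit list above, escaping changes nothing, so the original join is fine there.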

You can simply do the following:

# check if contains fruit, no need for loop
df['fruit'] = df['concatted'].str.contains('|'.join(fruits), regex=True)

# check proportion of "high" in each group
df['prop'] = df.groupby('date')['status'].transform(lambda x: (x=='high').sum() / len(x))

print(df)

    concatted  score        date status  fruit  prop
0  apple_bana  0.500  02/20/2010   high   True   1.0
1       apple  0.400  02/10/2010   high   True   1.0
2      banana  0.530  01/12/2010   high   True   1.0
3        kiwi  0.532  03/03/2010    low  False   0.0
4        cake  0.634  03/05/2010    low  False   0.0
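For comparison, the route the question literally asks about (merging df1 and df2 back onto df and dividing) can be sketched as below. This is an illustration under assumptions rather than either answerer's method: the data is hypothetical with repeated dates, df2 is built with groupby().size() instead of value_counts(), and the names merged and prop are made up for the example.

```python
import pandas as pd

# Hypothetical data with repeated dates
df = pd.DataFrame({
    "concatted": ["apple_bana", "apple", "banana", "kiwi", "cake"],
    "date": ["2010-02-20", "2010-02-20", "2010-01-12", "2010-03-03", "2010-03-03"],
    "status": ["high", "low", "high", "low", "low"],
})

# df1: number of "high" rows per date (same expression as in the question)
df1 = (
    df.groupby("date")["status"]
    .apply(lambda x: (x == "high").sum())
    .reset_index(name="count")
)
# df2: total number of rows per date
df2 = df.groupby("date").size().reset_index(name="total")

# Left merges keep every row of df and attach the per-date aggregates
merged = df.merge(df1, on="date", how="left").merge(df2, on="date", how="left")
merged["prop"] = merged["count"] / merged["total"]

print(merged)
```

The transform() versions above reach the same per-row values without the intermediate frames, which is why both answers prefer them.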
