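<p>The sample data does not seem to contain anything that really exercises the calculation: every date is unique, so each group has a single row. Assuming you want all results back in the original DataFrame, <code>transform()</code> is your friend.
I also changed how the list is used: it now drives a regex <code>contains()</code> check, which eliminates the loop.</p>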
<pre><code>import pandas as pd

df = pd.DataFrame({"concatted":["apple_bana","apple","banana","kiwi","cake"],"score":[0.5,0.4,0.53,0.532,0.634],"date":["2010-02-19T16:00:00.000Z","2010-02-09T16:00:00.000Z","2010-01-11T16:00:00.000Z","2010-03-02T16:00:00.000Z","2010-03-04T16:00:00.000Z"],"status":["high","high","high","low","low"],"fruit":[False,False,False,False,False]})
fruits = ['apple', 'banana', 'orange']
# One regex alternation ("apple|banana|orange") replaces the loop over the list
df["fruit"] = df["concatted"].str.contains("|".join(fruits))
# transform() broadcasts each group's result back onto the original rows
df["highcalc"] = df.groupby('date')['status'].transform(lambda x: (x=='high').sum())
df["datecount"] = df.groupby('date')["date"].transform("count")
# Plain column division; a row-wise apply() is not needed here
df["finalcalc"] = df["highcalc"] / df["datecount"]
print(df.to_string(index=False))
</code></pre>
<p><strong>Output</strong></p>
<pre><code> concatted  score                      date  status  fruit  highcalc  datecount  finalcalc
apple_bana  0.500  2010-02-19T16:00:00.000Z    high   True         1          1        1.0
     apple  0.400  2010-02-09T16:00:00.000Z    high   True         1          1        1.0
    banana  0.530  2010-01-11T16:00:00.000Z    high   True         1          1        1.0
      kiwi  0.532  2010-03-02T16:00:00.000Z     low  False         0          1        0.0
      cake  0.634  2010-03-04T16:00:00.000Z     low  False         0          1        0.0
</code></pre>
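<p>Because every date in the sample is unique, each group has exactly one row and <code>finalcalc</code> can only be 0 or 1. A minimal sketch with repeated dates may show more clearly how <code>transform()</code> broadcasts a per-group result back onto every row. The <code>demo</code> frame and its values are made up for illustration and are not from the question.</p>

```python
import pandas as pd

# Hypothetical sample (not from the question): two dates, each repeated
demo = pd.DataFrame({
    "date": ["2010-02-19", "2010-02-19", "2010-02-19", "2010-03-02", "2010-03-02"],
    "status": ["high", "low", "high", "low", "low"],
})

# 1 where status is "high", 0 otherwise
high_flag = demo["status"].eq("high").astype(int)

# transform() returns one value per original row, aligned with demo's index
demo["highcalc"] = high_flag.groupby(demo["date"]).transform("sum")
demo["datecount"] = demo.groupby("date")["date"].transform("count")
demo["finalcalc"] = demo["highcalc"] / demo["datecount"]

# For comparison: a plain aggregation collapses to one row per date
per_date = high_flag.groupby(demo["date"]).sum()

print(demo.to_string(index=False))
print(per_date)
```

<p>Here the three 2010-02-19 rows each get 2/3 and the two 2010-03-02 rows get 0, whereas <code>per_date</code> has only one entry per date and would need a merge to get back onto the original rows.</p>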
<p><strong>Supplementary columns to categorize the rows</strong></p>
<pre><code>import pandas as pd

df = pd.DataFrame({"concatted":["apple_bana","apple","banana","kiwi","cake"],"score":[0.5,0.4,0.53,0.532,0.634],"date":["2010-02-19T16:00:00.000Z","2010-02-09T16:00:00.000Z","2010-01-11T16:00:00.000Z","2010-03-02T16:00:00.000Z","2010-03-04T16:00:00.000Z"],"status":["high","high","high","low","low"]})
df["highcalc"] = df.groupby('date')['status'].transform(lambda x: (x=='high').sum())
df["datecount"] = df.groupby('date')["date"].transform("count")
df["finalcalc"] = df["highcalc"] / df["datecount"]
# Lookup table: one row per distinct value. None keeps "cat" object-typed
# so string labels can be assigned (np.NaN no longer exists in NumPy 2.0).
dfcat = pd.DataFrame({"concatted":df["concatted"].unique(), "cat":None, "truth":True})
fruits = ['apple', 'banana', 'orange']
bakery = ["cake"]
# Label only rows that have no category yet, so the first match wins
dfcat.loc[dfcat["cat"].isna() & dfcat["concatted"].str.contains("|".join(fruits)), "cat"] = "fruit"
dfcat.loc[dfcat["cat"].isna() & dfcat["concatted"].str.contains("|".join(bakery)), "cat"] = "bakery"
dfcat["cat"] = dfcat["cat"].fillna("tbd")
# Build "category_name", split it into tokens, then one row per token
dfcat["internal"] = dfcat["cat"] + "_" + dfcat["concatted"]
dfcat["col"] = dfcat.internal.str.split("_")
dfcat = dfcat.explode("col").drop(columns="internal")  # positional axis argument was removed in pandas 2.0
# One True/False column per token
dfcat = dfcat.pivot(index="concatted", columns="col", values="truth").reset_index().fillna(False)
result = df.merge(dfcat, on=["concatted"])
print(result.to_string(index=False))
</code></pre>
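<p>One caveat with <code>"|".join(...)</code>: <code>str.contains()</code> treats the joined string as a regular expression by default, so a term containing characters such as <code>+</code> or <code>.</code> would be read as regex syntax. Below is a hedged sketch of one way to guard against that with the standard library's <code>re.escape</code>. The helper <code>build_pattern</code> and the sample terms are hypothetical, introduced here only for illustration.</p>

```python
import re

import pandas as pd


def build_pattern(terms):
    """Hypothetical helper: join terms into one regex alternation,
    escaping any regex metacharacters in each term first."""
    return "|".join(re.escape(term) for term in terms)


# Made-up values; "c++" would be an invalid regex if left unescaped
names = pd.Series(["apple_bana", "c++ cake", "kiwi", "a.b"])
pattern = build_pattern(["apple", "c++", "a.b"])

# regex=True is the default; spelled out here to make the intent explicit
mask = names.str.contains(pattern, regex=True)
print(pattern)
print(mask.tolist())
```

<p>Matching is still by substring, so <code>apple</code> would also match <code>pineapple</code>; anchor the pattern if whole-token matches are needed.</p>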