Group by and print weighted statistics in Python/pandas

Posted 2024-10-01 02:39:39

I have a table in the form of a DataFrame, like:

^{tb1}$

I need an aggregation like:

^{tb2}$

Currently I do the following for each feature:

    dftotal=df.groupby(['Tree'])["Weight"].agg(['sum']).reset_index()
    dfFruit=df.groupby(['Tree', 'Fruit'])['Weight'].sum().reset_index()
    dfFrwithTotal=pd.merge(dftotal,dfFruit)
    dfFrwithTotal['Weight']=100*dfFrwithTotal['Weight']/dfFrwithTotal["sum"]
    dfFrwithTotal['joined'] = dfFrwithTotal.apply(lambda x: str(x.Fruit)+' - '+ str(x.Weight) +'%', axis=1)
    dfsummaryFr=dfFrwithTotal.groupby(['Tree']).agg({ "joined": lambda x: ','.join(x)}).reset_index()

This looks very ugly.

I do this for every feature and then merge the results on Tree.

Is there a nice lambda expression for this?


2 Answers

Input data:

>>> df
   Fruit   Color  Weight   Tree
0  Apple     Red     0.1  Tree1
1  Apple   Green     0.1  Tree1
2  Apple   Green     0.9  Tree2
3  Apple  Yellow     0.1  Tree1
4   Pear   Green     1.0  Tree2

Formatting function to display the result:

val2str = lambda s: ', '.join(map(lambda v: f"{v[0]}-{round(v[1], 1)}%", s.items()))
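To see what this formatter produces on its own, here is a small sketch. The Series named shares and its values are made up for illustration; they are not taken from the question's table.

```python
import pandas as pd

# Same formatter as above: "label-value%" pairs joined by ", ".
val2str = lambda s: ', '.join(map(lambda v: f"{v[0]}-{round(v[1], 1)}%", s.items()))

# Hypothetical percentages for one tree, indexed by fruit name.
shares = pd.Series({"Apple": 47.368, "Pear": 52.632})

print(val2str(shares))  # Apple-47.4%, Pear-52.6%
```

Each index label is paired with its value rounded to one decimal place, so the function expects a Series that already holds percentages.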

Convert the DataFrame into a Series indexed by Tree, Fruit and Color:

sr = df.set_index(["Tree", "Fruit", "Color"]).sum(axis="columns").sort_index()
>>> sr
Tree   Fruit  Color
Tree1  Apple  Green     0.1
              Red       0.1
              Yellow    0.1
Tree2  Apple  Green     0.9
       Pear   Green     1.0
dtype: float64

Build a Series of display strings for each feature:

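# Series.sum(level=...) was removed in pandas 2.0; groupby(level=...).sum() is the equivalent.
fruits = \
    sr.groupby(level=["Tree", "Fruit"]).sum().mul(100).div(sr.groupby(level="Tree").sum(), level="Tree") \
      .unstack(level="Fruit").apply(lambda x: val2str(x[x.notna()]), axis="columns")

colors = \
    sr.groupby(level=["Tree", "Color"]).sum().mul(100).div(sr.groupby(level="Tree").sum(), level="Tree") \
      .unstack(level="Color").apply(lambda x: val2str(x[x.notna()]), axis="columns")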
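The key step above is dividing each per-(Tree, Fruit) sum by the total for its tree. Here is a minimal sketch of that alignment in isolation, assuming the sample data from this answer. The Series sr is rebuilt by hand so the snippet runs on its own, and per_fruit, per_tree and share are illustrative names.

```python
import pandas as pd

# Same weights as sr above, rebuilt so this snippet is self-contained.
sr = pd.Series(
    [0.1, 0.1, 0.1, 0.9, 1.0],
    index=pd.MultiIndex.from_tuples(
        [
            ("Tree1", "Apple", "Green"),
            ("Tree1", "Apple", "Red"),
            ("Tree1", "Apple", "Yellow"),
            ("Tree2", "Apple", "Green"),
            ("Tree2", "Pear", "Green"),
        ],
        names=["Tree", "Fruit", "Color"],
    ),
)

per_fruit = sr.groupby(level=["Tree", "Fruit"]).sum()  # weight per tree and fruit
per_tree = sr.groupby(level="Tree").sum()              # total weight per tree

# level="Tree" broadcasts each tree's total across that tree's fruits.
share = per_fruit.div(per_tree, level="Tree").mul(100)
print(share.round(1))
```

Passing level="Tree" makes the broadcast explicit instead of relying on pandas matching the index names automatically.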

Merge fruits and colors together:

>>> pd.concat({"Fruits": fruits, "Colors": colors}, axis="columns")
                        Fruits                                Colors
Tree
Tree1             Apple-100.0%  Green-33.3%, Red-33.3%, Yellow-33.3%
Tree2  Apple-47.4%, Pear-52.6%                          Green-100.0%
g_tree = df.groupby("Tree", sort=False)
weights = g_tree["Weight"].agg(list=lambda x: [pd.Series(list(x))], sum="sum")


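def get_string(x):
    # Sum the weights per label, keeping labels in order of first appearance,
    # then express each sum as a percentage of the tree's total weight.
    pct = (
        weights.at[x.name, "list"][0].groupby(x.values, sort=False).sum()
        / weights.at[x.name, "sum"]
        * 100
    )
    # Pair each label with its own share; zipping against x.values would
    # mislabel shares whenever a label repeats within a tree.
    return " ".join("{}-{:.2f}%".format(k, v) for k, v in pct.items())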


out = pd.concat(
    [g_tree["Fruit"].apply(get_string), g_tree["Color"].apply(get_string)],
    axis=1,
).add_suffix("s")

print(out)

Prints:

                         Fruits                                 Colors
Tree                                                                  
Tree1             Apple-100.00%  Red-33.33% Green-33.33% Yellow-33.33%
Tree2  Apple-47.37% Pear-52.63%                          Green-100.00%
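Since the question asks for the same summary for every feature, the ideas from both answers could be wrapped in one reusable helper. This is only a sketch under stated assumptions: weighted_summary and its parameters by and weight are hypothetical names, not part of pandas or of either answer, and the data is the sample table from the first answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Apple", "Pear"],
    "Color": ["Red", "Green", "Green", "Yellow", "Green"],
    "Weight": [0.1, 0.1, 0.9, 0.1, 1.0],
    "Tree": ["Tree1", "Tree1", "Tree2", "Tree1", "Tree2"],
})


def weighted_summary(frame, feature, by="Tree", weight="Weight"):
    """Return one 'label-share%' string per group for a single feature column."""
    # Weight per (group, feature value).
    part = frame.groupby([by, feature])[weight].sum()
    # Divide by the group's total; transform keeps part's index unchanged.
    pct = part / part.groupby(level=by).transform("sum") * 100
    labels = pct.index.get_level_values(feature)
    text = pd.Series(
        [f"{label}-{value:.1f}%" for label, value in zip(labels, pct)],
        index=pct.index.get_level_values(by),
    )
    # Join the pieces belonging to each group into one string.
    return text.groupby(level=0).agg(", ".join)


summary = pd.concat(
    {f"{col}s": weighted_summary(df, col) for col in ["Fruit", "Color"]},
    axis="columns",
)
print(summary)
```

Adding another feature then only means adding its column name to the list, with no per-feature merge.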
