Calculating the percentage of occurrences of a value

Posted 2024-06-28 19:54:03

I'm working with this dataframe:

Car make | Driver's Gender
Ford     | m
GMC      | m
GMC      | f
Ferrari  | f

I want to calculate the percentage of male drivers for each make:

Car make  | Male drivers
Ford      | 100
GMC       | 50
Ferrari   | 0
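To make the answers below reproducible, here is a minimal sketch that builds the question's frame, assuming it contains exactly the four rows shown. The column names are copied verbatim from the question, including the capital G in "Driver's Gender".

```python
import pandas as pd

# The question's data; column names copied exactly as written in the question
df = pd.DataFrame({
    "Car make": ["Ford", "GMC", "GMC", "Ferrari"],
    "Driver's Gender": ["m", "m", "f", "f"],
})
print(df)
```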

2 answers

Compare the second column with 'm', then aggregate with mean:

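df1 = (df["Driver's Gender"].eq('m')           # True where the driver is male
       .groupby(df['Car make'], sort=False)     # group by make, keeping first-seen order
       .mean()                                  # share of True values (True=1, False=0)
       .mul(100)                                # fraction -> percent
       .reset_index(name='Male drivers'))
print(df1)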
  Car make  Male drivers
0     Ford         100.0
1      GMC          50.0
2  Ferrari           0.0
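Why does mean give a percentage here? For a boolean group, the mean is the number of True values divided by the group size. The sketch below (assuming the question's four rows) makes that explicit by computing sum and count per make and dividing them by hand; the result matches the mean-based answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Car make": ["Ford", "GMC", "GMC", "Ferrari"],
    "Driver's Gender": ["m", "m", "f", "f"],
})

is_male = df["Driver's Gender"].eq("m")
# For a boolean group, mean() == sum() / count(): male drivers over all drivers of that make
parts = is_male.groupby(df["Car make"], sort=False).agg(["sum", "count"])
parts["Male drivers"] = 100 * parts["sum"] / parts["count"]
print(parts)
```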

Another idea, using crosstab with the normalize parameter:

# normalize=0 (equivalent to 'index') divides each row by its row total
df2 = pd.crosstab(df['Car make'], df["Driver's Gender"], normalize=0).mul(100)
print(df2)
Driver's Gender      f      m
Car make                     
Ferrari          100.0    0.0
Ford               0.0  100.0
GMC               50.0   50.0
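The crosstab result holds both genders; to get the exact two-column layout from the question, one option is to keep only the "m" column. A sketch, assuming the question's data. Note that crosstab sorts the makes alphabetically, and the "m" column only exists if at least one male driver appears somewhere in the data.

```python
import pandas as pd

df = pd.DataFrame({
    "Car make": ["Ford", "GMC", "GMC", "Ferrari"],
    "Driver's Gender": ["m", "m", "f", "f"],
})

# normalize="index" turns each row of counts into proportions that sum to 1
pct = pd.crosstab(df["Car make"], df["Driver's Gender"], normalize="index").mul(100)
# Keep only the "m" column and turn it back into a two-column frame
male = pct["m"].rename("Male drivers").reset_index()
print(male)
```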

Here are a few approaches:

Quick and dirty: score "m" as 100 and anything else as 0, then take the mean per make

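# Score each row 100 for a male driver and 0 otherwise (adds a column to df)
df["Male drivers"] = df["Driver's Gender"].apply(lambda x: 100 if x == "m" else 0)
# The per-make mean of those scores is the percentage of male drivers
male_freq = df.groupby("Car make")[["Male drivers"]].mean()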
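The quick-and-dirty method writes a new column into df, which can interfere with later groupby calls on the same frame. If you would rather leave the original frame untouched, one variant (not part of the original answer) builds the scores on a copy with DataFrame.assign and uses a vectorized comparison instead of apply. A sketch, assuming the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "Car make": ["Ford", "GMC", "GMC", "Ferrari"],
    "Driver's Gender": ["m", "m", "f", "f"],
})

# assign() returns a new frame, so df keeps only its original two columns.
# eq("m").mul(100) gives 100 for male drivers and 0 otherwise, without a Python-level lambda.
male_freq = (
    df.assign(**{"Male drivers": df["Driver's Gender"].eq("m").mul(100)})
      .groupby("Car make")[["Male drivers"]]
      .mean()
)
print(male_freq)
```

The dict unpacking in assign is only needed because "Male drivers" contains a space and so cannot be written as a keyword argument.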

Using groupby with a manual frequency calculation

male_freq = df.groupby("Car make")[["Driver's Gender"]].agg(lambda x: 100 * (x == "m").sum() / len(x))

Using groupby and value_counts

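def get_male_frequency(series):
    # Share of each value within the group; "m" may be missing (e.g. Ferrari), hence the default 0
    val_counts = series.value_counts(normalize=True)
    return 100 * val_counts.get("m", 0)

male_freq = df.groupby("Car make")[["Driver's Gender"]].agg(get_male_frequency)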
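The .get("m", 0) call in get_male_frequency matters for makes with no male drivers: value_counts only lists values that actually occur, so "m" is simply absent from the index. A small sketch on hypothetical single-make Series, standing in for what groupby passes to the function:

```python
import pandas as pd

# Hypothetical per-make groups, as groupby would hand them to get_male_frequency
ferrari = pd.Series(["f"])
gmc = pd.Series(["m", "f"])

ferrari_counts = ferrari.value_counts(normalize=True)
print(ferrari_counts)
# "m" never occurs for Ferrari, so it is not in the index; .get falls back to the default 0
print(ferrari_counts.get("m", 0))
print(gmc.value_counts(normalize=True).get("m", 0))
```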

Or a more general version:

def get_frequency(value_of_interest):
    # Returns an aggregation function bound to the value to count
    def _get_frequency(series):
        val_counts = series.value_counts(normalize=True)
        return 100 * val_counts.get(value_of_interest, 0)
    return _get_frequency

male_freq = df.groupby("Car make")[["Driver's Gender"]].agg(get_frequency("m"))

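They all give the same percentages (the first method names its column Male drivers; the others keep Driver's Gender):

          Driver's Gender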
Car make                 
Ferrari               0.0
Ford                100.0
GMC                  50.0
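Because get_frequency takes the value to count as a parameter, the same helper works for any category. As an illustration not in the original answer, this sketch (assuming the question's data) computes the percentage of female drivers per make:

```python
import pandas as pd

def get_frequency(value_of_interest):
    # Returns an aggregation function bound to the value to count
    def _get_frequency(series):
        val_counts = series.value_counts(normalize=True)
        return 100 * val_counts.get(value_of_interest, 0)
    return _get_frequency

df = pd.DataFrame({
    "Car make": ["Ford", "GMC", "GMC", "Ferrari"],
    "Driver's Gender": ["m", "m", "f", "f"],
})

female_freq = df.groupby("Car make")[["Driver's Gender"]].agg(get_frequency("f"))
print(female_freq)
```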
