How do I compute the ratio of values in two columns per group?

Posted 2024-09-30 20:36:59


I have a list of flights as a DataFrame, like this:

airline        status          number    ...

Aer Lingus     some error A        14    ...
Aeroflot       success             47    ...
Air Canada     success              2    ...
Air Dolomiti   success              2    ...
Air Europa     some error B         4    ...
Air France     success             40    ...
Alitalia       some error A        10    ...
....

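For every flight, the API call I made either succeeded or returned an error.

What I want is the success rate for each airline. I know I can use groupby to count how often each value occurs in the 'airline' and 'status' columns, but I can't work out how to turn those counts into a ratio.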

def calculate_quote_success_ratios(flights):

    success_ratio_per_airline = flights.groupby(['airline', 'status']).count()
    # TODO: Include ratio with failures!

    return success_ratio_per_airline

The expected output should look like this:

airline        success_ratio

Aer Lingus     0.72
Aeroflot       0.845
Air Canada     0.935
Air Dolomiti   0.5
Air Europa     ...
....

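Edit: Evidently the df['number'] column is irrelevant. So I just want, for each airline, the share of 'success' rows relative to all rows, whatever kind of error the others contain.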


3 answers

You can use pd.crosstab with normalize='index':

# data from @Chris

res = pd.crosstab(df['airline'], df['status'], values=df['number'],
                  aggfunc='count', normalize='index')

print(res[['success']])

status       success
airline             
AerLingus        0.5
Aeroflot         1.0
AirCanada        1.0
AirDolomiti      1.0
AirEuropa        0.0
AirFrance        0.5
Alitalia         0.5
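For reference, here is a minimal self-contained sketch of the crosstab approach. It assumes the ten sample rows listed in the second answer, and it drops the values/aggfunc arguments because the question's edit says the number column does not matter; without them crosstab simply counts rows. The name success_ratio is chosen for illustration only.

```python
import pandas as pd

# Sample rows copied from the second answer (assumed to be representative).
df = pd.DataFrame({
    "airline": ["Aer Lingus", "Aeroflot", "Air Canada", "Air Dolomiti",
                "Air Europa", "Air France", "Alitalia", "Alitalia",
                "Air France", "Aer Lingus"],
    "status": ["some error A", "success", "success", "success",
               "some error B", "success", "some error A", "success",
               "some error B", "success"],
    "number": [14, 47, 2, 2, 4, 40, 10, 10, 10, 12],
})

# Count rows per (airline, status); normalize="index" divides each
# airline's counts by that airline's row total.
ratios = pd.crosstab(df["airline"], df["status"], normalize="index")

# Keep only the share of successful flights per airline.
success_ratio = ratios["success"].rename("success_ratio")
print(success_ratio)
```

Airlines that never succeeded still appear in the result, with a ratio of 0, because crosstab fills missing airline/status combinations with a zero count.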

I had to modify your example to get it to work, but you can simply do df.groupby(['airline', 'status']).sum() / df.groupby(['airline']).sum()

Original DataFrame:

   airline       status        number
0  Aer Lingus    some error A      14
1  Aeroflot      success           47
2  Air Canada    success            2
3  Air Dolomiti  success            2
4  Air Europa    some error B       4
5  Air France    success           40
6  Alitalia      some error A      10
7  Alitalia      success           10
8  Air France    some error B      10
9  Aer Lingus    success           12



df.groupby(['airline', 'status']).sum() / df.groupby(['airline']).sum()

                              number
airline       status
Aer Lingus    some error A  0.538462
              success       0.461538
Aeroflot      success       1.000000
Air Canada    success       1.000000
Air Dolomiti  success       1.000000
Air Europa    some error B  1.000000
Air France    some error B  0.200000
              success       0.800000
Alitalia      some error A  0.500000
              success       0.500000
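Note that this ratio is weighted by the number column rather than by row count, which may not be what the question's edit asks for. The sketch below reproduces the numbers above under two assumptions that are not in the original answer: it selects the number column explicitly, so that the string status column is not summed as well, and it uses Series.div with level='airline' to make the alignment between the two groupings explicit.

```python
import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame({
    "airline": ["Aer Lingus", "Aeroflot", "Air Canada", "Air Dolomiti",
                "Air Europa", "Air France", "Alitalia", "Alitalia",
                "Air France", "Aer Lingus"],
    "status": ["some error A", "success", "success", "success",
               "some error B", "success", "some error A", "success",
               "some error B", "success"],
    "number": [14, 47, 2, 2, 4, 40, 10, 10, 10, 12],
})

# Sum of `number` per (airline, status) pair and per airline.
totals_per_status = df.groupby(["airline", "status"])["number"].sum()
totals_per_airline = df.groupby("airline")["number"].sum()

# Divide each (airline, status) total by its airline total,
# matching rows on the 'airline' level of the MultiIndex.
share = totals_per_status.div(totals_per_airline, level="airline")
print(share)
```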

Count the flights per group:

count_flight=df.groupby('airline').status.count()

Count the successes per group:

count_success=df[df['status']=='success'].groupby('airline').status.count()

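pandas div then returns the per-group ratio of the two Series above. Airlines with no successful flight come out as NaN, which fillna(0) turns into 0: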

count_success.div(count_flight).fillna(0)
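Put together, the three steps could fill in the question's calculate_quote_success_ratios function roughly as follows. This is a sketch that assumes the sample rows from the second answer; the success_ratio name and the shortcut variable are illustrative additions, not part of the original answer.

```python
import pandas as pd


def calculate_quote_success_ratios(flights):
    """Return the share of 'success' rows per airline as a Series."""
    # Total number of flights per airline.
    count_flight = flights.groupby("airline")["status"].count()
    # Number of successful flights per airline; airlines with no
    # success are missing from this Series.
    is_success = flights["status"] == "success"
    count_success = flights[is_success].groupby("airline")["status"].count()
    # div aligns on the airline index; missing airlines become NaN,
    # which fillna(0) turns into a ratio of 0.
    ratio = count_success.div(count_flight).fillna(0)
    return ratio.rename("success_ratio")


df = pd.DataFrame({
    "airline": ["Aer Lingus", "Aeroflot", "Air Canada", "Air Dolomiti",
                "Air Europa", "Air France", "Alitalia", "Alitalia",
                "Air France", "Aer Lingus"],
    "status": ["some error A", "success", "success", "success",
               "some error B", "success", "some error A", "success",
               "some error B", "success"],
})

success_ratio = calculate_quote_success_ratios(df)
print(success_ratio)

# An equivalent shortcut: the mean of a boolean column is the share of True.
shortcut = df["status"].eq("success").groupby(df["airline"]).mean()
```

The shortcut avoids the fillna step because every airline keeps at least one row in the boolean Series, so an airline with no success averages to 0 directly.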
