How do I add group columns in a for loop based on quartile cutoffs?

Posted 2024-07-03 06:42:09

My DataFrame is built as follows:

import numpy as np
import pandas as pd

data = {
    'Name': ['tom', 'nick', 'krish', 'jack', 'ram', 'antony', 'nicols',
             'lisa', 'sasha', 'jynx', 'dani'],
    'Cricket': [8, 9, 11, 6, 12, 15, 14, 12, 11, 13, 7],
    'Football': [1, 3, 1, 3, 5, 6, 2, 0, 5, 4, 6],
    'Hockey': [1, 0, 1, 0, 5, 6, 12, 12, 14, 13, 10],
    'Soccer': [5, 6, 2, 9, 5, 5, 6, 7, 6, 11, 12],
    'Kabadi': [9, 4, 5, 3, 3, 4, 5, 6, 6, 6, 7]
}
df = pd.DataFrame(data)
df

It looks like this:

    Name    Cricket Football    Hockey  Soccer  Kabadi
0   tom     8       1           1       5       9
1   nick    9       3           0       6       4
2   krish   11      1           1       2       5
3   jack    6       3           0       9       3
4   ram     12      5           5       5       3
5   antony  15      6           6       5       4
6   nicols  14      2           12      6       5
7   lisa    12      0           12      7       6
8   sasha   11      5           14      6       6
9   jynx    13      4           13      11      6
10  dani    7       6           10      12      7

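I want to use a for loop to add a new group column for every column in df, based on that column's quantile cutoffs. For example, for Cricket: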

df['Cricket'].quantile([.1, .25, .5, .75])

0.10     7.0
0.25     8.5
0.50    11.0
0.75    12.5
Name: Cricket, dtype: float64
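
For readers wondering where cutoffs such as 8.5 and 12.5 come from: by default, Series.quantile interpolates linearly between the two nearest sorted values. The sketch below reproduces that arithmetic in plain Python. The helper name linear_quantile is made up for this illustration and is not part of pandas.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q      # fractional index into the sorted data
    lower = int(pos)                  # index of the value at or below pos
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower                # how far pos sits between lower and upper
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

cricket = [8, 9, 11, 6, 12, 15, 14, 12, 11, 13, 7]
cutoffs = {q: linear_quantile(cricket, q) for q in (0.1, 0.25, 0.5, 0.75)}
print(cutoffs)
```

With 11 values, the 0.25 quantile sits at index 2.5 of the sorted data, halfway between 8 and 9, which gives 8.5.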

# Low, Moderate and High groups

conditions = [
    (df['Cricket'] >= 12.5),
    (df['Cricket'] < 12.5) & (df['Cricket'] >= 8.5),
    (df['Cricket'] < 8.5)
    ]
values = ['High','Moderate', 'Low']
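# string default: the integer default 0 can clash with str choices on recent NumPy
df['CricketGroup'] = np.select(conditions, values, default='Unknown')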
df.head()

The result looks like this:

    Name    Cricket Football    Hockey  Soccer  Kabadi  CricketGroup
0   tom     8       1           1       5       9       Low
1   nick    9       3           0       6       4       Moderate
2   krish   11      1           1       2       5       Moderate
3   jack    6       3           0       9       3       Low
4   ram     12      5           5       5       3       Moderate

How can I do this in a for loop, so that FootballGroup, HockeyGroup, KabadiGroup and the other group columns are each added from their own column's quantile cutoffs?


2 Answers

If possible, use pd.qcut:

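# Edges at the 0/25/75/100th percentiles give three bins; qcut bins are
# right-closed, so a value equal to a cutoff falls into the lower group.
f1 = lambda x: pd.qcut(x, [0, .25, .75, 1],
                       labels=['Low', 'Moderate', 'High'])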
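
pd.qcut produces right-closed bins, while the question's conditions use >= at each cutoff. One way to reproduce the question's boundaries exactly is pd.cut with explicit quantile edges and right=False. This is a minimal sketch, not the answerer's method; the function name quantile_group and its low_q/high_q parameters are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

def quantile_group(col, low_q=0.25, high_q=0.75):
    """Label each value Low / Moderate / High using left-closed bins."""
    low_cut, high_cut = col.quantile([low_q, high_q])
    bins = [-np.inf, low_cut, high_cut, np.inf]
    # right=False makes each bin [lower, upper), matching the >= comparisons
    return pd.cut(col, bins=bins, labels=['Low', 'Moderate', 'High'], right=False)

cricket = pd.Series([8, 9, 11, 6, 12, 15, 14, 12, 11, 13, 7], name='Cricket')
cricket_group = quantile_group(cricket)
print(cricket_group.tolist())
```

Because the bin edges must increase strictly, this raises a ValueError for a column whose two cutoffs coincide, so it suits columns with reasonably spread values.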

Alternatively, change the function so that it compares the values of each Series against that Series' own quantiles:

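def f1(x):
    # per-column cutoffs; 0.25 and 0.75 match the Low/High boundaries in the question
    s = x.quantile([.1, .25, .5, .75])
    conditions = [
        (x >= s[0.75]),
        (x < s[0.75]) & (x >= s[0.25]),
        (x < s[0.25])
    ]
    values = ['High', 'Moderate', 'Low']
    # a string default avoids mixing str choices with the integer default 0,
    # which can raise a TypeError on recent NumPy versions
    return np.select(conditions, values, default='Unknown')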

# apply only to the columns in this list
cols = ['Cricket','Football','Hockey','Soccer','Kabadi']

df1 = df[cols].apply(f1).add_suffix('Group')

df = df.join(df1)
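
for i in ['Cricket', 'Football', 'Hockey', 'Soccer', 'Kabadi']:
    # cutoffs for the current column, e.g. i = 'Cricket'
    q = df[i].quantile([.1, .25, .5, .75])

    conditions = [
        (df[i] >= q[0.75]),
        (df[i] < q[0.75]) & (df[i] >= q[0.25]),
        (df[i] < q[0.25])
    ]
    values = ['High', 'Moderate', 'Low']
    grname = i + 'Group'
    # string default: the integer default 0 can clash with str choices on recent NumPy
    df[grname] = np.select(conditions, values, default='Unknown')
    print(df[[i, grname]])  # check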
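
The same loop can be wrapped in a function that leaves the original DataFrame untouched, which makes it easier to reuse and to check. The following is a sketch under assumptions: the name add_quantile_groups and the low_q/high_q parameters are invented here, and only two of the question's columns are included to keep it short.

```python
import numpy as np
import pandas as pd

def add_quantile_groups(frame, columns, low_q=0.25, high_q=0.75):
    """Return a copy of frame with one '<column>Group' label column per input column."""
    out = frame.copy()
    for col in columns:
        low_cut = out[col].quantile(low_q)
        high_cut = out[col].quantile(high_q)
        conditions = [out[col] >= high_cut, out[col] >= low_cut]
        # np.select takes the first matching condition, so the second one
        # only applies to values below high_cut; everything else is 'Low'
        out[col + 'Group'] = np.select(conditions, ['High', 'Moderate'], default='Low')
    return out

scores = pd.DataFrame({
    'Cricket': [8, 9, 11, 6, 12, 15, 14, 12, 11, 13, 7],
    'Football': [1, 3, 1, 3, 5, 6, 2, 0, 5, 4, 6],
})
grouped = add_quantile_groups(scores, ['Cricket', 'Football'])
print(grouped)
```

Note that with default='Low', missing values would also be labelled Low; pass a different default if the data can contain NaN.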
