用Python编程划分分类列

0 Brighton 1 Yokohama 2 Levin 3 Melbourne 4 Coffeyville 5 Whakatane 6 Melbourne 7 Melbourne 8 Levin 9 Ashburn 10 Te Awamutu 11 Bishkek 12 Melbourne 13 Whanganui 14 Coffeyville 15 New York 16 Brisbane 17 Greymouth 18 Brisbane 19 Chuo City 20 Accra 21 Levin 22 Waiouru 23 Brisbane 24 New York 25 Chuo City 26 Lucerne 27 Whanganui 28 Los Angeles 29 Melbourne df['city'].head(30).value_counts(ascending=False, normalize=True)*100 Melbourne 16.666667 Levin 10.000000 Brisbane 10.000000 Whanganui 6.666667 Coffeyville 6.666667 New York 6.666667 Chuo City 6.666667 Waiouru 3.333333 Greymouth 3.333333 Te Awamutu 3.333333 Bishkek 3.333333 Lucerne 3.333333 Ashburn 3.333333 Yokohama 3.333333 Whakatane 3.333333 Accra 3.333333 Brighton 3.333333 Los Angeles 3.333333

city_map = dict(df['city'].value_counts(ascending=False, normalize=True)*100) df['city_count']= df['city'].map(city_map) def count(df): if df["city_count"] > 10: return "High" elif df["city_count"] < 0: return "Medium" else: return "Low" df.apply(count, axis=1)

1条回答

网友

1楼 · 发布于 2024-09-30 01:31:53

我们可以在city上分组，得到每个城市的size。我们用len将这些值除以数据帧的长度，然后计算cumsum。最后一步是检查从哪一点超过阈值，这样我们就可以用map将布尔序列广播回您的数据帧。你知道吗

threshold = 0.7
m = df['city'].map(df.groupby('city')['city'].size().sort_values(ascending=False).div(len(df)).cumsum().le(threshold))

df['city'] = np.where(m, df['city'], 'Other')

          city
0         Other
1         Other
2         Levin
3     Melbourne
4   Coffeyville
5         Other
6     Melbourne
7     Melbourne
8         Levin
9       Ashburn
10        Other
11      Bishkek
12    Melbourne
13        Other
14  Coffeyville
15     New York
16     Brisbane
17        Other
18     Brisbane
19    Chuo City
20        Other
21        Levin
22        Other
23     Brisbane
24     New York
25    Chuo City
26        Other
27        Other
28        Other
29    Melbourne

旧方法

如果我理解正确，您需要用.cumsum计算一个累积和，并在它超过您设置的阈值时进行检查。你知道吗

然后我们使用np.where有条件地填充City name或Other。你知道吗

threshold = 80

m  = df['Normalized'].cumsum().le(threshold)

df['City'] = np.where(m, df['City'], 'Other')

            City  Normalized
0       Auckland   40.399513
1   Christchurch   13.130783
2     Wellington   12.267604
3       Hamilton    4.026242
4       Tauranga    3.867353
5      (not set)    3.540075
6        Dunedin    2.044508
7          Other    1.717975
8          Other    1.632849
9          Other    1.520342
10         Other    1.255651
11         Other    1.173878
12         Other    1.040508
13         Other    0.988166
14         Other    0.880502
15         Other    0.766877
16         Other    0.601468
17         Other    0.539067
18         Other    0.471824
19         Other    0.440903
20         Other    0.440344
21         Other    0.405884
22         Other    0.365836
23         Other    0.321131
24         Other    0.306602
25         Other    0.280524
26         Other    0.237123
27         Other    0.207878
28         Other    0.186084
29         Other    0.167085
30         Other    0.163732
31         Other    0.154977

注意：此方法假定Normalized列按降序排序。你知道吗

相关问题更多 >

编程相关推荐

热门问题

热门文章