Pandas: counting the occurrences of string values in a column and getting the unique ones

0 ['Overgrow', 'Chlorophyll']
1 ['Overgrow', 'Chlorophyll']
2 ['Overgrow', 'Chlorophyll']
3 ['Blaze', 'Solar Power']
4 ['Blaze', 'Solar Power']
5 ['Blaze', 'Solar Power']
6 ['Torrent', 'Rain Dish']
7 ['Torrent', 'Rain Dish']
8 ['Torrent', 'Rain Dish']
9 ['Shield Dust', 'Run Away']
10 ['Shed Skin']
11 ['Compoundeyes', 'Tinted Lens']
12 ['Shield Dust', 'Run Away']
13 ['Shed Skin']
14 ['Swarm', 'Sniper']
15 ['Keen Eye', 'Tangled Feet', 'Big Pecks']
16 ['Keen Eye', 'Tangled Feet', 'Big Pecks']
17 ['Keen Eye', 'Tangled Feet', 'Big Pecks']

2 Answers

User

Answer 1 · edited 2024-05-12 21:21:59

Use value_counts on the flattened lists:

In [1845]: counts = pd.Series(np.concatenate(df_pokemon.abilities)).value_counts()

In [1846]: counts
Out[1846]:
Rain Dish       3
Keen Eye        3
Chlorophyll     3
Blaze           3
Solar Power     3
Overgrow        3
Big Pecks       3
Tangled Feet    3
Torrent         3
Shield Dust     2
Shed Skin       2
Run Away        2
Compoundeyes    1
Swarm           1
Tinted Lens     1
Sniper          1
dtype: int64
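
For readers who want to reproduce this end to end, here is a minimal sketch, assuming pandas 0.25 or later (where `Series.explode` is available). The small `df_pokemon` frame is a hand-built subset of the question's data, and `explode` is swapped in for `np.concatenate`, so this is an alternative route rather than the answer's original method.

```python
import pandas as pd

# Hypothetical subset of the question's data: each cell holds a list of strings
df_pokemon = pd.DataFrame({
    "abilities": [
        ["Overgrow", "Chlorophyll"],
        ["Overgrow", "Chlorophyll"],
        ["Blaze", "Solar Power"],
        ["Shed Skin"],
    ]
})

# explode() emits one row per list element; value_counts() then tallies them
counts = df_pokemon["abilities"].explode().value_counts()
print(counts)
```

Like the `np.concatenate` version, this only works when the cells are real lists, not strings that look like lists.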

For the unique values, you can take the index of `counts`, for example `counts.index.values`, or use `np.unique`:

In [1849]: np.unique(np.concatenate(df_pokemon.abilities))
Out[1849]:
array(['Big Pecks', 'Blaze', 'Chlorophyll', 'Compoundeyes', 'Keen Eye',
       'Overgrow', 'Rain Dish', 'Run Away', 'Shed Skin', 'Shield Dust',
       'Sniper', 'Solar Power', 'Swarm', 'Tangled Feet', 'Tinted Lens',
       'Torrent'],
      dtype='|S12')
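
A small self-contained sketch comparing the two ways of getting the distinct values may help. The `abilities` series here is a hypothetical stand-in for `df_pokemon.abilities`, and the `sorted` call is only there so that the two results can be compared directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_pokemon.abilities
abilities = pd.Series([["Blaze", "Solar Power"], ["Blaze", "Solar Power"], ["Shed Skin"]])

# Flatten the per-row lists into a single 1-D array
flat = np.concatenate(abilities.tolist())

# Route 1: the index of value_counts() holds each distinct value once
from_counts = sorted(pd.Series(flat).value_counts().index.tolist())

# Route 2: np.unique returns the distinct values, already sorted
from_unique = np.unique(flat).tolist()

print(from_counts == from_unique)
```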

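Note: as Jon's comments point out, if type(df_pokemon.abilities[0]) is not list (for example, the cells are strings that only look like lists), convert them to lists first: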

import ast
df_pokemon.abilities = df_pokemon.abilities.map(ast.literal_eval)
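
To see what this conversion does, here is a minimal sketch under the assumption that the column holds the string form of each list (as typically happens after a round trip through CSV). The two-row frame is made up for illustration.

```python
import ast
import pandas as pd

# Hypothetical frame whose cells are strings that merely look like lists
df_pokemon = pd.DataFrame({"abilities": ["['Overgrow', 'Chlorophyll']", "['Shed Skin']"]})
print(type(df_pokemon.abilities[0]))  # a str at this point

# literal_eval parses Python literals only; it does not execute arbitrary code
df_pokemon["abilities"] = df_pokemon["abilities"].map(ast.literal_eval)
print(type(df_pokemon.abilities[0]))  # now a list
```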

Details

In [1842]: df_pokemon
Out[1842]:
                              abilities
0               [Overgrow, Chlorophyll]
1               [Overgrow, Chlorophyll]
2               [Overgrow, Chlorophyll]
3                  [Blaze, Solar Power]
4                  [Blaze, Solar Power]
5                  [Blaze, Solar Power]
6                  [Torrent, Rain Dish]
7                  [Torrent, Rain Dish]
8                  [Torrent, Rain Dish]
9               [Shield Dust, Run Away]
10                          [Shed Skin]
11          [Compoundeyes, Tinted Lens]
12              [Shield Dust, Run Away]
13                          [Shed Skin]
14                      [Swarm, Sniper]
15  [Keen Eye, Tangled Feet, Big Pecks]
16  [Keen Eye, Tangled Feet, Big Pecks]
17  [Keen Eye, Tangled Feet, Big Pecks]

In [1843]: df_pokemon.dtypes
Out[1843]:
abilities    object
dtype: object

In [1844]: type(df_pokemon.abilities[0])
Out[1844]: list

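User

Answer 2 · edited 2024-05-12 21:21:59

Since the values are strings, you can use a regex and split to convert them to lists, then count them as @JonClements mentioned in the comments (here with collections.Counter rather than itertools), i.e.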

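from collections import Counter
import pandas as pd
# Strip brackets and quotes, split on ', ' (so no leading spaces remain), then sum the per-row Counters
count = pd.Series(df['abilities'].str.replace(r"[\[\]']", '', regex=True).str.split(', ').map(Counter).sum())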


To list only the values that occur exactly once, use count[count==1].index.tolist()

['Sniper', 'Tinted Lens', 'Compoundeyes', 'Swarm']

To list every distinct value instead, just take the index:

count.index.tolist()
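
The string-based approach can be sketched end to end as follows. This is an illustration under stated assumptions, not the answer's exact code: the three-row `df` is made up, and an explicit `Counter.update` loop stands in for `.map(Counter).sum()` to make the accumulation step visible.

```python
from collections import Counter
import pandas as pd

# Hypothetical frame: each cell is the string form of a list
df = pd.DataFrame({"abilities": [
    "['Swarm', 'Sniper']",
    "['Shed Skin']",
    "['Shed Skin']",
]})

# Remove brackets and quotes, then split on ", " so no leading spaces survive
tokens = df["abilities"].str.replace(r"[\[\]']", "", regex=True).str.split(", ")

# Accumulate the counts from every row into a single Counter
total = Counter()
for row in tokens:
    total.update(row)

count = pd.Series(total)

# Values seen exactly once, sorted for a stable order
once = sorted(count[count == 1].index.tolist())
print(once)
```

Splitting on a bare `','` would leave a leading space on every element after the first, so `' Sniper'` and `'Sniper'` would be counted as different values.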

