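I want to count the occurrences of certain strings across multiple columns and return the total counts in new columns.
I know I can use value_counts to count the total occurrences of each value in a single column: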
data['col'].value_counts(dropna=False)
Result:
[["win" TKO technical knockout] 336
[["win" UD unanimous decision] 307
[["win" KO knockout] 225
[["loss" UD unanimous decision] 97
[["loss" TKO technical knockout] 64
[["win" nan null] 53
[["draw" MD majority decision] 43
[["loss" KO knockout] 41
[["loss" MD majority decision] 35
[["loss" nan null] 32
[["loss" SD split decision] 29
[["unknown" nan null] 29
[["win" SD split decision] 27
[["draw" PTS null] 18
[["win" RTD corner retirement] 17
[["draw" SD split decision] 12
[["loss" RTD corner retirement] 11
[["win" MD majority decision] 9
[["loss" DQ disqualification] 6
[["win" PTS null] 6
[["unknown" NC null] 3
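The problem is that I want to count how often [["win" KO knockout] (and the other outcomes) occurs across all of the relevant columns (col1 to col20), row by row.
Here is a sample of my data: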
{'col1': {0: ['["win" UD unanimous decision'],
1: ['["win" UD unanimous decision'],
2: ['["win" TKO technical knockout'],
3: ['["win" UD unanimous decision'],
4: ['["win" UD unanimous decision']},
'col2': {0: ['["win" TKO technical knockout'],
1: ['["win" TKO technical knockout'],
2: ['["win" TKO technical knockout'],
3: ['["win" UD unanimous decision'],
4: ['["win" UD unanimous decision']},
'col3': {0: ['["win" TKO technical knockout'],
1: ['["win" KO knockout'],
2: ['["win" TKO technical knockout'],
3: ['["win" TKO technical knockout'],
4: ['["win" UD unanimous decision']},
'col4': {0: ['["win" UD unanimous decision'],
1: ['["win" UD unanimous decision'],
2: ['["win" KO knockout'],
3: ['["win" TKO technical knockout'],
4: ['["win" UD unanimous decision']}}
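Note that in this sample every cell is a one-element list holding a string, not a bare string. A minimal sketch, assuming the sample above (rebuilt compactly; the names UD, TKO and KO are only hypothetical shorthands for the three strings), showing how .str[0] unwraps each list into a plain string:

```python
import pandas as pd

# Sample data from the question, written compactly.
# UD, TKO and KO are hypothetical shorthands for the strings in the sample.
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'
data = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})

# .str[0] takes the first element of each list, so every cell becomes a plain string
flat = data.apply(lambda s: s.str[0])
print(flat)
```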
In this case, the desired output is:
   win UD  win TKO  win KO
0       2        2       0
1       2        1       1
2       0        3       1
3       2        2       0
4       4        0       0
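One way to get that table without reshaping is to reduce every cell to a short label and then count, per row, how many cells equal each target label. This is a sketch under assumptions: the label format "win UD" (quoted result plus method code) and the regex are illustrative choices, the column filter assumes every relevant column name starts with "col", and UD/TKO/KO are hypothetical shorthands for the sample strings.

```python
import pandas as pd

# Hypothetical compact rebuild of the sample data (each cell is a one-element list)
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'
data = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})

# Relevant columns (assumption: their names start with 'col'; col1..col20 in the real data)
cols = [c for c in data.columns if c.startswith('col')]

# Unwrap each list, then keep only the quoted result and the method code: 'win UD'
labels = data[cols].apply(
    lambda s: s.str[0].str.replace(r'^\["(\w+)"\s+(\w+).*$', r'\1 \2', regex=True)
)

# For each target label, count the matching cells in every row
targets = ['win UD', 'win TKO', 'win KO']
counts = pd.DataFrame({t: labels.eq(t).sum(axis=1) for t in targets})
print(counts)
```

On the sample this reproduces the desired table above. Comparing with eq and summing along axis=1 keeps the original row index, so the counts can be joined back onto the original frame as new columns.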
Update:
I also tried using size and groupby:
#list of column names
col_outcome = ['col'+str(i) for i in range(1,11)]
data.groupby(col_outcome).size()
However, this returns the following error message:
TypeError: unhashable type: 'list'
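The error comes from the cells being lists: groupby needs hashable keys, and Python lists are not hashable. Even after unwrapping them to strings, grouping by all the outcome columns at once counts identical row combinations, not how often each value appears. A small sketch on the sample data (UD, TKO and KO are hypothetical shorthands; the sample only has col1 to col4):

```python
import pandas as pd

# Hypothetical compact rebuild of the sample data (each cell is a one-element list)
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'
data = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})
col_outcome = ['col1', 'col2', 'col3', 'col4']

# Grouping on list-valued columns fails because lists cannot be hashed
try:
    data.groupby(col_outcome).size()
    list_keys_failed = False
except TypeError as exc:
    list_keys_failed = True
    print('TypeError:', exc)

# With hashable strings the call works, but each group is one distinct
# combination of values across the columns, not a per-value count
flat = data[col_outcome].apply(lambda s: s.str[0])
combos = flat.groupby(col_outcome).size()
print(combos)
```

In the sample every row is a different combination, so each group has size 1, which is why this approach cannot produce the desired per-outcome counts.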
IIUC, let's use stack to reshape the "wide" dataframe into "long" form, do some string cleanup on the data with a regex extract and replace, then groupby and apply value_counts, and finally reshape the result with unstack.
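The answer's own code and output are not included here, so the following is only a reconstruction of the described pipeline on the sample data. Assumptions: the regex and the "win UD" label format are illustrative, a single regex extract stands in for the extract-and-replace cleanup, SeriesGroupBy.value_counts is called directly instead of groupby(...).apply(value_counts) (both give per-row counts), and UD/TKO/KO are hypothetical shorthands.

```python
import pandas as pd

# Hypothetical compact rebuild of the sample data (each cell is a one-element list)
UD = '["win" UD unanimous decision'
TKO = '["win" TKO technical knockout'
KO = '["win" KO knockout'
data = pd.DataFrame({
    'col1': [[UD], [UD], [TKO], [UD], [UD]],
    'col2': [[TKO], [TKO], [TKO], [UD], [UD]],
    'col3': [[TKO], [KO], [TKO], [TKO], [UD]],
    'col4': [[UD], [UD], [KO], [TKO], [UD]],
})

# Wide -> long: a Series indexed by (original row, column name)
long = data.stack()

# String cleanup: unwrap the lists, then pull out the result ("win") and method code ("UD")
parts = long.str[0].str.extract(r'^\["(\w+)"\s+(\w+)')
labels = parts[0] + ' ' + parts[1]

# Count labels within each original row, then pivot the labels back into columns
counts = (
    labels.groupby(level=0)
    .value_counts()
    .unstack(fill_value=0)
    .reindex(columns=['win UD', 'win TKO', 'win KO'], fill_value=0)
)
print(counts)
```

unstack(fill_value=0) turns labels that never occur in a row (for example "win UD" in row 2) into 0 instead of NaN, and the final reindex only fixes the column order.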