Finding the most common string in a 2D numpy array

2 answers

User

Answer #1 · edited 2024-09-28 19:20:22

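You can use numpy and collections to get a count of every value. Your question doesn't make clear whether the numeric values in the 2D list are actual numbers or strings, but as long as the number comes first and the word second in each row, this works in both cases: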

import numpy
from collections import Counter

input1 = [['0.001251993149471442', 'herfst'], ['0.002232327408019874', 'herfst'], ['0.002232327408019874', 'herfst'], ['0.002232327408019874', 'winter'], ['0.002232327408019874', 'winter']]
input2 = [[0.001251993149471442, 'herfst'], [0.002232327408019874, 'herfst'], [0.002232327408019874, 'herfst'], [0.002232327408019874, 'winter'], [0.002232327408019874, 'winter']]

def count(rows):                                           # 'rows' avoids shadowing the built-in input()
  oneDim = numpy.array(rows).flatten().tolist()            # flatten the 2D list into a plain Python list
  del oneDim[0::2]                                         # remove the 'numbers' (i.e. elements at even indices)
  counts = Counter(oneDim)                                 # get a count of all unique elements
  maxString = counts.most_common(1)[0]                     # find the most common one
  print(maxString)

count(input1)
count(input2)

If you also want to include the numbers in the count, just skip the del oneDim[0::2] line.
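As a side note, the flatten-and-delete step can be avoided by slicing the word column directly. The following is only a sketch of that alternative, not the method from the answer above: it swaps Counter for numpy.unique with return_counts=True, and the helper name most_common_in_column is made up for illustration.

```python
import numpy as np

rows = [['0.001251993149471442', 'herfst'],
        ['0.002232327408019874', 'herfst'],
        ['0.002232327408019874', 'herfst'],
        ['0.002232327408019874', 'winter'],
        ['0.002232327408019874', 'winter']]

def most_common_in_column(rows, column=1):
    """Hypothetical helper: return (value, count) for the most frequent entry in one column."""
    arr = np.array(rows)                      # mixed rows become an array of strings
    values, counts = np.unique(arr[:, column], return_counts=True)
    best = counts.argmax()                    # index of the highest count
    return str(values[best]), int(counts[best])

result = most_common_in_column(rows)
print(result)
```

Passing column=0 would count the numeric column instead, assuming every row has the same two-element layout.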

User

Answer #2 · edited 2024-09-28 19:20:22

Unfortunately, the mode() method exists only in pandas, not in NumPy, so the first step is to flatten the array (arr) and convert it to a pandas Series:

s = pd.Series(arr.flatten())

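Then, if you want to find the most common string (note that all elements of a NumPy array share the same type, so the numbers are stored as strings too), the most intuitive solution is to run: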

s.mode()[0]

(s.mode() on its own returns a Series, hence the [0].)

The result is:

'0.002232327408019874'
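To make the steps above reproducible, here is a minimal sketch that builds arr from the sample data in the first answer (an assumption, since this answer never shows how arr is defined) and then applies the same flatten, Series and mode calls.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the first answer; arr itself is assumed.
arr = np.array([['0.001251993149471442', 'herfst'],
                ['0.002232327408019874', 'herfst'],
                ['0.002232327408019874', 'herfst'],
                ['0.002232327408019874', 'winter'],
                ['0.002232327408019874', 'winter']])

s = pd.Series(arr.flatten())   # 1D Series holding all ten strings
modes = s.mode()               # a Series, because several values can tie
most_common = modes[0]         # first (or only) most frequent value
print(most_common)
```

The numeric string wins here because it appears four times, against three for 'herfst'.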

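But if you want to leave out the strings that can be converted to numbers, you need a different approach.

Unfortunately, you can't use s.str.isnumeric(), because it only finds strings made up entirely of digits, whereas the "numeric" strings here also contain a dot.

So you have to use str.match to narrow down the Series (s) and then call mode: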
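A quick check of that limitation, using a small made-up Series (the value '42' is added purely for contrast and is not in the original data):

```python
import pandas as pd

s_demo = pd.Series(['0.002232327408019874', '42', 'herfst'])

# isnumeric() is True only when every character is numeric,
# so the decimal point makes the first string fail the test.
flags = s_demo.str.isnumeric().tolist()
print(flags)
```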

s[~s.str.match(r'^[+-]?(?:\d+|\d+\.\d*|\d*\.\d+)$')].mode()[0]

This time the result is:

'herfst'
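For completeness, here is a runnable sketch of the filtering step, followed by an alternative that is not part of this answer: letting pd.to_numeric with errors='coerce' decide what counts as a number. The sample arr is again assumed from the first answer, and the regex is written in the spirit of the one above, accepting integers of any length.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the first answer; arr itself is assumed.
arr = np.array([['0.001251993149471442', 'herfst'],
                ['0.002232327408019874', 'herfst'],
                ['0.002232327408019874', 'herfst'],
                ['0.002232327408019874', 'winter'],
                ['0.002232327408019874', 'winter']])
s = pd.Series(arr.flatten())

# Regex route: drop strings that look like plain signed decimals.
looks_numeric = s.str.match(r'^[+-]?(?:\d+|\d+\.\d*|\d*\.\d+)$')
by_regex = s[~looks_numeric].mode()[0]

# Swapped-in alternative: values pd.to_numeric cannot parse become NaN,
# so the NaN positions mark the non-numeric strings.
not_parsable = pd.to_numeric(s, errors='coerce').isna()
by_parsing = s[not_parsable].mode()[0]

print(by_regex, by_parsing)
```

The parsing route accepts more spellings than the regex (scientific notation such as '1e-3', for instance), so which one fits depends on what should count as a number in your data.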
