Split a string into consecutive counts?

2 answers

User

#1 · edited 2024-06-01 09:58:53

Use Counter to get the count of each distinct letter in the string, regardless of its position:

>>> s="aaabbbbccdaeeee"
>>> from collections import Counter
>>> Counter(s)
Counter({'a': 4, 'b': 4, 'e': 4, 'c': 2, 'd': 1})

If the position within the string matters, you can use groupby:

from itertools import groupby

li = []
# groupby yields one (key, iterator) pair per run of equal adjacent characters
for k, group in groupby(s):
    li.append((k, len(list(group))))

print(li)

Prints:

[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]

This can be reduced to a list comprehension:

[(k, len(list(group))) for k, group in groupby(s)]
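
As a self-contained sketch of the groupby approach above, the comprehension can be wrapped in a function. The name run_lengths is hypothetical (it does not appear in the original answer), and sum(1 for _ in group) is one possible way to count a run without building a temporary list:

```python
from itertools import groupby

def run_lengths(s):
    """Return (char, run_length) pairs for each run of identical adjacent characters."""
    # groupby groups only *adjacent* equal items, so a later 'a' starts a new run.
    # sum(1 for _ in group) counts the items in the run without creating a list.
    return [(char, sum(1 for _ in group)) for char, group in groupby(s)]

print(run_lengths("aaabbbbccdaeeee"))
# [('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
```

An empty string simply yields an empty list, since groupby produces no groups.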

You can even use a regular expression:

>>> import re
>>> [(m.group(0)[0], len(m.group(0))) for m in re.finditer(r'((\w)\2*)', s)]
[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
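
Note that \w matches only word characters, so the pattern above skips runs of spaces or punctuation. A possible variant, assuming you want every character counted, swaps \w for . with re.DOTALL. The function name run_lengths_re is hypothetical:

```python
import re

def run_lengths_re(s):
    """Return (char, run_length) pairs using a backreference-based pattern."""
    # (.) captures a single character and \1* matches any immediate repeats of it.
    # re.DOTALL lets '.' match newline characters as well.
    pattern = re.compile(r'(.)\1*', flags=re.DOTALL)
    return [(m.group(1), len(m.group(0))) for m in pattern.finditer(s)]

print(run_lengths_re("aa  b!!"))
# [('a', 2), (' ', 2), ('b', 1), ('!', 2)]
```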

User

#2 · edited 2024-06-01 09:58:53

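There are many different ways to solve this problem. @dawg has already posted the best solution, but if for some reason you can't use Counter() (perhaps in a job interview or for a school assignment), there are several other ways to do it.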

from collections import Counter, defaultdict

def counter_counts(s):
    """ Preferred method using Counter()


    Arguments:
        s {string}   [string to have each character counted]

    Returns:
        [dict]   [dictionary of counts of each char]
    """

    return Counter(s)

def default_counts(s):
    """ Alternative solution using defaultdict


    Arguments:
        s {string}   [string to have each character counted]

    Returns:
        [dict]   [dictionary of counts of each char]
    """

    counts = defaultdict(int)  # each missing key is initialized to 0
    for char in s:
        counts[char] += 1  # increment the count of each character by 1

    return counts

def vanilla_counts_1(s):
    """ Alternative solution using a vanilla dictionary


    Arguments:
        s {string}   [string to have each character counted]

    Returns:
        [dict]   [dictionary of counts of each char]
    """

    counts = {}
    for char in s:
        # we have to manually check that each value is in the dictionary before attempting to increment it
        if char in counts:
            counts[char] += 1
        else:
            counts[char] = 1

    return counts

def vanilla_counts_2(s):
    """ Alternative solution using a vanilla dictionary
    This version uses the .get() method to increment instead of checking if a key already exists


    Arguments:
        s {string}   [string to have each character counted]

    Returns:
        [dict]   [dictionary of counts of each char]
    """

    counts = {}
    for char in s:
        # the second argument to .get() is the default value returned if the key is not found
        counts[char] = counts.get(char, 0) + 1

    return counts
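
The functions above count each character across the whole string. If consecutive counts are needed and groupby is also off limits, the same plain-loop style can be adapted. This is a minimal sketch, and the name consecutive_counts is hypothetical:

```python
def consecutive_counts(s):
    """Return (char, run_length) pairs using only a plain loop and a list."""
    runs = []  # each entry is a mutable [char, count] pair
    for char in s:
        if runs and runs[-1][0] == char:
            # same character as the current run: extend it
            runs[-1][1] += 1
        else:
            # a different character (or the very first one): start a new run
            runs.append([char, 1])
    return [tuple(run) for run in runs]

print(consecutive_counts("aaabbbbccdaeeee"))
# [('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
```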

For fun, let's see how each method performs.

For s = "aaabbbbccdaeeee" and 10,000 runs:

Counter: 0.0330204963684082s
defaultdict: 0.01565241813659668s
vanilla 1: 0.01562952995300293s
vanilla 2: 0.015581130981445312s

(The results are actually quite surprising.)

Now let's test what happens if we set the string to the entire plain-text version of the Book of Genesis and run it 1,000 times:

Counter: 8.500739336013794s
defaultdict: 14.721554040908813s
vanilla 1: 18.089043855667114s
vanilla 2: 27.01840090751648s

It looks like the overhead of creating a Counter() object becomes much less significant on large inputs!

(These are not very scientific tests, but they are fun.)
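
The timing harness is not shown in the answer. One way such a comparison might be reproduced is with the standard timeit module. The benchmark function and the labels below are assumptions for illustration, and the numbers will differ by machine and Python version:

```python
import timeit
from collections import Counter, defaultdict

def default_counts(s):
    counts = defaultdict(int)
    for char in s:
        counts[char] += 1
    return counts

def vanilla_counts(s):
    counts = {}
    for char in s:
        counts[char] = counts.get(char, 0) + 1
    return counts

def benchmark(s, runs):
    """Time each counting function over `runs` calls; returns seconds per method."""
    candidates = {
        "Counter": Counter,
        "defaultdict": default_counts,
        "vanilla .get()": vanilla_counts,
    }
    # f=func binds the current function so each lambda times the right one
    return {
        name: timeit.timeit(lambda f=func: f(s), number=runs)
        for name, func in candidates.items()
    }

for name, seconds in benchmark("aaabbbbccdaeeee", 10000).items():
    print(f"{name}: {seconds}s")
```

To try the large-input case, pass the contents of any long text file as s and lower runs accordingly.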
