Counting occurrences of letter combinations

Posted 2024-07-02 13:26:48


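I am trying to count the number of occurrences of each letter combination in a text file:

'aa', 'ab', … 'zy', 'zz'

I was able to do this quite easily for single letters with collections.Counter; I am just wondering whether there is a similar way to handle 2-letter combinations.

Thanks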


3 answers

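If you only need letters, you can filter out the non-letters. You don't need to store any extra data in memory; all you have to do is chain the characters together and keep track of the previous character each time: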

from collections import Counter
from itertools import chain

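with open("in.txt") as f:
    # Flatten the lines into single characters and keep only the letters.
    letters = filter(str.isalpha, chain.from_iterable(f))
    # Take the first letter from the filtered stream, so that a leading
    # non-letter character never ends up in a pair.
    prev = next(letters, "")
    c = Counter()
    for ch in letters:
        c[prev + ch] += 1
        prev = ch
print(c)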

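If you want all characters, just remove the filter, that is, iterate over chain.from_iterable(f) directly instead of wrapping it in filter(str.isalpha, ...).
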
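The unfiltered variant can be sketched as follows. This is a minimal sketch, assuming the same approach as the filtered code with only the filter call dropped; the function name count_all_pairs and the use of io.StringIO in place of a real file are assumptions made for illustration.

```python
from collections import Counter
from itertools import chain
import io


def count_all_pairs(lines):
    """Count every pair of adjacent characters, including spaces and newlines.

    `lines` is any iterable of strings, e.g. an open text file.
    (Hypothetical helper name, used only for this illustration.)
    """
    counts = Counter()
    prev = None
    # chain.from_iterable flattens the lines into one stream of characters.
    for ch in chain.from_iterable(lines):
        if prev is not None:
            counts[prev + ch] += 1
        prev = ch
    return counts


# io.StringIO stands in for an open text file here.
sample = io.StringIO("abab\nab")
print(count_all_pairs(sample))
```

Because nothing is filtered, pairs that contain a newline or a space are counted as well.
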
import collections, itertools

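def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    # Make two independent iterators over the input and advance the
    # second one by a single item, so zip() yields adjacent pairs.
    # (On Python 3.10+, itertools.pairwise does the same thing.)
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)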

text = "I'm trying to count the number of occurrences of each letter combination in a text file"

# Each key is a tuple of two adjacent characters, e.g. ('t', 'r').
counter = collections.Counter(pairwise(text))

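The "trick" is to use a generator (such as the one I copied from the Python documentation) to access the letter combinations. It extends naturally to three or more letters.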
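One way the extension to three or more letters might look is sketched below. The helper name nwise and its parameter n are assumptions made for illustration; they do not come from the answer or from the Python documentation.

```python
from collections import Counter
from itertools import islice, tee


def nwise(iterable, n):
    """s -> (s0, ..., s[n-1]), (s1, ..., s[n]), ...

    Hypothetical generalisation of pairwise() to windows of n items.
    """
    iterators = tee(iterable, n)
    # Skip the first i items of the i-th copy so the copies are staggered.
    staggered = [islice(it, i, None) for i, it in enumerate(iterators)]
    return zip(*staggered)


text = "abcabc"
# Join each tuple into a string so the keys read like 'abc'.
trigrams = Counter("".join(gram) for gram in nwise(text, 3))
print(trigrams)
```

With n = 2 this behaves like pairwise, except that the keys here are joined into strings rather than left as tuples.
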

If you want to ignore whitespace, tokenize the input first.
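A minimal sketch of that idea, assuming that "tokenizing" simply means splitting on whitespace with str.split(); the function name count_pairs_in_words is hypothetical.

```python
from collections import Counter


def count_pairs_in_words(text):
    """Count adjacent character pairs inside each whitespace-separated token.

    Pairs never span two tokens, so whitespace is ignored entirely.
    (Hypothetical helper name, used only for this illustration.)
    """
    counts = Counter()
    for word in text.split():
        # zip(word, word[1:]) pairs each character with its successor.
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts


print(count_pairs_in_words("to be or not to be"))
```

Punctuation attached to a word stays part of the token in this sketch; a stricter tokenizer would be needed to drop it.
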

from collections import Counter

txt = "Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?"

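# Pair each character with its successor by zipping the text
# against a copy of itself shifted by one position.
txt1 = txt[:-1]
txt2 = txt[1:]
print(Counter(t1 + t2 for t1, t2 in zip(txt1, txt2)))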
