Probability of characters occurring consecutively

Posted 2024-06-26 17:59:48


I have a text file like the following.

A,B,C,D,E
A,B,C
A,B,C,E
C,D,E
C,D,E,B,A

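I need to find the probability of one character occurring immediately after another. For example, the probability that B occurs right after A: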

A appears 4 times in the file and is immediately followed by B in 3 of those cases, so the probability is

3/4 = 0.75

Likewise, I need to compute the probability for every ordered pair of characters.

A->B
B->A
A->C
C->A
A->D ...etc.

I don't know how to start implementing this. Using a pandas DataFrame would be fine too. Any help?


1 answer
User
#1 · Posted 2024-06-26 17:59:48

Brute force:

from collections import defaultdict

data = [['A','B','C','D','E'],
        ['A','B','C'],
        ['A','B','C','E'],
        ['C','D','E'],
        ['C','D','E','B','A']]
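# Flatten all rows so each character's total number of occurrences can be counted
characters = [i for j in data for i in j]
counts = {}
combinations = defaultdict(int)
for character in set(characters):
    counts[character] = characters.count(character)
    for character2 in set(characters):
        for entry in data:
            combination = [character, character2]
            # Substring test: only valid because every token is a single character
            if "".join(combination) in "".join(entry):
                combinations[tuple(combination)] += 1
# P(second follows first) = rows containing the pair / total occurrences of first
probability = {i: combinations[i]/float(counts[i[0]]) for i in combinations}
probability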

{('A', 'B'): 0.75,
  ('B', 'A'): 0.25,
  ('B', 'C'): 0.75,
  ('C', 'D'): 0.6,
  ('C', 'E'): 0.2,
  ('D', 'E'): 1.0,
  ('E', 'B'): 0.25}
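
The brute-force version tests every possible pair against every row and relies on each token being a single character. A single-pass alternative is sketched below, assuming the input is comma-separated lines as in the question. The function name `transition_probabilities` and the variable `rows` are made up for illustration. Unlike the code above, it counts every occurrence of a pair rather than at most one per row, which gives the same result for this data.

```python
from collections import Counter

# Sample rows copied from the question
rows = [
    "A,B,C,D,E",
    "A,B,C",
    "A,B,C,E",
    "C,D,E",
    "C,D,E,B,A",
]

def transition_probabilities(lines):
    """Return {(first, second): P(second immediately follows first)}."""
    totals = Counter()  # how often each token occurs overall
    pairs = Counter()   # how often each adjacent (first, second) pair occurs
    for line in lines:
        # Split on commas and drop surrounding whitespace and empty tokens
        tokens = [t.strip() for t in line.split(",") if t.strip()]
        totals.update(tokens)
        # zip(tokens, tokens[1:]) yields each token with its immediate successor
        pairs.update(zip(tokens, tokens[1:]))
    # Divide each pair count by the total occurrences of its first token
    return {pair: n / totals[pair[0]] for pair, n in pairs.items()}

probs = transition_probabilities(rows)
for pair, p in sorted(probs.items()):
    print(pair, p)
```

Because the function only iterates over lines, an open file object can be passed in place of `rows`, for example `with open("data.txt") as f: probs = transition_probabilities(f)`, where `data.txt` is a hypothetical file name. Pairs that never occur are simply absent from the result, so `probs.get(("A", "D"), 0.0)` can be used when a zero is wanted.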
