How to count bigrams in Python using a loop

User

Answer 1 · edited 2024-06-25 05:47:08

Do you want to count how often each pair of adjacent words occurs? Make each pair a tuple and use it as a dictionary key:

text = [{'ideology': 3.4, 'ID': '50555', 'reviews': 'Politician from CA-21, very liberal and aggressive'}]
Count = {}
for l in text:
    words = l['reviews'].split()
    for i in range(len(words) - 1):
        # each pair of adjacent words becomes a tuple key
        bigram = (words[i], words[i + 1])
        if bigram not in Count:
            Count[bigram] = 0
        Count[bigram] += 1

print(Count)

Result:

{('Politician', 'from'): 1, ('from', 'CA-21,'): 1, ('CA-21,', 'very'): 1, ('very', 'liberal'): 1, ('liberal', 'and'): 1, ('and', 'aggressive'): 1}
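The check-then-increment pattern in the loop above can be shortened with `collections.defaultdict(int)`, which supplies 0 for any missing key. This is a minimal sketch, not part of the original answer: the function name `count_bigrams` is hypothetical, and `zip(words, words[1:])` stands in for the index-based `range` loop while producing the same tuples.

```python
from collections import defaultdict

def count_bigrams(records):
    """Count adjacent word pairs (as tuples) in the 'reviews' field of each record."""
    counts = defaultdict(int)  # missing keys start at 0
    for record in records:
        words = record['reviews'].split()
        # zip pairs each word with its successor: (w0, w1), (w1, w2), ...
        for pair in zip(words, words[1:]):
            counts[pair] += 1
    return dict(counts)  # plain dict, so it prints like the loop's result

text = [{'ideology': 3.4, 'ID': '50555',
         'reviews': 'Politician from CA-21, very liberal and aggressive'}]
print(count_bigrams(text))
```

A review with a single word yields no pairs, so it simply contributes nothing to the counts.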

User

Answer 2 · edited 2024-06-25 05:47:08

If I understand your question correctly, the following code should solve it:

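# sample input, as given in the question
text = [{'ideology': 3.4, 'ID': '50555', 'reviews': 'Politician from CA-21, very liberal and aggressive'}]

Count = dict()
for l in text:
    words = l['reviews'].split()
    for i in range(len(words) - 1):
        # join two adjacent words into one space-separated string key
        bigram = " ".join(words[i:i + 2])
        if bigram not in Count:
            Count[bigram] = 1
        else:
            Count[bigram] += 1

print(Count)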

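The counts are:

{'Politician from': 1, 'from CA-21,': 1, 'CA-21, very': 1, 'very liberal': 1, 'liberal and': 1, 'and aggressive': 1}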

Edit: if you want the keys to be tuples, just change the join line. Python dicts accept tuples as keys too, because tuples are hashable.
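A minimal sketch of that edit, using the same sample `text`: the join line becomes `tuple(words[i:i + 2])`. The `dict.get(bigram, 0)` call is a substitution for the if/else block, not part of the original answer; it returns 0 for a bigram that has not been seen yet.

```python
text = [{'ideology': 3.4, 'ID': '50555',
         'reviews': 'Politician from CA-21, very liberal and aggressive'}]

Count = dict()
for l in text:
    words = l['reviews'].split()
    for i in range(len(words) - 1):
        # tuple(...) instead of " ".join(...): keys look like ('very', 'liberal')
        bigram = tuple(words[i:i + 2])
        # get() returns 0 for an unseen bigram, so no if/else is needed
        Count[bigram] = Count.get(bigram, 0) + 1

print(Count)
```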

User

Answer 3 · edited 2024-06-25 05:47:08

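The standard library has a class for counting objects, collections.Counter. With the help of itertools as well, a bigram counter script can look like this: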

from collections import Counter
from itertools import tee

# pairwise() recipe from the "Itertools Recipes" section of the itertools documentation
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # Python 3: zip replaces Python 2's itertools.izip

text = [{'ideology': 3.4, 'ID': '50555',
         'reviews': 'Politician from CA-21, very liberal and aggressive'},
        {'ideology': 1.5, 'ID': '10223',
         'reviews': 'Retired politician'}]

c = Counter()
for l in text:
    # update() adds 1 for every (word, next_word) tuple the iterator yields
    c.update(pairwise(l['reviews'].split()))

print(list(c.items()))
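On Python 3.10 and later, the recipe ships as `itertools.pairwise`, and `Counter.most_common(n)` lists the n most frequent bigrams. A minimal sketch, assuming the same two-record sample `text`; the `try`/`except` falls back to the documentation recipe on older Pythons.

```python
from collections import Counter

try:
    from itertools import pairwise  # built in since Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # fallback: the recipe from the itertools documentation
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

text = [{'ideology': 3.4, 'ID': '50555',
         'reviews': 'Politician from CA-21, very liberal and aggressive'},
        {'ideology': 1.5, 'ID': '10223',
         'reviews': 'Retired politician'}]

# build the Counter directly from a generator of (word, next_word) tuples
c = Counter(pair for record in text for pair in pairwise(record['reviews'].split()))

# most_common(n) returns (bigram, count) pairs, highest count first
print(c.most_common(3))
```

Building the Counter from one generator expression avoids the explicit `update()` loop; both approaches count the same tuples.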
