Populating a dictionary with Python

User

Reply #1 · edited 2024-09-27 23:21:14

Suppose you have the text of Hamlet and you want to count its unique words.

You can do this (Python 2, since it uses urllib2):

# the tools we need, read a url and regex library 
import urllib2
import re

# a dict -- similar to Perl hash
words={}

# read the text at that url
response = urllib2.urlopen('http://pastebin.com/raw.php?i=7p3uycAz')
hamlet = response.read()

# split on whitespace, remove trailing punctuation, and count each unique word
for word in hamlet.split():
    word=re.sub(r'\W+$', r'', word)
    if word.strip(): 
        words[word]=words.setdefault(word, 0) +1

If you want to print the words from most common to least common:

[code block missing from the source]

Prints:

the 988
and 702
of 628
to 610
I 541
you 495
a 452
my 441
in 399
HAMLET 385
it 360
is 313
...
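
A listing like the one above can be produced by sorting the dictionary items on their counts. What follows is a minimal sketch rather than the original answer's code; the small `words` dictionary and the helper name `by_frequency` are made up for illustration, and it is written for Python 3.

```python
# Hypothetical sample counts standing in for the `words` dict built above.
words = {'the': 3, 'and': 2, 'of': 2, 'HAMLET': 1}

def by_frequency(counts):
    """Return (word, count) pairs, most frequent first."""
    # Sort on the count (second element of each pair), largest first.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

for word, count in by_frequency(words):
    print(word, count)
```

Sorting the `(word, count)` pairs keeps the dictionary itself unchanged, so the same counts can still be looked up by word afterwards.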

If you need a nested dict of dicts (as in your Perl example), you can do the following:

# think of these strings like files; the letters like words
str1='abcdefaaa'
str2='abefdd'
str3='defeee'

letters={}

for fn, st in (('string 1', str1), ('string 2', str2) , ('string 3', str3)):
    letters[fn]={}
    for c in st:
        letters[fn][c]=letters[fn].setdefault(c, 0)
        letters[fn][c]+=1

print(letters)
# Output (key order may vary):
# {'string 3': {'e': 4, 'd': 1, 'f': 1},
#  'string 1': {'a': 4, 'c': 1, 'b': 1, 'e': 1, 'd': 1, 'f': 1},
#  'string 2': {'a': 1, 'b': 1, 'e': 1, 'd': 2, 'f': 1}}

User

Reply #2 · edited 2024-09-27 23:21:14

I would suggest collections.Counter, if you are using Python 2.7 or later:

import collections

counter = collections.Counter()

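# fh is an already opened file object and filename is its name
# (both are assumed to be defined elsewhere)
for line in fh:
    arr = line.split()
    for word in arr:
        key = filename + word  # combines the file name and the word into one key
        counter.update((key,))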

You can view the counts like this:

[code block missing from the source]
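
As a rough illustration of inspecting such a Counter, here is a minimal sketch, not the original answer's code. The `lines` data and the file names are hypothetical stand-ins for real open files, and the `':'` separator in the key is an addition made here so that file name and word cannot run together ambiguously.

```python
import collections

# Hypothetical data: (filename, line) pairs stand in for lines read from files.
lines = [('a.txt', 'to be or not to be'), ('b.txt', 'to sleep')]

counter = collections.Counter()
for filename, line in lines:
    for word in line.split():
        key = filename + ':' + word  # separator is an assumption of this sketch
        counter[key] += 1

# Look up a single count; a missing key yields 0 instead of raising KeyError.
print(counter['a.txt:to'])
print(counter['b.txt:be'])

# most_common(n) returns the n highest counts as (key, count) pairs.
for key, count in counter.most_common(2):
    print(key, count)
```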

User

Reply #3 · edited 2024-09-27 23:21:14

You can probably get away with using a Counter keyed by (filename, word) tuples, for example:

from collections import Counter
from itertools import chain

word_counts = Counter()
for filename in ['your', 'file names', 'here']:
    with open(filename) as fin:
        words = chain.from_iterable(line.split() for line in fin)
        word_counts.update((filename, word) for word in words)

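However, you could also create an initial dictionary keyed by filename, with a Counter as each value, and then update those Counters. That gives you a "hash" in which the filename is the first key and the word counts sit beneath it, for example: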

[code block missing from the source]
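
One way the dictionary-of-Counters idea might look is sketched below. This is an illustration under stated assumptions, not the original answer's code: the `files` mapping with its file names and lines is hypothetical and replaces real files on disk so the example runs as written.

```python
from collections import Counter

# Hypothetical in-memory "files" so the sketch runs without touching disk.
files = {
    'first.txt': ['to be or not to be'],
    'second.txt': ['to sleep', 'to dream'],
}

# One Counter per file name, created up front.
word_counts = {filename: Counter() for filename in files}

for filename, lines in files.items():
    for line in lines:
        # Counter.update() adds one to the count of each word in the list.
        word_counts[filename].update(line.split())

# First key is the file name, second key is the word.
print(word_counts['first.txt']['to'])
print(word_counts['second.txt'].most_common(1))
```

Compared with a single Counter keyed by (filename, word) tuples, this layout makes it easy to pull out everything for one file, at the cost of an extra lookup level.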
