How do I add words from a text file to dictionaries based on the speaker's name?

User

Reply #1 · edited 2024-10-04 01:23:05

Here is a simple implementation:

from collections import defaultdict

import nltk

def is_dialogue(line):
    # Treat any non-empty line without square brackets as dialogue;
    # bracketed lines are presumably stage directions.
    # Add more rules here if the script needs them.
    return len(line) > 0 and '[' not in line and ']' not in line

def get_dialogues(filename, people_list):
    dialogues = defaultdict(list)
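    # Use a set: in Python 3, map() returns a one-shot iterator, so repeated
    # "in" checks against it would stop matching after the first pass.
    people_list = {name + ':' for name in people_list}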
    current_person = None
    with open(filename) as fin:
        for line in fin:
            current_line = line.strip()  # strip() already removes the trailing newline
            if current_line in people_list:
                current_person = current_line
            if (current_person is not None) and (current_line != current_person) and is_dialogue(current_line):
                dialogues[current_person].append(current_line)
    return dialogues

def get_word_counts(dialogues):
    word_counts = defaultdict(dict)
    for (person, dialogue_list) in dialogues.items():
        word_count = defaultdict(int)
        for dialogue in dialogue_list:
            for word in nltk.tokenize.word_tokenize(dialogue):
                word_count[word] += 1
        word_counts[person] = word_count
    return word_counts

if __name__ == '__main__':
    dialogues = get_dialogues('script.txt', ['Sampson', 'Gregory', 'Abraham'])
    word_counts = get_word_counts(dialogues)
    print(word_counts)

User

Reply #2 · edited 2024-10-04 01:23:05

import collections
import string
c = collections.defaultdict(collections.Counter)
speaker = None

with open('/tmp/spam.txt') as f:
  for line in f:
    if not line.strip():
      # we're on an empty line, the last guy has finished blabbing
      speaker = None
      continue
    if line.count(' ') == 0 and line.strip().endswith(':'):
      # a new guy is talking now, you might want to refine this event
      speaker = line.strip()[:-1]
      continue
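    if speaker is None:
      # text outside any speaker's block; don't count it under a None key
      continue
    words = (x.strip(string.punctuation).lower() for x in line.split())
    # drop tokens that were pure punctuation and are now empty strings
    c[speaker].update(w for w in words if w)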

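User

Reply #3 · edited 2024-10-04 01:23:05

You don't want to strip the punctuation right away. A colon immediately before a newline tells you where one person's lines start and the previous person's lines end. That matters, because it is how you know which dictionary the spoken words should be added to. You will probably need some if-else logic that appends to a different dictionary depending on who is currently speaking.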
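
To make that ordering concrete, here is a rough sketch rather than a definitive solution. It assumes a speaker label is a single word followed by a colon on a line of its own, and it uses one dictionary keyed by speaker name in place of an if-else chain. The function name `count_words_by_speaker` and the `sample` lines are made up for illustration.

```python
import string
from collections import Counter, defaultdict


def count_words_by_speaker(lines):
    """Return {speaker: Counter of words} for an iterable of script lines."""
    counts = defaultdict(Counter)
    speaker = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Look for the trailing colon BEFORE stripping any punctuation:
        # a one-word line such as "Sampson:" marks a change of speaker.
        if line.endswith(':') and ' ' not in line:
            speaker = line[:-1]
            continue
        if speaker is None:
            continue  # text before the first speaker label is ignored
        # Only now is it safe to strip punctuation from the spoken words.
        words = (w.strip(string.punctuation).lower() for w in line.split())
        counts[speaker].update(w for w in words if w)
    return counts


# Hypothetical input, standing in for the lines of the script file.
sample = [
    "Sampson:",
    "I will not carry coals, Gregory.",
    "",
    "Gregory:",
    "No, then we should be colliers.",
]
result = count_words_by_speaker(sample)
print(sorted(result))                 # ['Gregory', 'Sampson']
print(result["Gregory"]["colliers"])  # 1
```

If the labels in your file look different (for example "SAMPSON." or a name followed by the speech on the same line), the speaker test is the only part that should need changing.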
