Python: "undoing" text line wrapping

2 answers

User

#1 · edited 2024-10-02 08:23:11

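import re

# Raw strings stop the backslashes in Windows paths from being read as escapes.
# 'rU' becomes 'r': the 'U' flag was removed in Python 3.11, and universal
# newlines are already the default in text mode.
with open(r'C:\Users\Paul\BROWN_A1.txt', 'r') as truefile:
    true_corpus = truefile.read()

true_tokens = true_corpus.split(' ')

with open(r'C:\Users\Paul\Desktop\Comp_Ling_Research_1\BROWN_A1_hypenated.txt', 'r') as myfile:
    my_corpus = myfile.read()

my_tokens = my_corpus.split(' ')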

User

#2 · edited 2024-10-02 08:23:11

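The first step is to keep a set of valid words: if a hyphenated word, with its hyphen removed, appears in that set, drop the hyphen. Ubuntu has a list of valid words at /usr/share/dict/american-english. An overly simple version might look like: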

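# valid_words_file and new_file are placeholders for your own paths
with open(valid_words_file) as f:
    valid_words = set(line.strip() for line in f)

with open(new_file) as f:
    # Rejoin words split across lines, keeping the hyphen for the check below
    text = f.read().replace('-\n', '-').replace('\n', ' ')

output = []
for word in text.split():
    # Drop the hyphen only when the joined form is a known word
    if '-' in word and word.replace('-', '') in valid_words:
        output.append(word.replace('-', ''))
    else:
        output.append(word)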

You will still need to handle punctuation, capitalization, and so on, but that is the general idea.
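As a rough illustration of the punctuation and capitalization handling mentioned above, here is a minimal sketch. The function name `dehyphenate` and its parameters are hypothetical and not part of the original answer. It assumes that surrounding punctuation can simply be stripped before the dictionary lookup, and that a lowercase match is good enough for capitalized words.

```python
import string


def dehyphenate(text, valid_words):
    """Undo hard wrapping, rejoining words that were split by a hyphen."""
    # Keep the hyphen when joining lines so each candidate can be checked.
    tokens = text.replace('-\n', '-').replace('\n', ' ').split()
    output = []
    for token in tokens:
        # Ignore surrounding punctuation for the lookup, e.g. '"Exam-ple,"'.
        core = token.strip(string.punctuation)
        joined = core.replace('-', '')
        # Try the word as written, then lowercased (assumption: the word
        # list stores common words in lowercase).
        known = joined in valid_words or joined.lower() in valid_words
        if '-' in core and known:
            # Replace only the core so the surrounding punctuation survives.
            token = token.replace(core, joined, 1)
        output.append(token)
    return ' '.join(output)


sample = 'The hyphen-\nated text, "Exam-\nple," is well-\nknown.\n'
words = {'the', 'hyphenated', 'text', 'example', 'is'}
result = dehyphenate(sample, words)
print(result)
```

Compounds such as "well-known" keep their hyphen here because the joined form is not in the word set. A real word list will still produce occasional wrong joins or misses, so the output is worth spot-checking against the original text.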
