Object standardization with NLTK

Posted 2024-07-03 06:06:27

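I'm new to both NLP and Python. I'm trying to use object standardization to replace abbreviations with their full meanings. I found some code online, modified it, and tested it on text from Wikipedia. But all the code does is print out the original text unchanged. Can someone help a newbie in need?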

Here is the code:

import nltk

lookup_dict = {'EC': 'European Commission', 'EU': 'European Union', "ECSC": "European Coal and Steel Community",
               "EEC": "European Economic Community"}


def _lookup_words(input_text):
    words = input_text.split()
    new_words = []
    for word in words:
        if word.lower() in lookup_dict:
            word = lookup_dict[word.lower()]
        new_words.append(word)
        new_text = " ".join(new_words)


    print(new_text)
    return new_text


_lookup_words(
    "The High Authority was the supranational administrative executive of the new European Coal and Steel Community ECSC. It took office first on 10 August 1952 in Luxembourg. In 1958, the Treaties of Rome had established two new communities alongside the ECSC: the eec and the European Atomic Energy Community (Euratom). However their executives were called Commissions rather than High Authorities")

Thanks for your help!


1 answer
Anonymous user
#1 · Posted 2024-07-03 06:06:27

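In your example, the lookup dict contains the abbreviations EC and ECSC for your input sentence. Calling split() splits the input on whitespace. But your sentence contains ECSC. and ECSC: as words; that is, those are the tokens you get after splitting, not ECSC, so they can't be mapped through the dict. I suggest removing the punctuation (the periods and colons) and running it again.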
