Failed loading english.pickle with nltk.data.load

Posted 2024-09-25 12:22:00


When trying to load the punkt tokenizer...

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

...a LookupError is raised:

> LookupError: 
>     *********************************************************************   
> Resource 'tokenizers/punkt/english.pickle' not found.  Please use the NLTK Downloader to obtain the resource: nltk.download().   Searched in:
>         - 'C:\\Users\\Martinos/nltk_data'
>         - 'C:\\nltk_data'
>         - 'D:\\nltk_data'
>         - 'E:\\nltk_data'
>         - 'E:\\Python26\\nltk_data'
>         - 'E:\\Python26\\lib\\nltk_data'
>         - 'C:\\Users\\Martinos\\AppData\\Roaming\\nltk_data'
>     **********************************************************************
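For reference, NLTK only looks in the directories listed above, and that list comes from nltk.data.path. A minimal sketch for inspecting the search path and adding a custom data directory (the directory name below is just an example):

import nltk.data

# Show the directories NLTK searches for resources such as tokenizers/punkt/english.pickle
print(nltk.data.path)

# If the punkt data lives somewhere non-standard, append that directory to the search path
nltk.data.path.append(r'D:\my_nltk_data')  # example path, adjust to your setup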

3 Answers

This is what just worked for me:

# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))

sentences_tokenized is a list of lists of tokens:

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]

The sentences were taken from the example ipython notebook accompanying the book "Mining the Social Web, 2nd Edition".
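Once the punkt data has been downloaded, the nltk.data.load call from the question should also succeed. A minimal sketch, assuming nltk.download('punkt') has already been run:

import nltk.data

# Load the pre-trained English Punkt sentence tokenizer the question tried to load
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

# The loaded PunktSentenceTokenizer splits raw text into sentences
print(tokenizer.tokenize("Mr. Green killed Colonel Mustard in the study. He is not a very nice fellow."))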

I had this same problem. Go into a python shell and type:

>>> import nltk
>>> nltk.download()

An installation window will then appear. Go to the "Models" tab and select "punkt" under the "Identifier" column. Then click Download and it will install the necessary files. It should work after that!
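To verify that the download worked, nltk.data.find can be used; it raises the same LookupError if the resource is still missing. A quick sketch:

import nltk.data

# Returns the resolved path if punkt is installed, otherwise raises LookupError
print(nltk.data.find('tokenizers/punkt/english.pickle'))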

import nltk
nltk.download('punkt')

from nltk import word_tokenize, sent_tokenize

Use the tokenizers :)
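A minimal usage sketch for both imported tokenizers (the sample text is just an example):

from nltk import word_tokenize, sent_tokenize

text = "Mr. Green is not a very nice fellow. Professor Plum has a green plant."

# sent_tokenize uses the downloaded punkt model to split the text into sentences
print(sent_tokenize(text))

# word_tokenize splits the text into word and punctuation tokens
print(word_tokenize(text))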
