How do I read the nltk.text.Text objects from nltk.book in Python?

2 Answers

User

Answer 1 · edited 2024-09-30 02:36:14

It looks like the text has already been split into tokens:

from nltk.book import text6

text6.tokens
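As a minimal sketch of what you can do once you have the token list, the helper below summarizes any object that exposes a `.tokens` attribute. `summarize_tokens` and `FakeText` are hypothetical names introduced here for illustration, and the sample tokens are hand-written so that the snippet runs without NLTK or its data installed.

```python
from collections import Counter


def summarize_tokens(text, n=10):
    """Return the token count, the first n tokens, and the 3 most common tokens.

    `text` is any object with a `.tokens` list, such as an nltk.text.Text.
    """
    tokens = list(text.tokens)
    return {
        "count": len(tokens),
        "head": tokens[:n],
        "top": Counter(tokens).most_common(3),
    }


class FakeText:
    """Hypothetical stand-in for nltk.text.Text (only carries .tokens)."""

    def __init__(self, tokens):
        self.tokens = tokens


# Hand-written sample tokens, not read from the real corpus.
sample = FakeText(["SCENE", "1", ":", "[", "wind", "]",
                   "[", "clop", "clop", "clop", "]"])
summary = summarize_tokens(sample, n=4)
print(summary["count"], summary["head"], summary["top"][0])
```

With NLTK and the book data installed, passing `text6` (from `from nltk.book import text6`) instead of `sample` should work the same way, since `Text` objects expose `.tokens`.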

User

Answer 2 · edited 2024-09-30 02:36:14

Let's dig into the code =)

First, the nltk.book code lives at https://github.com/nltk/nltk/blob/develop/nltk/book.py

If we look closely, the texts are loaded as nltk.Text objects. For example, text6 comes from https://github.com/nltk/nltk/blob/develop/nltk/book.py#L36:

text6 = Text(webtext.words('grail.txt'), name="Monty Python and the Holy Grail")

The Text class is defined at https://github.com/nltk/nltk/blob/develop/nltk/text.py#L286, and you can learn how to use it from http://www.nltk.org/book/ch02.html

webtext is a corpus from nltk.corpus, so to get the raw text behind nltk.book.text6 you can load webtext directly, e.g.

>>> from nltk.corpus import webtext
>>> webtext.raw('grail.txt')
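To see why the raw text has to come from the corpus reader rather than from the Text object, here is a small sketch using only the standard library. It assumes a tokenizer that splits on runs of word characters and runs of punctuation (the regular expression below is an assumption chosen to mimic that behaviour, not taken from the answer), and the sample string is made up.

```python
import re

# Made-up sample line; note the double space and the trailing newline.
raw = "ARTHUR: Whoa there!  [clop clop clop]\n"

# Assumed tokenization rule: runs of word characters, or runs of punctuation.
tokens = re.findall(r"\w+|[^\w\s]+", raw)

# Joining the tokens with single spaces does not restore the original string,
# because the original whitespace was discarded during tokenization.
rejoined = " ".join(tokens)

print(tokens)
print(rejoined == raw)
```

Because whitespace is lost once the text is tokenized, a token list alone cannot reproduce the original file exactly, which is why going back to the corpus reader is the more reliable route.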

fileids is only available when you load a PlaintextCorpusReader object, not on a Text object (which is an already-processed object):

>>> type(webtext)
<class 'nltk.corpus.reader.plaintext.PlaintextCorpusReader'>
>>> for filename in webtext.fileids():
...     print(filename)
... 
firefox.txt
grail.txt
overheard.txt
pirates.txt
singles.txt
wine.txt
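Building on the listing above, a rough sketch of collecting the raw text for every file id might look like this. `raw_texts` and `FakeReader` are hypothetical names introduced here, and the file contents are placeholders, so the snippet runs without NLTK; it only relies on the reader offering `fileids()` and `raw(fileid)`.

```python
def raw_texts(reader, fileids=None):
    """Map each file id to its raw text.

    `reader` is any object with fileids() and raw(fileid) methods,
    for example a corpus reader such as nltk.corpus.webtext.
    """
    ids = reader.fileids() if fileids is None else fileids
    return {fid: reader.raw(fid) for fid in ids}


class FakeReader:
    """Hypothetical stand-in for a corpus reader, backed by a dict."""

    def __init__(self, files):
        self._files = dict(files)

    def fileids(self):
        return sorted(self._files)

    def raw(self, fileid):
        return self._files[fileid]


# Placeholder contents, not the real corpus files.
reader = FakeReader({
    "grail.txt": "placeholder text for grail\n",
    "wine.txt": "placeholder text for wine\n",
})
texts = raw_texts(reader)
for fid, raw in texts.items():
    print(fid, len(raw))
```

With NLTK and the webtext corpus installed, `raw_texts(webtext, ["grail.txt"])` should return the raw script text for that one file.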
