Text classification in Python

Published 2024-09-27 17:53:38


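I'm working on a basic text classification project in Python. I'm using NLTK and have imported its Brown corpus. When I try to label one group of sentences as "positive" and another as "negative", I get a NoneType error. Here is the code I have so far: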

from nltk.corpus import brown
brown.fileids()

categories = brown.categories()
categories

news_text = brown.sents(categories='news')
editorial_text= brown.sents(categories='editorial')
romance_text= brown.sents(categories='romance')
target_text=news_text + editorial_text

total_text=news_text + editorial_text + romance_text

data=[]

for text in total_text:
    if text in target_text:
        label= "pos"
    else:
        label = "neg"

data.extend( [(label, text) for text in total_text] )
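Independently of the traceback, the loop above has a logic bug: `label` is reassigned on every pass but never stored, so the final `data.extend(...)` pairs every sentence with whatever label the last iteration left behind. In addition, `text in target_text` scans the whole lazy corpus once per sentence. A minimal sketch of per-sentence labeling, assuming the sentences have already been loaded into ordinary Python lists; `label_sentences` is a hypothetical helper name, and sentences are converted to tuples because lists are unhashable and cannot go into a set:

```python
from typing import List, Sequence, Tuple

Sentence = List[str]


def label_sentences(
    all_sents: Sequence[Sentence], target_sents: Sequence[Sentence]
) -> List[Tuple[str, Sentence]]:
    """Pair each sentence with "pos" if it occurs in target_sents, else "neg"."""
    # Lists are unhashable, so store tuples; set lookups are O(1) on average
    # instead of a full scan of target_sents for every sentence.
    targets = {tuple(s) for s in target_sents}
    data = []
    for sent in all_sents:
        label = "pos" if tuple(sent) in targets else "neg"
        data.append((label, sent))  # store the label for THIS sentence
    return data


# Usage with small hand-made lists standing in for corpus sentences
news = [["The", "jury", "said"], ["It", "rained"]]
romance = [["She", "smiled"]]
print(label_sentences(news + romance, news))
# [('pos', ['The', 'jury', 'said']), ('pos', ['It', 'rained']), ('neg', ['She', 'smiled'])]
```

Note that with a membership-based scheme, a romance sentence that happens to be identical to a news or editorial sentence (short ones like "Yes ." are plausible) is labeled "pos"; labeling by source category avoids that.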

Here is the error message I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-f8709bb455fe> in <module>()
      1 data=[]
      2 
----> 3 for text in total_text:
      4     if text in target_text:
      5         label= "pos"

/usr/local/lib/python3.5/dist-packages/nltk/collections.py in iterate_from(self, start_index)
    330                         'inconsistent list value (num elts)')
    331 
--> 332             for value in sublist[max(0, start_index-index):]:
    333                 yield value
    334 

/usr/local/lib/python3.5/dist-packages/nltk/collections.py in iterate_from(self, start_index)
    330                         'inconsistent list value (num elts)')
    331 
--> 332             for value in sublist[max(0, start_index-index):]:
    333                 yield value
    334 

/usr/local/lib/python3.5/dist-packages/nltk/corpus/reader/util.py in iterate_from(self, start_tok)
    400 
    401             # Get everything we can from this piece.
--> 402             for tok in piece.iterate_from(max(0, start_tok-offset)):
    403                 yield tok
    404 

/usr/local/lib/python3.5/dist-packages/nltk/corpus/reader/util.py in iterate_from(self, start_tok)
    291         while filepos < self._eofpos:
    292             # Read the next block.
--> 293             self._stream.seek(filepos)
    294             self._current_toknum = toknum
    295             self._current_blocknum = block_index

AttributeError: 'NoneType' object has no attribute 'seek'
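The last frame is `self._stream.seek(filepos)` inside NLTK's `StreamBackedCorpusView.iterate_from`, the lazy view behind `brown.sents(...)`. As far as I can tell from NLTK's source, that method closes the view's file stream (setting it to `None`) once an iteration reaches the end of the file, and the stream is shared by every iterator over the same view. Here the outer `for` loop is still partway through `news_text` when `text in target_text` appears to run a second, full pass over the same view; that inner pass reaches the end and closes the stream, and the outer loop's next `seek` then hits `None`. The toy class below is a simplified model of that behavior, not NLTK's actual code; `ToyView` and its `block_size` are made up for illustration:

```python
import io


class ToyView:
    """Hypothetical stand-in for a lazy corpus view with one shared stream.

    Items are read in blocks; the shared stream is "closed" (set to None)
    whenever any iteration reaches the end, as a simplified model of NLTK.
    """

    def __init__(self, items, block_size=2):
        self._items = list(items)
        self._block_size = block_size
        self._stream = None  # shared by every iterator over this view

    def __iter__(self):
        if self._stream is None:
            self._stream = io.StringIO()  # stands in for opening the corpus file
        pos = 0
        while pos < len(self._items):
            self._stream.seek(0)  # a block read; fails if the stream was closed
            for item in self._items[pos : pos + self._block_size]:
                yield item
            pos += self._block_size
        self._stream = None  # "close" the shared stream at end of file


view = ToyView(["s1", "s2", "s3", "s4"])
print(list(view))  # ['s1', 's2', 's3', 's4']  (one sequential pass works)

try:
    for item in view:
        # The inner full pass reaches the end and closes the shared stream.
        _ = item in list(view)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'seek'
```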

Is there any way I can fix this?
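Since the error appears to come from iterating the same lazy corpus view twice at once (the outer `for` loop and the `in` test both walk `news_text`), one way to avoid it is to read each view in a single sequential pass and assign labels from the category each sentence came from, with no membership test at all. This is a sketch, not the only fix, and assumes the Brown corpus is installed (`nltk.download('brown')`); `label_by_group` and `build_dataset` are hypothetical names, and the `(label, sentence)` tuple order follows the question's code:

```python
from typing import Iterable, List, Tuple

Sentence = List[str]


def label_by_group(
    pos_groups: Iterable[Iterable[Sentence]],
    neg_groups: Iterable[Iterable[Sentence]],
) -> List[Tuple[str, Sentence]]:
    """Label every sentence by the group it came from; no membership tests."""
    data = []
    for groups, label in ((pos_groups, "pos"), (neg_groups, "neg")):
        for group in groups:
            # list() materializes the lazy view in one sequential pass
            data.extend((label, sent) for sent in list(group))
    return data


def build_dataset() -> List[Tuple[str, Sentence]]:
    """Build the question's pos/neg data set from the Brown corpus (needs NLTK)."""
    from nltk.corpus import brown  # imported here so label_by_group runs without NLTK

    return label_by_group(
        pos_groups=[brown.sents(categories="news"), brown.sents(categories="editorial")],
        neg_groups=[brown.sents(categories="romance")],
    )


# data = build_dataset()
```

Because each corpus view is consumed exactly once and never inside another loop over itself, no two iterators share a stream, and a romance sentence that duplicates a news sentence still gets "neg".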

