Ghostscript not found in NLTK?


I was playing around with NLTK when I tried to use the chunk module:
<pre><code>import nltk as nk

Sentence = "Betty Botter bought some butter, but she said the butter is bitter, I f I put it in my batter, it will make my batter bitter."
tokens = nk.word_tokenize(Sentence)
tagged = nk.pos_tag(tokens)
entities = nk.chunk.ne_chunk(tagged)
</code></pre>
The code runs fine, but when I evaluate <code>entities</code> at the prompt I get the following error message:
<pre><code>Out[2]: Tree('S', [Tree('PERSON', [('Betty', 'NNP')]), Tree('PERSON', [('Botter', 'NNP')]), ('bought', 'VBD'), ('some', 'DT'), ('butter', 'NN'), (',', ','), ('but', 'CC'), ('she', 'PRP'), ('said', 'VBD'), ('the', 'DT'), ('butter', 'NN'), ('is', 'VBZ'), ('bitter', 'JJ'), (',', ','), ('I', 'PRP'), ('f', 'VBP'), ('I', 'PRP'), ('put', 'VBD'), ('it', 'PRP'), ('in', 'IN'), ('my', 'PRP$'), ('batter', 'NN'), (',', ','), ('it', 'PRP'), ('will', 'MD'), ('make', 'VB'), ('my', 'PRP$'), ('batter', 'NN'), ('bitter', 'NN'), ('.', '.')])
Traceback (most recent call last):
  File "C:\Users\QP19\AppData\Local\Continuum\Anaconda2\lib\site-packages\IPython\core\formatters.py", line 343, in __call__
    return method()
  File "C:\Users\QP19\AppData\Local\Continuum\Anaconda2\lib\site-packages\nltk\tree.py", line 726, in _repr_png_
    subprocess.call([find_binary('gs', binary_names=['gswin32c.exe', 'gswin64c.exe'], env_vars=['PATH'], verbose=False)] +
  File "C:\Users\QP19\AppData\Local\Continuum\Anaconda2\lib\site-packages\nltk\internals.py", line 602, in find_binary
    binary_names, url, verbose))
  File "C:\Users\QP19\AppData\Local\Continuum\Anaconda2\lib\site-packages\nltk\internals.py", line 596, in find_binary_iter
    url, verbose):
  File "C:\Users\QP19\AppData\Local\Continuum\Anaconda2\lib\site-packages\nltk\internals.py", line 567, in find_file_iter
    raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
LookupError:

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.
===========================================================================
</code></pre>
According to <a href="https://stackoverflow.com/a/37160385/3967806">this post</a>, the solution is to install Ghostscript, because the chunker tries to use it to display a parse tree and looks for one of 3 binaries:
<pre><code>file_names=['gs', 'gswin32c.exe', 'gswin64c.exe']
</code></pre>
But even though I have installed Ghostscript and can now find the binary with a Windows search, I still get the same error.
What do I need to fix or update?
<hr/>
Additional path information:
<pre><code>import os; print(os.environ['PATH'])
</code></pre>
Returns:
<pre><code>C:\Users\QP19\AppData\Local\Continuum\Anaconda2\Library\bin;C:\Users\QP19\AppData\Local\Continuum\Anaconda2\Library\bin;C:\Users\QP19\AppData\Local\Continuum\Anaconda2;C:\Users\QP19\AppData\Local\Continuum\Anaconda2\Scripts;C:\Users\QP19\AppData\Local\Continuum\Anaconda2\Library\bin;C:\Users\QP19\AppData\Local\Continuum\Anaconda2\Library\bin;C:\Program Files (x86)\Parallels\Parallels Tools\Applications;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Oracle\RPAS14.1\RpasServer\bin;C:\Oracle\RPAS14.1\RpasServer\applib;C:\Program Files (x86)\Java\jre7\bin;C:\Program Files (x86)\Java\jre7\bin\client;C:\Program Files (x86)\Java\jre7\lib;C:\Program Files (x86)\Java\jre7\jre\bin\client;C:\Users\QP19\AppData\Local\Continuum\Anaconda2;C:\Users\QP19\AppData\Local\Continuum\Anaconda2\Scripts;C:\Users\QP19\AppData\Local\Continuum\Anaconda2\Library\bin; </code></pre>


1 answer

Anonymous, 1 day ago

In short: instead of evaluating <code>>>> entities</code>, do this:
<pre><code>>>> print(repr(entities))
</code></pre>
Or add the directory that contains the Ghostscript binary to <code>PATH</code> before evaluating <code>entities</code> (Solution 2 below).
<hr/>
In long:
The problem is that you are displaying the output of <code>ne_chunk</code>, which is an <code>nltk.tree.Tree</code> object. Displaying it makes IPython ask for both the string and the drawing representation of the NE-tagged sentence, and the drawing needs Ghostscript to render the tree widget as an image.
Let's go through it step by step.
First, when you use <code>ne_chunk</code>, you can import it directly from the top level:
<pre><code>from nltk import ne_chunk
</code></pre>
It is also recommended to import the names you need explicitly, i.e.:
<pre><code>from nltk import word_tokenize, pos_tag, ne_chunk
</code></pre>
When you use <code>ne_chunk</code>, it comes from <a href="https://github.com/nltk/nltk/blob/develop/nltk/chunk/__init__.py" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/chunk/__init__.py</a>.
It is not obvious what kind of object the pickle load returns, but after some inspection we find that there is only one built-in NE chunker that is not rule-based. Since the name of the pickle binary says maxent, we can assume that it is a statistical chunker, so it most probably comes from the <code>NEChunkParser</code> object: <a href="https://github.com/nltk/nltk/blob/develop/nltk/chunk/named_entity.py" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/chunk/named_entity.py</a>. That module also contains ACE data API functions, which matches the name of the pickle binary.
Now, whenever you use the <code>ne_chunk</code> function, it actually calls <code>NEChunkParser.parse()</code>, which returns an <code>nltk.tree.Tree</code> object: <a href="https://github.com/nltk/nltk/blob/develop/nltk/chunk/named_entity.py#L118" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/chunk/named_entity.py#L118</a>
<pre><code>class NEChunkParser(ChunkParserI):
    """
    Expected input: list of pos-tagged words
    """
    def __init__(self, train):
        self._train(train)

    def parse(self, tokens):
        """
        Each token should be a pos-tagged word
        """
        tagged = self._tagger.tag(tokens)
        tree = self._tagged_to_parse(tagged)
        return tree

    def _train(self, corpus):
        # Convert to tagged sequence
        corpus = [self._parse_to_tagged(s) for s in corpus]
        self._tagger = NEChunkParserTagger(train=corpus)

    def _tagged_to_parse(self, tagged_tokens):
        """
        Convert a list of tagged tokens to a chunk-parse tree.
        """
        sent = Tree('S', [])
        for (tok,tag) in tagged_tokens:
            if tag == 'O':
                sent.append(tok)
            elif tag.startswith('B-'):
                sent.append(Tree(tag[2:], [tok]))
            elif tag.startswith('I-'):
                if (sent and isinstance(sent[-1], Tree) and
                    sent[-1].label() == tag[2:]):
                    sent[-1].append(tok)
                else:
                    sent.append(Tree(tag[2:], [tok]))
        return sent
</code></pre>
If we look at the <a href="https://github.com/nltk/nltk/blob/develop/nltk/tree.py" rel="nofollow noreferrer"><code>nltk.tree.Tree</code></a> object, the Ghostscript problem appears when its <code>_repr_png_</code> method is called: <a href="https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L702" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L702</a>:
<pre><code>def _repr_png_(self):
    """
    Draws and outputs in PNG for ipython.
    PNG is used instead of PDF, since it can be displayed in the qt console and
    has wider browser support.
    """
    import os
    import base64
    import subprocess
    import tempfile
    from nltk.draw.tree import tree_to_treesegment
    from nltk.draw.util import CanvasFrame
    from nltk.internals import find_binary
    _canvas_frame = CanvasFrame()
    widget = tree_to_treesegment(_canvas_frame.canvas(), self)
    _canvas_frame.add_widget(widget)
    x, y, w, h = widget.bbox()
    # print_to_file uses scrollregion to set the width and height of the pdf.
    _canvas_frame.canvas()['scrollregion'] = (0, 0, w, h)
    with tempfile.NamedTemporaryFile() as file:
        in_path = '{0:}.ps'.format(file.name)
        out_path = '{0:}.png'.format(file.name)
        _canvas_frame.print_to_file(in_path)
        _canvas_frame.destroy_widget(widget)
        subprocess.call([find_binary('gs', binary_names=['gswin32c.exe', 'gswin64c.exe'], env_vars=['PATH'], verbose=False)] +
                        '-q -dEPSCrop -sDEVICE=png16m -r90 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dSAFER -dBATCH -dNOPAUSE -sOutputFile={0:} {1:}'
                        .format(out_path, in_path).split())
        with open(out_path, 'rb') as sr:
            res = sr.read()
        os.remove(in_path)
        os.remove(out_path)
        return base64.b64encode(res).decode()
</code></pre>
But note that it is strange for the interpreter to trigger <code>_repr_png_</code> instead of <code>__repr__</code> when you evaluate <code>>>> entities</code> (see <a href="https://stackoverflow.com/questions/1984162/purpose-of-pythons-repr">Purpose of Python's __repr__</a>). That cannot be how the native CPython interpreter works when it prints the representation of an object, so we look at <code>IPython.core.formatters</code> and see that it allows <code>_repr_png_</code> to be triggered:
<pre><code>class PNGFormatter(BaseFormatter):
    """A PNG formatter.

    To define the callables that compute the PNG representation of your
    objects, define a :meth:`_repr_png_` method or use the :meth:`for_type`
    or :meth:`for_type_by_name` methods to register functions that handle
    this.

    The return value of this formatter should be raw PNG data, *not*
    base64 encoded.
    """
    format_type = Unicode('image/png')

    print_method = ObjectName('_repr_png_')

    _return_type = (bytes, unicode_type)
</code></pre>
We can also see that when IPython initializes a <code>DisplayFormatter</code> object, it tries to activate all formatters: <a href="https://github.com/ipython/ipython/blob/master/IPython/core/formatters.py#L66" rel="nofollow noreferrer">https://github.com/ipython/ipython/blob/master/IPython/core/formatters.py#L66</a>
<pre><code>def _formatters_default(self):
    """Activate the default formatters."""
    formatter_classes = [
        PlainTextFormatter,
        HTMLFormatter,
        MarkdownFormatter,
        SVGFormatter,
        PNGFormatter,
        PDFFormatter,
        JPEGFormatter,
        LatexFormatter,
        JSONFormatter,
        JavascriptFormatter
    ]
    d = {}
    for cls in formatter_classes:
        f = cls(parent=self)
        d[f.format_type] = f
    return d
</code></pre>
Note that outside of <code>IPython</code>, in the native CPython interpreter, only <code>__repr__</code> is called, not <code>_repr_png_</code>:
<pre><code>>>> from nltk import word_tokenize, pos_tag, ne_chunk
>>> sentence = "Betty Botter bought some butter, but she said the butter is bitter, I f I put it in my batter, it will make my batter bitter."
>>> entities = ne_chunk(pos_tag(word_tokenize(sentence)))
>>> entities
Tree('S', [Tree('PERSON', [('Betty', 'NNP')]), Tree('PERSON', [('Botter', 'NNP')]), ('bought', 'VBD'), ('some', 'DT'), ('butter', 'NN'), (',', ','), ('but', 'CC'), ('she', 'PRP'), ('said', 'VBD'), ('the', 'DT'), ('butter', 'NN'), ('is', 'VBZ'), ('bitter', 'JJ'), (',', ','), ('I', 'PRP'), ('f', 'VBP'), ('I', 'PRP'), ('put', 'VBD'), ('it', 'PRP'), ('in', 'IN'), ('my', 'PRP$'), ('batter', 'NN'), (',', ','), ('it', 'PRP'), ('will', 'MD'), ('make', 'VB'), ('my', 'PRP$'), ('batter', 'NN'), ('bitter', 'NN'), ('.', '.')])
</code></pre>
<hr/>
So now the solutions:
Solution 1:
To print the string output of <code>ne_chunk</code>, you can use
<pre><code>>>> print(repr(entities))
</code></pre>
instead of <code>>>> entities</code>. That way only <code>__repr__</code> is called explicitly, and IPython does not run all of its possible formatters.
Solution 2:
If you really need <code>_repr_png_</code> to visualize the Tree object, then we need to figure out how to make the Ghostscript binary visible through the environment variables that NLTK reads.
In your case, it seems that the default <code>nltk.internals</code> lookup is unable to find the binary. More specifically, we are referring to <a href="https://github.com/nltk/nltk/blob/develop/nltk/internals.py#L599" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/internals.py#L599</a>
If we go back to <a href="https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L726" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L726</a>, we see that it searches
<pre><code>env_vars=['PATH']
</code></pre>
When NLTK resolves its environment variables, it looks at <code>os.environ</code>.
Note that <code>find_binary</code> calls <code>find_binary_iter</code>, which calls <code>find_file_iter</code>, which looks up the variables named in <code>env_vars</code> by reading <code>os.environ</code>. The <code>PATH</code> printed in the question contains no Ghostscript directory, which is why the lookup fails even though Ghostscript is installed.
So if we add the directory to the path (a raw string is used so that <code>\b</code> in <code>\bin</code> is not read as a backspace escape):
<pre><code>>>> import os
>>> from nltk import word_tokenize, pos_tag, ne_chunk
>>> path_to_gs = r"C:\Program Files\gs\gs9.19\bin"
>>> os.environ['PATH'] += os.pathsep + path_to_gs
</code></pre>
Evaluating <code>entities</code> in IPython should now work.
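The PATH fix can also be sketched as a small helper that first checks whether a directory really contains one of the binaries NLTK looks for, and only then appends it to <code>PATH</code>. This is a minimal sketch, not part of NLTK: the function names <code>find_ghostscript</code> and <code>ensure_on_path</code> are hypothetical, the binary names are the ones shown in the traceback, and the install directory is the one assumed in the answer, so adjust it to your own installation.

```python
import os

# Binary names taken from the traceback in the question.
GS_NAMES = ("gs", "gswin32c.exe", "gswin64c.exe")


def find_ghostscript(search_dirs, names=GS_NAMES):
    """Return the first directory in search_dirs that holds one of names, else None."""
    for directory in search_dirs:
        for name in names:
            if os.path.isfile(os.path.join(directory, name)):
                return directory
    return None


def ensure_on_path(directory, env=os.environ):
    """Append directory to env['PATH'] unless it is already listed (exact match)."""
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        entries.append(directory)
        env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


if __name__ == "__main__":
    current_dirs = os.environ.get("PATH", "").split(os.pathsep)
    if find_ghostscript(current_dirs) is None:
        # Assumed install location, copied from the answer; change as needed.
        candidate = r"C:\Program Files\gs\gs9.19\bin"
        if find_ghostscript([candidate]) is not None:
            ensure_on_path(candidate)
            print("Added to PATH: " + candidate)
        else:
            print("No Ghostscript binary found in: " + candidate)
```

Changing <code>os.environ</code> affects only the current Python process and the subprocesses it starts, so the helper has to run in the same IPython session before <code>entities</code> is evaluated. To make the change permanent, add the directory to the Windows <code>PATH</code> setting instead.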
