<p>First, watch this video: <a href="https://www.youtube.com/watch?v=0Ef9GudbxXY" rel="noreferrer">https://www.youtube.com/watch?v=0Ef9GudbxXY</a></p>
<p><img src="https://i.stack.imgur.com/QqgkH.jpg" alt="enter image description here"/></p>
<p>Now for the proper answer:</p>
<pre><code>import re
import io
from nltk import pos_tag, word_tokenize, sent_tokenize, RegexpParser
xstring = u"An electronic library (also referred to as digital library or digital repository) is a focused collection of digital objects that can include text, visual material, audio material, video material, stored as electronic media formats (as opposed to print, micro form, or other media), along with means for organizing, storing, and retrieving the files and media contained in the library collection. Digital libraries can vary immensely in size and scope, and can be maintained by individuals, organizations, or affiliated with established physical library buildings or institutions, or with academic institutions.[1] The electronic content may be stored locally, or accessed remotely via computer networks. An electronic library is a type of information retrieval system."
# Chunk grammar: zero or more adjectives (JJ, JJR, JJS) followed by one singular noun (NN)
chunkGram1 = r"""Chunk: {<JJ\w?>*<NN>}"""
chunkParser1 = RegexpParser(chunkGram1)
# Split into sentences, tokenize, POS-tag, then chunk each sentence
chunked = [chunkParser1.parse(pos_tag(word_tokenize(sent)))
           for sent in sent_tokenize(xstring)]
# Open the file with an explicit UTF-8 encoding and write each chunk tree as text
with io.open('outfile', 'w', encoding='utf8') as fout:
    for chunk in chunked:
        fout.write(str(chunk)+'\n\n')
</code></pre>
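<p>In the chunk grammar, NLTK reads the text inside each pair of angle brackets as a regular expression over the POS tag. So <code>&lt;JJ\w?&gt;*&lt;NN&gt;</code> means zero or more adjective tags followed by one singular-noun tag. The quick stdlib check below shows which Penn Treebank tags each piece accepts. It is a sketch that uses <code>re.fullmatch</code> to mimic the whole-tag match. It is not NLTK's own code, and the helper <code>matches</code> is hypothetical:</p>
<pre><code>import re

# NLTK tag patterns treat the text inside each pair of angle brackets as a
# regex over the POS tag; re.fullmatch mimics that whole-tag match here
# (an illustration only, not NLTK's implementation).
ADJ = r"JJ\w?"  # adjective part: JJ plus at most one extra word character
NOUN = r"NN"    # noun part: exactly the singular common-noun tag


def matches(tag_regex, tag):
    """Hypothetical helper: True if the whole tag matches the tag regex."""
    return re.fullmatch(tag_regex, tag) is not None


for tag in ["JJ", "JJR", "JJS", "NN", "NNS", "NNP"]:
    print(tag, matches(ADJ, tag), matches(NOUN, tag))
</code></pre>
<p>JJ, JJR and JJS pass the adjective pattern, but only NN passes the noun pattern. As a result, a phrase ending in a plural noun is not chunked. For example, "digital objects" would likely be tagged NNS. Widening the noun part to <code>&lt;NN\w?&gt;</code> would cover that case.</p>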
<p>If you have to stick with python2.7:</p>
<pre><code># Python 2 only: the unicode() builtin does not exist on Python 3
with io.open('outfile', 'w', encoding='utf8') as fout:
    for chunk in chunked:
        fout.write(unicode(chunk)+'\n\n')
</code></pre>
<p>[out]:</p>
<pre><code>alvas@ubi:~$ python test2.py
alvas@ubi:~$ head outfile
(S
  An/DT
  (Chunk electronic/JJ library/NN)
  (/:
  also/RB
  referred/VBD
  to/TO
  as/IN
  (Chunk digital/JJ library/NN)
  or/CC
alvas@ubi:~$ python3 test2.py
Traceback (most recent call last):
  File "test2.py", line 18, in <module>
    fout.write(unicode(chunk)+'\n\n')
NameError: name 'unicode' is not defined
</code></pre>
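<p>The <code>NameError</code> above is expected. Python 3 has no <code>unicode</code> builtin because <code>str</code> is already a Unicode text type, and <code>io.open</code> is simply the builtin <code>open</code>. Below is a minimal stdlib-only sketch of the original Python 3 write pattern. It rests on these assumptions: plain strings stand in for the NLTK trees, one of them is a hypothetical non-ASCII sample, and a temporary file replaces <code>outfile</code>:</p>
<pre><code>import io
import os
import tempfile

# Hypothetical stand-ins for the NLTK Tree objects (one contains non-ASCII text).
chunks = ["(S An/DT (Chunk electronic/JJ library/NN))", "(S café/NN)"]

# On Python 3, io.open is the builtin open and str is already Unicode text,
# so str(chunk) needs no extra conversion before writing.
path = os.path.join(tempfile.mkdtemp(), "outfile")
with io.open(path, "w", encoding="utf8") as fout:
    for chunk in chunks:
        fout.write(str(chunk) + "\n\n")

# Reading back with the same encoding returns exactly the text that was written.
with io.open(path, encoding="utf8") as fin:
    content = fin.read()

print(io.open is open)
print(content == "\n\n".join(chunks) + "\n\n")
</code></pre>
<p>The file is opened with <code>encoding="utf8"</code>, so the non-ASCII stand-in is encoded on write and decoded on read without any manual conversion.</p>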
<p>If you must stick with py2.7, I strongly suggest this instead. The same code then runs under both Python 2 and Python 3:</p>
<pre><code># six.text_type is unicode on Python 2 and str on Python 3
from six import text_type
with io.open('outfile', 'w', encoding='utf8') as fout:
    for chunk in chunked:
        fout.write(text_type(chunk)+'\n\n')
</code></pre>
<p>[out]:</p>
<pre><code>alvas@ubi:~$ python test2.py
alvas@ubi:~$ head outfile
(S
  An/DT
  (Chunk electronic/JJ library/NN)
  (/:
  also/RB
  referred/VBD
  to/TO
  as/IN
  (Chunk digital/JJ library/NN)
  or/CC
alvas@ubi:~$ python3 test2.py
alvas@ubi:~$ head outfile
(S
  An/DT
  (Chunk electronic/JJ library/NN)
  (/:
  also/RB
  referred/VBD
  to/TO
  as/IN
  (Chunk digital/JJ library/NN)
  or/CC
</code></pre>
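<p>For reference, <code>six.text_type</code> is just the running interpreter's Unicode text type: <code>unicode</code> on Python 2 and <code>str</code> on Python 3. If adding the <code>six</code> dependency is not an option, here is a minimal hand-rolled equivalent. It is a sketch, not six's own code:</p>
<pre><code>import sys

# Hand-rolled stand-in for six.text_type (a sketch; six itself is the
# maintained implementation): pick the interpreter's Unicode text type.
if sys.version_info[0] == 2:
    text_type = unicode  # noqa: F821 -- this branch only runs on Python 2
else:
    text_type = str

print(text_type)
print(text_type("library"))
</code></pre>
<p>With this in place, <code>fout.write(text_type(chunk)+'\n\n')</code> behaves as in the six version on either interpreter.</p>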