<blockquote>
<p><strong>Question</strong>: ... dump to a file the sentences that contain more than N occurrences of a particular POS</p>
</blockquote>
<hr/>
<blockquote>
<p><strong>Note</strong>: This assumes that <code>'document.txt'</code> contains <strong>one</strong> sentence per line.</p>
</blockquote>
<pre><code>import nlpnet

# Assumption: the nlpnet model directory has already been configured,
# e.g. with nlpnet.set_data_dir(...), before the tagger is created.
TAGGER = nlpnet.POSTagger()


def is_worth_saving(tags, pos, pos_count):
    """
    :param tags: nlpnet tags from ONE sentence
    :param pos: the POS to filter
    :param pos_count: minimum number of occurrences of 'pos'
    :return:
        True if 'tags' contains at least 'pos_count' occurrences of 'pos',
        False otherwise
    """
    pos_found = 0
    # Count the tags that match 'pos'
    for word, _pos in tags:
        if _pos == pos:
            pos_found += 1
    return pos_found >= pos_count


if __name__ == '__main__':
    with open('document.txt') as in_fh, open('document_test.txt', 'w') as out_fh:
        for sentence in in_fh:
            print('Sentence:{}'.format(sentence[:-1]))
            tags = TAGGER.tag(sentence)
            # As your example sentence has only **2** verbs,
            # pass 'pos_count=2'
            if is_worth_saving(tags[0], 'V', 2):
                out_fh.write(sentence)
                print(tags[0])
</code></pre>
<blockquote>
<p><strong>Output</strong>: </p>
<pre><code>Sentence:O gato esta querendo comer o ratão
[(u'O', u'ART'), (u'gato', u'N'), (u'esta', u'PROADJ'), (u'querendo', u'V'), (u'comer', u'V'), (u'o', u'ART'), (u'rat', u'N')]
</code></pre>
</blockquote>
<p><strong><em>Tested with Python: 3.4.2 and 2.7.9</em></strong></p>
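<p>The filter above compares with <code>&gt;=</code>, so a sentence with exactly <code>pos_count</code> matches is also saved. If you need strictly <em>more than N</em> occurrences, as the question is worded, the following is a minimal sketch of that variant. The helper names <code>count_pos</code> and <code>has_more_than</code> are hypothetical and not part of nlpnet; the sample tags are copied from the output above so the snippet runs without a tagger.</p>

```python
from collections import Counter


def count_pos(tags):
    """Count how often each POS tag occurs in one tagged sentence."""
    return Counter(pos for _word, pos in tags)


def has_more_than(tags, pos, n):
    """Return True if 'pos' occurs strictly more than 'n' times in 'tags'."""
    return count_pos(tags)[pos] > n


# Tagged sentence copied from the output above (hypothetical stand-in
# for the first element of the list returned by the tagger)
sample_tags = [('O', 'ART'), ('gato', 'N'), ('esta', 'PROADJ'),
               ('querendo', 'V'), ('comer', 'V'), ('o', 'ART'), ('rat', 'N')]

print(has_more_than(sample_tags, 'V', 1))  # the sentence has two verbs
print(has_more_than(sample_tags, 'V', 2))
```

<p>Using a <code>Counter</code> also makes it cheap to check several POS tags for the same sentence, since the tags are only counted once per call to <code>count_pos</code>.</p>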