How to rewrite the results sent to the terminal

Published 2024-09-28 15:33:16


I am using a Python library called nlpnet. This library is a part-of-speech tagger for Brazilian Portuguese; after many attempts I finally got output like this: Output of tagged data in terminal

As the image shows, it classifies each word individually with an abbreviation of its grammatical class. The challenge for the algorithm is to scan the entire analyzed document and rewrite only the sentences that contain 5 or more words of certain grammatical classes that I choose.

Example: analyze a text document containing several sentences, and rewrite into another file only the sentences that have 5 or more verbs or adjectives.

Code used. Class that prepares the tagger:

#!/usr/bin/python
# -*- coding: utf8 -*-
import nlpnet


def get_tags(content):
    # Directory containing the trained tagging models
    data_dir = 'pos-pt'
    # Create the tagger for that model directory and language
    tagger = nlpnet.POSTagger(data_dir, language='pt')

    for sentence in content:
        # Tag the sentence
        tagged_sentence = tagger.tag(sentence)
        print(tagged_sentence)

    return content

File class:


2 answers

When running the program from the command line, write `$ python python_filename.py > savingfilename.txt`. This saves everything printed to the screen into a text file.
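The same effect can also be had from inside Python, without shell redirection. A minimal sketch using only the standard library (the filename is a placeholder):

```python
import contextlib

# Redirect everything print() writes into a file instead of the terminal
with open('savingfilename.txt', 'w', encoding='utf8') as f:
    with contextlib.redirect_stdout(f):
        print('this line goes to the file, not the screen')
```

This is handy when you only want part of the program's output captured rather than the whole run.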

If all you want is to POS-tag the sentences in the document and dump the sentences containing N or more words of the chosen POS into a file, you don't need the second script you posted.

Here is an extremely simplified example:

import os
import nlpnet

TAGGER = nlpnet.POSTagger('pos-pt', language='pt')


# You could have a function that tagged and verified if a
# sentence meets the criteria for storage.

def is_worth_saving(text, pos, pos_count):
    # tagged sentences are lists of tagged words, which in
    # nlpnet are (word, pos) tuples. Tagged texts may contain
    # several sentences.
    pos_words = [word for sentence in TAGGER.tag(text)
                 for word in sentence
                 if word[1] == pos]
    return len(pos_words) >= pos_count


# Then you'd just need to open your original file, read a sentence, tag
# it, decide if it's worth saving, and save it or not. Until you consume 
# the entire original file. Thus not loading the entire dataset in memory 
# and keeping a small memory footprint.

with open('opiniaoaborto.txt', encoding='utf8') as original_file:
    with open('oracaos_interessantes.txt', 'w') as output_file:
        for text in original_file:
            # For example, only save sentences with at least 5 verbs in them
            if is_worth_saving(text, 'V', 5):
                output_file.write(text + os.linesep)
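For reference, `TAGGER.tag(text)` returns a list of sentences, each a list of `(word, tag)` tuples. The filtering inside `is_worth_saving` can be checked on hand-made data (the sample words and tags below are made up for illustration, not real tagger output):

```python
# Simulated return value of TAGGER.tag() for a two-sentence text
tagged = [[('O', 'ART'), ('gato', 'N'), ('come', 'V')],
          [('Ele', 'PROPESS'), ('dorme', 'V')]]

# The same list comprehension used in is_worth_saving, for pos='V'
pos_words = [word for sentence in tagged
             for word in sentence
             if word[1] == 'V']
print(len(pos_words))  # 2 verbs in total
```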

To answer your follow-up: you want to check whether a sentence contains 5 words tagged with any POS from a given list. I see two scenarios:

A) The 5 words must all belong to the same grammatical class. For example, 5 verbs ('Comendo, dançando, procurando, olhando e falando') or 5 nouns ('O gato, o sapo, o cão, o louro e o rato foram às compras'), but not 5 verbs + nouns ('O gato está querendo comer o rato' [2 nouns, 3 verbs]).

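For case A, one option is to count each POS of interest separately and require that a single class reach the threshold on its own. A minimal sketch, assuming pre-tagged input in nlpnet's `(word, tag)` format (the function name is mine; with nlpnet you would pass `TAGGER.tag(text)` as `tagged_text`):

```python
from collections import Counter


def is_worth_saving_same_pos(tagged_text, pos_list, pos_count):
    # Count occurrences of each POS of interest across all sentences
    counts = Counter(tag for sentence in tagged_text
                     for _, tag in sentence
                     if tag in pos_list)
    # True only if some single POS reaches the threshold by itself
    return any(n >= pos_count for n in counts.values())
```

With this check, a sentence with 2 nouns and 3 verbs fails for `pos_count=5`, which is exactly the case A behavior.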

B) The sentence contains 5 POS-tagged words made up of any combination of classes from the list. For example: 'O gato está querendo comer o rato' (2 nouns + 3 verbs).

import os
import nlpnet

TAGGER = nlpnet.POSTagger('pos-pt', language='pt')

# Again, one of the arguments would have to take a list of valid POS
def is_worth_saving(text, pos_list, pos_count):
    pos_words = [word for sentence in TAGGER.tag(text)
                 for word in sentence
                 if word[1] in pos_list]
    return len(pos_words) >= pos_count

with open('opiniaoaborto.txt', encoding='utf8') as original_file:
    with open('oracaos_interessantes.txt', 'w') as output_file:
        for text in original_file:
            # For example, only save sentences whose combined count of verbs and nouns is at least 5
            if is_worth_saving(text, ['V', 'N'], 5):
                output_file.write(text + os.linesep)
