How do I get the tagset from nltk pos_tag?

Posted 2024-10-01 09:35:10

I am trying to get the full tag names from nltk.pos_tag, but I can't find a simple way to do it with nltk, for example by using tagset='universal'.

import nltk
from nltk.tokenize import word_tokenize

def nltk_pos(text):
    token = word_tokenize(text)
    return (nltk.pos_tag(token)[0])[1]

nltk_pos('home')
output: 'NN'
expected output: 'NOUN'

1 answer
User
#1 · Posted 2024-10-01 09:35:10

I ran into the same problem while doing NLP analysis for a paper I was writing. I had to use a mapping function like this:

import nltk
from nltk.tokenize import word_tokenize

def get_full_tag_pos(pos_tag):
    tag_dict = {"J": "ADJ",
                "N": "NOUN",
                "V": "VERB",
                "R": "ADV"}
    # Assumes pos_tag is an uppercase Penn Treebank tag such as 'JJR' or 'NN'.
    # Any tag that does not start with J, N, V, or R (e.g. 'DT', 'IN') falls back to 'NOUN'.
    return tag_dict.get(pos_tag[0], 'NOUN')

# example
text = "home"  # sample input, taken from the question
words = word_tokenize(text)
words_pos = nltk.pos_tag(words)
full_tag_words_pos = [word + "/" + get_full_tag_pos(tag) for word, tag in words_pos]
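
The mapping function above only distinguishes four prefixes and labels everything else as a noun. As a rough sketch of two alternatives: nltk.pos_tag itself accepts a tagset argument, so passing tagset='universal' should return 'NOUN' for 'home' directly, and a fuller lookup table avoids mislabeling determiners, prepositions, and the like. The table and the helper names below (PTB_TO_UNIVERSAL, to_universal, convert_tagged, tag_universal) are hypothetical and written by hand for illustration; they are not NLTK's own mapping data.

```python
# Hand-written Penn Treebank -> universal tag table.
# Illustrative assumption: this is NOT loaded from NLTK's mapping files.
PTB_TO_UNIVERSAL = {
    "CC": "CONJ", "CD": "NUM", "DT": "DET", "EX": "DET", "FW": "X",
    "IN": "ADP", "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ", "LS": "X",
    "MD": "VERB", "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "PDT": "DET", "POS": "PRT", "PRP": "PRON", "PRP$": "PRON",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV", "RP": "PRT", "SYM": "X",
    "TO": "PRT", "UH": "X", "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB", "WDT": "DET",
    "WP": "PRON", "WP$": "PRON", "WRB": "ADV",
    # Common punctuation tags all collapse to "."
    ".": ".", ",": ".", ":": ".", "``": ".", "''": ".", "(": ".", ")": ".",
}


def to_universal(ptb_tag):
    """Map one Penn Treebank tag to a universal tag; unknown tags become 'X'."""
    return PTB_TO_UNIVERSAL.get(ptb_tag, "X")


def convert_tagged(tagged):
    """Convert a list of (word, ptb_tag) pairs to (word, universal_tag) pairs."""
    return [(word, to_universal(tag)) for word, tag in tagged]


def tag_universal(text):
    """Let NLTK do the conversion itself via the tagset argument.

    nltk is imported lazily so the table above stays usable without it.
    Requires NLTK plus its tokenizer, tagger, and universal_tagset data,
    fetched with nltk.download(); package names vary by NLTK version.
    """
    import nltk
    tokens = nltk.word_tokenize(text)
    return nltk.pos_tag(tokens, tagset="universal")


# Usage with already-tagged input (no NLTK needed):
tagged = [("The", "DT"), ("old", "JJ"), ("home", "NN")]
print(convert_tagged(tagged))
# [('The', 'DET'), ('old', 'ADJ'), ('home', 'NOUN')]
```

Falling back to 'X' rather than 'NOUN' keeps unknown tags visible instead of silently counting them as nouns. If NLTK's own conversion is available, tag_universal is probably the simpler route.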
