TextBlob and NLTK part-of-speech tagging accuracy

Posted 2024-09-29 23:32:03


So far I have the following code:

from textblob import TextBlob
class BrinBot:

    def __init__(self, message): #Accepts the message from the user as the argument
        parse(message)

class parse:
    def __init__(self, message):
        self.message = message
        blob = TextBlob(self.message)
        print(blob.tags)

BrinBot("Handsome Bob's dog is a beautiful Chihuahua")

This is the output:

[('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]

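My problem is that TextBlob apparently treats "Handsome" as a singular proper noun (NNP), which is incorrect, since "Handsome" should be an adjective (JJ). Is there any way to fix this? I also tried NLTK and got the same result.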


1 answer
User
#1 · Posted 2024-09-29 23:32:03

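This happens because the capital letter in "Handsome" makes the tagger treat it as part of Bob's name. That is not necessarily an incorrect analysis, but if you want to force the adjective reading, you can remove the capitalization of "Handsome", as in text2 and text3 below: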

text = "Handsome Bob's dog is a beautiful Chihuahua"

BrinBot(text)
[('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]

text2 = "handsome bob's dog is a beautiful chihuahua"

BrinBot(text2)
[('handsome', 'JJ'), ('bob', 'NN'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN')]

text3 = "That beautiful chihuahua is handsome Bob's dog"

BrinBot(text3)
[('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('handsome', 'JJ'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]

text4 = "That beautiful chihuahua is Handsome Bob's dog"

BrinBot(text4)
[('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]
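
The lowercasing step above can also be applied to selected words only, so that the rest of the sentence (for example "Bob") keeps its capitalization. The following is a minimal sketch, not part of the original answer: the helper name `lowercase_words` is hypothetical, and whether the tagger then returns JJ still depends on its model and the sentence.

```python
import re


def lowercase_words(text, words):
    """Return text with whole-word occurrences of the given words lowercased.

    Hypothetical helper for illustration; matching is case-insensitive
    and only full words are changed.
    """
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        flags=re.IGNORECASE,
    )
    # Replace each match with its lowercase form, leaving other text untouched.
    return pattern.sub(lambda m: m.group(0).lower(), text)


if __name__ == "__main__":
    sentence = "Handsome Bob's dog is a beautiful Chihuahua"
    print(lowercase_words(sentence, ["handsome"]))
    # To tag the result (requires textblob and its corpora to be installed):
    #   from textblob import TextBlob
    #   print(TextBlob(lowercase_words(sentence, ["handsome"])).tags)
```

This only works when you already know which words are being mistaken for names, so it is a workaround for specific cases rather than a general fix for tagger accuracy.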
