Keras multi-class classifier for chatbot responses

Posted 2024-09-29 21:32:41


I have implemented a chatbot in Python. It is trained on a dataset called "intents", a JSON file in the following format:

{"intents": [
    {"tag": "greeting",
     "patterns": ["Hi there", "How are you", "Is anyone there?", "Hey", "Hola", "Hello", "Good day"],
     "responses": ["Hello, thanks for asking", "Good to see you again", "Hi there, how can I help?"]
    },
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later", "Goodbye", "Nice chatting to you, bye", "Till next time"],
     "responses": ["See you!", "Have a nice day", "Bye! Come back again soon."]
    },
    {"tag": "thanks",
     "patterns": ["Thanks", "Thank you", "That's helpful", "Awesome, thanks", "Thanks for helping me"],
     "responses": ["Happy to help!", "Any time!", "My pleasure"]
    },
    {"tag": "noanswer",
     "patterns": [],
     "responses": ["Sorry, can't understand you", "Please give me more info", "Not sure I understand"],
     .
     .
     .

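Here the tag is the category of the user's question (the pattern), and each tag comes with its possible responses. Before training, the dataset is transformed: every word of each pattern is extracted by tokenization, and then lemmatization is applied. The training set therefore consists of patterns with their associated labels (tags), where each pattern is represented as a bag of words and each label is one-hot encoded. The model is then defined as follows: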

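from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD

# x_train (bag-of-words matrix) and classes (list of tags) come from the preprocessing step
model = Sequential()
model.add(Dense(128, input_shape=(x_train.shape[1],), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
# one output unit per tag; softmax turns the outputs into class probabilities
model.add(Dense(len(classes), activation="softmax"))
# set the optimizer (`lr` and `decay` are the older Keras argument names;
# recent releases use `learning_rate` and no longer support `decay`)
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# compile the model
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])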

It was trained for 500 epochs with a batch size of 16.
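For readers who want to reproduce the setup, the preprocessing described earlier (tokenization, bag of words, one-hot labels) could look roughly like the sketch below. It is not the code from the post: the tiny `intents` dictionary, the regex-based `tokenize` (a stand-in for real tokenization plus lemmatization), and the helper names `bag_of_words` and `one_hot` are all assumptions made for illustration.

```python
import re
import numpy as np

# Hypothetical miniature of the intents data; the real file has more tags.
intents = {"intents": [
    {"tag": "greeting", "patterns": ["Hi there", "Hello", "Good day"]},
    {"tag": "goodbye", "patterns": ["Bye", "See you later"]},
    {"tag": "thanks", "patterns": ["Thanks", "Thank you"]},
]}

def tokenize(text):
    # Stand-in for tokenization + lemmatization: lowercase word extraction only.
    return re.findall(r"[a-z']+", text.lower())

# Sorted tag list and sorted vocabulary built from every pattern.
classes = sorted(intent["tag"] for intent in intents["intents"])
vocab = sorted({w for intent in intents["intents"]
                for p in intent["patterns"] for w in tokenize(p)})

def bag_of_words(text):
    # 1.0 at the index of each vocabulary word present in the text, else 0.0.
    words = set(tokenize(text))
    return np.array([1.0 if w in words else 0.0 for w in vocab])

def one_hot(tag):
    # Vector of zeros with a single 1.0 at the tag's index in `classes`.
    vec = np.zeros(len(classes))
    vec[classes.index(tag)] = 1.0
    return vec

# One row per pattern; the label row is the one-hot vector of its tag.
x_train = np.array([bag_of_words(p)
                    for i in intents["intents"] for p in i["patterns"]])
y_train = np.array([one_hot(i["tag"])
                    for i in intents["intents"] for _ in i["patterns"]])
```

With arrays like these, the training run described above would correspond to something like `model.fit(x_train, y_train, epochs=500, batch_size=16)`.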

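Classification works well: the model correctly classifies unseen questions and assigns them the right tag. If the predicted probability is above 0.75, the model returns the predicted tag; otherwise it should return the tag "noanswer".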
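The threshold rule can be written as a small helper. This is a sketch only: the function name `pick_tag` and its parameters are hypothetical, and `probs` is assumed to be the softmax output for a single input, in the same order as `classes`.

```python
import numpy as np

def pick_tag(probs, classes, threshold=0.75, fallback="noanswer"):
    # Return the most probable tag if its probability exceeds the threshold,
    # otherwise return the fallback tag.
    probs = np.asarray(probs, dtype=float)
    best = int(np.argmax(probs))
    return classes[best] if probs[best] > threshold else fallback
```

For example, `pick_tag([0.5, 0.3, 0.2], ["greeting", "goodbye", "thanks"])` falls back to "noanswer" because no class reaches 0.75.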

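The problem appears when I deliberately ask the chatbot a nonsensical question. I type a random string such as "fejfeajlflnk" to check that the returned tag is "noanswer" in this case (a low prediction probability, below 0.75). Instead, the classifier always predicts the class associated with the tag "greeting" with high probability (from 0.8 to 0.99), and I cannot understand why. Can anyone help me understand why the classifier behaves this way?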
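One thing that can be checked directly, assuming the bag-of-words encoding described above: a string containing no known words maps to an all-zero input vector, so a dense network's output for it depends only on its biases and is identical for every such string. The sketch below illustrates this with a hypothetical four-word vocabulary and random weights standing in for a trained model (dropout is omitted because it is inactive at prediction time). Whether this accounts for the high "greeting" probability in the actual model is not something the post confirms.

```python
import numpy as np

vocab = ["bye", "hello", "hi", "thanks"]  # hypothetical vocabulary

def bag_of_words(text):
    # 1.0 for each vocabulary word present in the text, else 0.0.
    words = set(text.lower().split())
    return np.array([1.0 if w in words else 0.0 for w in vocab])

rng = np.random.default_rng(0)
# Random stand-ins for trained weights: one hidden ReLU layer, softmax output.
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

def predict(x):
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer with ReLU
    logits = h @ W2 + b2               # output layer
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Two different nonsense strings: both encode to the zero vector.
a = predict(bag_of_words("fejfeajlflnk"))
b = predict(bag_of_words("qwertyzxcv"))
print(np.allclose(a, b))  # the two predictions coincide
```

Printing `bag_of_words(...)` for a test string on the real vocabulary is a quick way to see whether the model is receiving an all-zero input.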

