I have implemented a chatbot in Python. It is trained on a dataset called "intents", a JSON file with the following format:
{"intents": [
{"tag": "greeting",
"patterns": ["Hi there", "How are you", "Is anyone there?","Hey","Hola", "Hello", "Good day"],
"responses": ["Hello, thanks for asking", "Good to see you again", "Hi there, how can I help?"]
},
{"tag": "goodbye",
"patterns": ["Bye", "See you later", "Goodbye", "Nice chatting to you, bye", "Till next time"],
"responses": ["See you!", "Have a nice day", "Bye! Come back again soon."]
},
{"tag": "thanks",
"patterns": ["Thanks", "Thank you", "That's helpful", "Awesome, thanks", "Thanks for helping me"],
"responses": ["Happy to help!", "Any time!", "My pleasure"]
},
{"tag": "noanswer",
"patterns": [],
"responses": ["Sorry, can't understand you", "Please give me more info", "Not sure I understand"],
.
.
.
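Here the tag is the category of the user's question (the patterns), together with the possible responses associated with it. Before the training phase the dataset is transformed: each word of the patterns is extracted by tokenization, and then lemmatization is applied. The training set therefore consists of patterns with their associated labels (tags), where the patterns are represented as a bag of words and the labels are one-hot encoded. The model is then defined as follows: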
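from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

# x_train: bag-of-words matrix; classes: list of intent tags
model = Sequential()
model.add(Dense(128, input_shape=(x_train.shape[1],), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(classes), activation="softmax"))
# set the optimizer (newer Keras releases expect learning_rate instead of lr)
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# compile the model
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])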
It was trained for 500 epochs with a batch size of 16.
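The preprocessing described before the model definition can be sketched roughly as follows. This is only an illustration, not the code used in the question: `tokenize`, `bag_of_words` and `one_hot` are hypothetical helpers, the tokenizer is a simple regular expression, and lowercasing stands in for lemmatization, since the question does not say which tokenizer or lemmatizer was used.

```python
import re

# Tiny stand-in for the intents file; only two tags are shown.
intents = [
    {"tag": "greeting", "patterns": ["Hi there", "Hello", "Good day"]},
    {"tag": "goodbye", "patterns": ["Bye", "See you later"]},
]

def tokenize(text):
    """Split text into lowercase word tokens (assumption: lowercasing
    replaces real lemmatization to keep the sketch dependency-free)."""
    return re.findall(r"[a-z']+", text.lower())

# Vocabulary and class list, sorted for a stable column order.
vocab = sorted({w for i in intents for p in i["patterns"] for w in tokenize(p)})
classes = sorted(i["tag"] for i in intents)

def bag_of_words(text):
    """Binary vector: 1 where a vocabulary word occurs in the text."""
    tokens = set(tokenize(text))
    return [1 if w in tokens else 0 for w in vocab]

def one_hot(tag):
    """Binary vector with a single 1 at the index of the tag."""
    return [1 if c == tag else 0 for c in classes]

# One row per pattern, paired with the one-hot label of its tag.
x_train = [bag_of_words(p) for i in intents for p in i["patterns"]]
y_train = [one_hot(i["tag"]) for i in intents for _ in i["patterns"]]
```

Under this encoding, a string made only of words outside the vocabulary is mapped to an all-zero vector.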
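Classification works well: the model assigns the correct tag to questions it has not seen before. If the predicted probability is above 0.75, the model returns the predicted tag; otherwise it should return the tag "noanswer".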
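The thresholding rule just described could look something like the sketch below. `pick_tag` is a hypothetical helper, and `probabilities` is assumed to be one row of the softmax output, given as a plain list in the same order as `classes`.

```python
def pick_tag(probabilities, classes, threshold=0.75, fallback="noanswer"):
    """Return the most probable tag, or the fallback tag when the model
    is not confident enough."""
    # Index of the largest probability.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] > threshold:
        return classes[best]
    return fallback

classes = ["greeting", "goodbye", "thanks"]
print(pick_tag([0.90, 0.05, 0.05], classes))  # confident prediction
print(pick_tag([0.50, 0.30, 0.20], classes))  # below the threshold
```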
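The problem appears when I ask the chatbot a deliberately wrong question. As a test I type a random string such as "fejfeajlflnk" or similar, and I expect the returned tag to be "noanswer" (a low prediction probability, under 0.75). Instead, the classifier always predicts the class associated with the tag "greeting" with a high probability (from 0.8 to 0.99), and I cannot understand why. Can someone help me understand why the classifier behaves this way?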
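One possible contributing factor, assuming the bag-of-words encoding simply ignores words outside the vocabulary, is that every random string becomes the same all-zero input vector, so the network output then depends only on its bias terms. The toy example below illustrates this with a single softmax layer; the vocabulary, weights and biases are made up for illustration and are not taken from the model in the question.

```python
import math

vocab = ["bye", "hello", "thanks"]            # hypothetical vocabulary
classes = ["greeting", "goodbye", "thanks"]   # hypothetical tags

# Made-up parameters of a single softmax layer (one weight row per class).
weights = [[0.2, 3.0, 0.1],   # greeting
           [3.0, 0.1, 0.2],   # goodbye
           [0.1, 0.2, 3.0]]   # thanks
bias = [2.5, 0.0, 0.0]        # assumption: the greeting bias happens to be large

def bag_of_words(text):
    """Binary vector over vocab; unknown words contribute nothing."""
    tokens = set(text.lower().split())
    return [1.0 if w in tokens else 0.0 for w in vocab]

def predict(text):
    """Softmax over (weights . x + bias) for the encoded text."""
    x = bag_of_words(text)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)                      # subtract max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(predict("hello"))
print(predict("fejfeajlflnk"))   # all-zero input: only the biases matter
print(predict("qwzxv"))          # same distribution as the line above
```

In this toy setup any unknown string gets the same distribution, and it can exceed 0.75 for one class. Whether the same thing happens in the real model is an assumption; it could be checked by running the trained model on an all-zero vector.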
If this problem is still unresolved, look for the ERROR_THRESHOLD field and change it to 0.01.