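I have the following example: I want to convert each list of sequences into one-hot encodings.
For example, I have a list containing two sentences. I first convert these sentences into lists of index sequences.
Then, for each sequence list, I want to convert every word index into a one-hot vector.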
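from nltk.tokenize import word_tokenize
from itertools import chain
from keras.preprocessing.sequence import pad_sequences
a = ['hi', 'oh thanks i m fine this is an evening in my timezone']
# Split each sentence into a list of word tokens
a_tokens = [word_tokenize(word) for word in a]
# Map every distinct word to an integer index (set order is arbitrary, so indices can vary between runs)
tokens_dict = {word:i for i, word in enumerate(set(chain.from_iterable(a_tokens)))}
# Replace each word with its index, one list per sentence
tokens_sequence = [[tokens_dict[word_t] for word_t in word] for word in a_tokens]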
Current output:
[[4], [2, 5, 3, 1, 8, 7, 9, 0, 12, 10, 11, 6]]
Expected output:
[[[0,0,0,0,0,0,0,0,0,0,0,0,1]],
[[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-],
[-12 0s but 1 for the respective word-]]]
You can use to_categorical from keras.utils.np_utils to get a one-hot vector for each label.
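The answer's own code did not survive on this page, so the following is only a minimal sketch of the idea, not the original snippet. To stay runnable without Keras or NLTK data, it assumes plain `str.split()` for tokenizing, a `sorted()` vocabulary so the indices are reproducible, and a hypothetical pure-Python helper `one_hot` in place of `to_categorical`.

```python
# Hypothetical sketch: rebuilds the question's index sequences with str.split()
# instead of nltk.word_tokenize, so it runs with no extra downloads.
a = ['hi', 'oh thanks i m fine this is an evening in my timezone']
a_tokens = [sentence.split() for sentence in a]

# sorted() makes the word -> index mapping reproducible (an assumption;
# the question uses an unordered set, so its indices can differ per run).
vocab = sorted({word for tokens in a_tokens for word in tokens})
tokens_dict = {word: i for i, word in enumerate(vocab)}
tokens_sequence = [[tokens_dict[w] for w in tokens] for tokens in a_tokens]


def one_hot(indices, num_classes):
    """Return one row per index: all zeros except a 1 at that index."""
    rows = []
    for idx in indices:
        row = [0] * num_classes
        row[idx] = 1
        rows.append(row)
    return rows


# The row width must be the full vocabulary size, not the largest index
# seen in a single sentence, so every sentence gets rows of equal length.
num_classes = len(tokens_dict)
one_hot_sequences = [one_hot(seq, num_classes) for seq in tokens_sequence]

# One row for 'hi', twelve rows for the second sentence.
print(len(one_hot_sequences[0]), len(one_hot_sequences[1]))
```

With Keras installed, calling `to_categorical(seq, num_classes=num_classes)` on each inner list should give the same rows as a float NumPy array. Passing `num_classes` explicitly matters, because otherwise the width is inferred from the largest index in that one sequence. In recent Keras releases the function is importable as `keras.utils.to_categorical`; the `np_utils` path may not exist there.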
Hope this helps.