Convert a list of sequences to one-hot vectors

Posted 2024-09-30 20:38:38


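I want to convert each list of sequences into one-hot encoded vectors.

For example, I have a list of two sentences. I first convert these sentences into integer sequences (one index per word).

Then, for each sequence, I want to turn every word index into a one-hot vector: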

from itertools import chain
from nltk.tokenize import word_tokenize

a = ['hi', 'oh thanks i m fine this is an evening in my timezone']
a_tokens = [word_tokenize(sentence) for sentence in a]  # one token list per sentence
# map every distinct token to an integer index
tokens_dict = {word: i for i, word in enumerate(set(chain.from_iterable(a_tokens)))}
tokens_sequence = [[tokens_dict[token] for token in sentence] for sentence in a_tokens]

Current output:

[[4], [2, 5, 3, 1, 8, 7, 9, 0, 12, 10, 11, 6]]
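The indices in this output depend on the iteration order of `set(...)`. Because Python randomizes string hashing per process by default (see `PYTHONHASHSEED`), they can differ from run to run. A minimal sketch, assuming a reproducible mapping is wanted, sorts the vocabulary before numbering it. Here `str.split()` stands in for `word_tokenize` (an assumption that holds for these unpunctuated sentences), and `build_vocab` is a hypothetical helper name.

```python
from itertools import chain

a = ['hi', 'oh thanks i m fine this is an evening in my timezone']

# Assumption: str.split() replaces nltk's word_tokenize (no punctuation to handle here)
a_tokens = [sentence.split() for sentence in a]


def build_vocab(tokenized_sentences):
    """Hypothetical helper: map each distinct token to an index in sorted order."""
    vocab = sorted(set(chain.from_iterable(tokenized_sentences)))
    return {word: i for i, word in enumerate(vocab)}


tokens_dict = build_vocab(a_tokens)
tokens_sequence = [[tokens_dict[token] for token in sentence] for sentence in a_tokens]
print(tokens_sequence)  # [[3], [9, 10, 4, 7, 2, 11, 6, 0, 1, 5, 8, 12]]
```

Sorting costs one extra pass over the vocabulary but makes the indices identical on every run.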

Expected output:

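[[[0,0,0,0,1,0,0,0,0,0,0,0,0]],
 [[<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>],
  [<12 zeros and a single 1 at the index of the respective word>]]]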
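For reference, this nested structure can be produced without any library: each word index becomes a row of zeros, as long as the vocabulary (13 tokens here), with a single 1 at that index. A minimal pure-Python sketch; `one_hot_encode` is a hypothetical helper, and the index sequences are hard-coded from the current output above.

```python
def one_hot_encode(sequences, vocab_size):
    """Hypothetical helper: turn each index into a row of zeros with a single 1."""
    encoded = []
    for seq in sequences:
        rows = []
        for index in seq:
            row = [0] * vocab_size
            row[index] = 1
            rows.append(row)
        encoded.append(rows)  # one list of rows per sentence
    return encoded


# Index sequences copied from the current output; 13 distinct tokens in total.
tokens_sequence = [[4], [2, 5, 3, 1, 8, 7, 9, 0, 12, 10, 11, 6]]
one_hot = one_hot_encode(tokens_sequence, vocab_size=13)

print(one_hot[0])       # [[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]]
print(len(one_hot[1]))  # 12, one row per word of the second sentence
```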

1 answer
User
Answer #1 · Posted 2024-09-30 20:38:38

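You can use Keras's `to_categorical` (`keras.utils.to_categorical`; older Keras versions also exposed it as `keras.utils.np_utils.to_categorical`) to get a one-hot vector for each label, as follows: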

from itertools import chain

import numpy as np
from nltk.tokenize import word_tokenize
from keras.utils import to_categorical  # older Keras: keras.utils.np_utils

a = ['hi', 'oh thanks i m fine this is an evening in my timezone']
a_tokens = [word_tokenize(sentence) for sentence in a]
tokens_dict = {word: i for i, word in enumerate(set(chain.from_iterable(a_tokens)))}
tokens_sequence = [[tokens_dict[token] for token in sentence] for sentence in a_tokens]

# Number of classes = vocabulary size (13 here). Passing it explicitly gives every
# sentence vectors of the same width, even if it lacks the highest index.
num_classes = len(tokens_dict)

one_hot_labels = []
for label in tokens_sequence:
    # float matrix of shape (len(label), num_classes); cast to int to match the expected output
    one_hot = to_categorical(label, num_classes=num_classes)
    one_hot_labels.append(one_hot.astype(np.int32))

# The sentences have different lengths, so keep a list of 2-D arrays:
# np.array() on these ragged pieces raises ValueError in NumPy >= 1.24.
for one_hot in one_hot_labels:
    print(one_hot)
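If Keras is not otherwise needed, the same per-sentence matrices can be built with NumPy alone by indexing rows of an identity matrix. This is a swapped-in technique rather than what the answer uses; a minimal sketch, with the index sequences hard-coded from the question's output and `num_classes` assumed to equal the vocabulary size.

```python
import numpy as np

# Index sequences as produced by the question's code (copied from its output).
tokens_sequence = [[4], [2, 5, 3, 1, 8, 7, 9, 0, 12, 10, 11, 6]]
num_classes = 13  # vocabulary size, i.e. len(tokens_dict)

# Row i of the identity matrix is the one-hot vector for index i, so indexing
# with a list of indices returns a (len(seq), num_classes) matrix.
identity = np.eye(num_classes, dtype=np.int32)
one_hot_labels = [identity[seq] for seq in tokens_sequence]

print(one_hot_labels[0])        # [[0 0 0 0 1 0 0 0 0 0 0 0 0]]
print(one_hot_labels[1].shape)  # (12, 13)
```

As with `to_categorical`, the result stays a list because the two sentences yield matrices with different numbers of rows.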

Hope this helps.
