Trying to slice an array raises "too many indices for array". Can I pad the array to fix this?

"""
Pretreat Data Section
"""
# integer encode sequences of words
# create the tokenizer
t = Tokenizer()
# fit the tokenizer on the headlines
t.fit_on_texts(headlines)
sequences = t.texts_to_sequences(headlines)
# vocabulary size
vocab_size = len(t.word_index) + 1
# separate into input and output
sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1] # fix this

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-87-eb7aab0c3a22> in <module>
     18 #separate into input and output
     19 sequences = np.array(sequences)
---> 20 X, y = sequences[:,:-1], sequences[:,-1] # fix this
     21 y = to_categorical(y, num_classes=vocab_size)
     22 seq_length = X.shape[1]

IndexError: too many indices for array

1 answer

User

#1 · Posted 2024-09-26 17:46:34

The problem is that the tutorial has several sections on one page, and each section has its own "Complete Example".

The first "Complete Example" reads text from republic_clean.txt, cleans it, and saves it to republic_sequences.txt. It creates sequences that all have the same number of words.

The second "Complete Example" reads the text from republic_sequences.txt and uses it with:

sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]

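Because the first part creates sequences with the same number of words, np.array(sequences) is a 2-D array and this code works correctly.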
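To see why equal lengths matter, here is a minimal sketch with made-up token ids (they are not data from the tutorial). It assumes NumPy is installed. `dtype=object` is passed explicitly because recent NumPy versions refuse to build an array from ragged lists without it.

```python
import numpy as np

# Hypothetical token ids, for illustration only.
equal = [[1, 2, 3], [4, 5, 6]]        # every sequence has 3 words
ragged = [[1], [2, 3], [4, 5, 6]]     # sequences of different lengths

equal_arr = np.array(equal)
# Ragged lists cannot form a 2-D array; they become a 1-D array of lists.
ragged_arr = np.array(ragged, dtype=object)

print(equal_arr.shape)   # two dimensions
print(ragged_arr.shape)  # one dimension

# Two indices are fine on the 2-D array.
X, y = equal_arr[:, :-1], equal_arr[:, -1]

# The same slice on the 1-D array raises IndexError.
try:
    ragged_arr[:, :-1]
    slicing_failed = False
except IndexError as err:
    slicing_failed = True
    print("IndexError:", err)
```

Headlines tokenized with texts_to_sequences usually differ in length, which would put the question's data in the second case.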

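It looks like you skipped the first part. You have to go back to it to learn how to clean the text and how to create the correct file, which you can then use in the second part.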

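EDIT: If you can't generate sequences with the same number of words, you can pad the shorter sequences with empty strings. The code will work, but I don't know whether it will produce a better model.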

sequences = [['a'], ['b','c'], ['d','e','f']]

max_len = max(map(len, sequences))

sequences = [x + [""]*(max_len-len(x)) for x in sequences]

print(sequences)

Result

[['a', '', ''], ['b', 'c', ''], ['d', 'e', 'f']]
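For the integer sequences in the question, the same idea might look like the sketch below. The token ids are made up, and two choices here are assumptions rather than part of the answer: padding goes on the left so that the last word of every sequence stays in the final column, and the pad value is 0, which the Keras Tokenizer does not assign to any word (that is why the question adds 1 to vocab_size). Keras also ships a pad_sequences helper that pads on the left with 0 by default and could be used instead.

```python
import numpy as np

# Hypothetical integer-encoded headlines of different lengths.
sequences = [[5], [7, 2], [3, 9, 4]]

max_len = max(map(len, sequences))

# Left-pad with 0 so every row has max_len entries.
padded = [[0] * (max_len - len(s)) + s for s in sequences]

arr = np.array(padded)  # now a regular 2-D integer array

# Last column is the target word, the rest is the input.
X, y = arr[:, :-1], arr[:, -1]

print(X)
print(y)
```

As with the empty-string version, this only makes the slicing work; whether a model trained on padded headlines is any good is a separate question.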
