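<p>The Keras backend function <code>K.one_hot</code> expects integer indices as its first argument (in your case, the index of each word). So before calling <code>one_hot</code>, you first need to map each word to a unique integer.</p>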
<pre class="lang-py prettyprint-override"><code>import numpy as np
from keras import backend as K

docs = ['Well done!',
        'Good work',
        'Great effort',
        'nice work',
        'Excellent!',
        'Weak',
        'Poor effort!',
        'not good',
        'poor work',
        'Could have done better.']
# define class labels
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# build the vocabulary: every distinct whitespace-separated token,
# sorted so that each word gets the same index on every run
all_words = sorted({word for s in docs for word in s.split()})
vocab_size = len(all_words)

# integer encode each word (its position in all_words), then one-hot encode that index
encoded_docs = [[K.one_hot(all_words.index(word), vocab_size) for word in d.split()] for d in docs]
print(encoded_docs)
</code></pre>
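<p>To see what this encoding produces without needing a Keras backend, here is a minimal pure-Python sketch of the same two steps (word to integer index, then index to one-hot vector). The helper <code>one_hot_vector</code> and the <code>word_index</code> dictionary are illustrative assumptions, not Keras APIs; the dictionary also replaces the linear-time <code>list.index</code> lookup with a constant-time one.</p>

```python
# Minimal pure-Python sketch: word -> integer index -> one-hot vector.
# one_hot_vector and word_index are hypothetical helpers, not part of Keras.
docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!',
        'Weak', 'Poor effort!', 'not good', 'poor work', 'Could have done better.']

# Step 1: map every distinct whitespace-separated token to a unique integer
# (sorted so the mapping is reproducible across runs)
vocab = sorted({word for d in docs for word in d.split()})
word_index = {word: i for i, word in enumerate(vocab)}


def one_hot_vector(index, size):
    """Return a list of length `size` with a 1 at `index` and 0 elsewhere."""
    vec = [0] * size
    vec[index] = 1
    return vec


# Step 2: one-hot encode the integer index of every word in every document
encoded_docs = [[one_hot_vector(word_index[w], len(vocab)) for w in d.split()]
                for d in docs]

print(len(vocab))                                  # 18
print([word_index[w] for w in 'Good work'.split()])  # [2, 17]
```

<p>Each document becomes a list of vectors of length <code>len(vocab)</code>, one per word, each with a single 1 at that word's index.</p>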
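<p>If you want punctuation marks to be encoded as separate words, you can split the text with the <code>re</code> module instead of <code>str.split</code>. The vocabulary then has to be built with the same tokenizer; otherwise <code>all_words.index</code> raises a <code>ValueError</code> for tokens such as <code>!</code> that never appeared on their own.</p>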
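<pre class="lang-py prettyprint-override"><code>import re
import string
token_re = re.compile(r"\w+|[" + re.escape(string.punctuation) + "]")  # a word, or a single punctuation mark
all_words = sorted({tok for d in docs for tok in token_re.findall(d)})  # rebuild the vocabulary with the same tokenizer
vocab_size = len(all_words)
encoded_docs = [[K.one_hot(all_words.index(tok), vocab_size) for tok in token_re.findall(d)] for d in docs]
</code></pre>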
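<p>A quick way to check the tokenizer before encoding anything: the sketch below (pure Python, no Keras needed) uses a variant of the regular expression above, with <code>re.escape</code> applied to the punctuation set, and shows that punctuation becomes its own token. The variable names are illustrative.</p>

```python
import re
import string

docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!',
        'Weak', 'Poor effort!', 'not good', 'poor work', 'Could have done better.']

# A run of word characters, or a single punctuation mark.
# re.escape makes characters such as ] \ ^ - safe inside the [...] class.
token_re = re.compile(r"\w+|[" + re.escape(string.punctuation) + "]")

print(token_re.findall('Well done!'))               # ['Well', 'done', '!']
print(token_re.findall('Could have done better.'))  # ['Could', 'have', 'done', 'better', '.']

# Vocabulary built with the same tokenizer: 'done!' is split into 'done' and '!'
vocab = sorted({tok for d in docs for tok in token_re.findall(d)})
print('done' in vocab, 'done!' in vocab)  # True False
```

<p>Because the vocabulary and the encoding loop share one compiled pattern, every token looked up with <code>index</code> is guaranteed to be present.</p>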