Thompson Sampling: Adding Positive and Negative Rewards for AI in Python

<p>In Chapter 5 of AI Crash Course, the author writes:</p>
<pre><code>nSelected = nPosReward + nNegReward
for i in range(d):
    print('Machine number ' + str(i + 1) + ' was selected ' + str(nSelected[i]) + ' times')
print('Conclusion: Best machine is machine number ' + str(np.argmax(nSelected) + 1))
</code></pre>
<p>Why is the number of negative rewards added to the number of positive rewards? To find the best machine, shouldn't we look only at the machine with the highest rate of return? I don't understand why the negative rewards are added to the positive ones.</p>
<p>I also understand that this is a simulation in which successes are assigned at random from success rates that are set in advance. In real life, though, how would you know each slot machine's success rate ahead of time? How would you know which machines should be assigned a "1"? Thank you very much! Here is the full code:</p>
<pre><code># Importing the libraries
import numpy as np

# Setting conversion rates and the number of samples
conversionRates = [0.15, 0.04, 0.13, 0.11, 0.05]
N = 10000
d = len(conversionRates)

# Creating the dataset
X = np.zeros((N, d))
for i in range(N):
    for j in range(d):
        if np.random.rand() < conversionRates[j]:
            X[i][j] = 1

# Making arrays to count our losses and wins
nPosReward = np.zeros(d)
nNegReward = np.zeros(d)

# Taking our best slot machine through beta distribution and updating its losses and wins
for i in range(N):
    selected = 0
    maxRandom = 0
    for j in range(d):
        randomBeta = np.random.beta(nPosReward[j] + 1, nNegReward[j] + 1)
        if randomBeta > maxRandom:
            maxRandom = randomBeta
            selected = j
    if X[i][selected] == 1:
        nPosReward[selected] += 1
    else:
        nNegReward[selected] += 1

# Showing which slot machine is considered the best
nSelected = nPosReward + nNegReward
for i in range(d):
    print('Machine number ' + str(i + 1) + ' was selected ' + str(nSelected[i]) + ' times')
print('Conclusion: Best machine is machine number ' + str(np.argmax(nSelected) + 1))
</code></pre>
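One way to read the snippet in the question: every pull of a machine increments exactly one of the two counters, so `nPosReward + nNegReward` is simply the number of times each machine was played. Thompson sampling tends to play the machine it currently believes is best more and more often as evidence accumulates, which is presumably why the most-played machine is reported as the winner. The sketch below is not the book's code. The function name `thompson_sampling`, the `seed` parameter, and the use of `np.random.default_rng` are assumptions made for illustration. It draws a reward only for the machine that was actually pulled, so the selection step never reads the conversion rates; they stand in for an environment whose rates would be unknown in practice.

```python
import numpy as np

def thompson_sampling(conversion_rates, n_rounds=10000, seed=0):
    """Bernoulli Thompson sampling; a reward is drawn only for the pulled machine."""
    rng = np.random.default_rng(seed)
    d = len(conversion_rates)
    n_pos = np.zeros(d)  # wins observed per machine
    n_neg = np.zeros(d)  # losses observed per machine
    for _ in range(n_rounds):
        # One Beta(wins + 1, losses + 1) draw per machine; play the largest draw
        samples = rng.beta(n_pos + 1, n_neg + 1)
        selected = int(np.argmax(samples))
        # Simulated pull: outside a simulation this would be the observed outcome
        if rng.random() < conversion_rates[selected]:
            n_pos[selected] += 1
        else:
            n_neg[selected] += 1
    return n_pos, n_neg

n_pos, n_neg = thompson_sampling([0.15, 0.04, 0.13, 0.11, 0.05])
n_selected = n_pos + n_neg  # total pulls per machine, not a reward score

# Empirical success rate per machine; machines never pulled get NaN
rate = np.divide(n_pos, n_selected,
                 out=np.full_like(n_pos, np.nan), where=n_selected > 0)

for i in range(len(n_selected)):
    print(f"Machine {i + 1}: pulls={int(n_selected[i])}, "
          f"wins={int(n_pos[i])}, rate={rate[i]:.3f}")
print("Most selected machine:", int(np.argmax(n_selected)) + 1)
```

Comparing the `pulls` and `rate` columns shows why the pull count is used: rarely played machines have noisy rate estimates from few samples, while the pull count reflects where the algorithm chose to concentrate its plays.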
