Disconnected inputs in the gradient of a Theano scan op

Published 2024-10-03 17:18:20


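I have many groups of items, and the groups vary in size. For each group, one (known) item is the "correct" one. A function assigns a score to each item. This yields a flat vector of item scores, along with vectors giving the index where each group starts and how large it is. I want to apply a "softmax" to the scores within each group to assign probabilities to the items, and then take the sum of the log-probabilities of the correct answers. Here is a simpler version in which we just return the score of the correct answer, without the softmax and the log.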

import numpy
import theano
import theano.tensor as T
from theano.printing import Print

def scoreForCorrectAnswer(groupSize, offset, correctAnswer, preds):
    # for each group, this will get called with the size of
    # the group, the offset of where the group begins in the
    # predictions vector, and which item in that group is correct
    relevantPredictions = preds[offset:offset+groupSize]
    ans = Print("CorrectAnswer")(correctAnswer)
    return relevantPredictions[ans]

groupSizes = T.ivector('groupSizes')
offsets = T.ivector('offsets')
x = T.fvector('x')
W = T.vector('W')
correctAnswers = T.ivector('correctAnswers')

# for this simple example, we'll just score the items by
# element-wise product with a weight vector
predictions = x * W

(values, updates) = theano.map(fn=scoreForCorrectAnswer,
   sequences = [groupSizes, offsets, correctAnswers],
   non_sequences = [predictions] )

func = theano.function([groupSizes, offsets, correctAnswers,
        W, x], [values])

sampleInput = numpy.array([0.1,0.7,0.3,0.05,0.3,0.3,0.3], dtype='float32')
sampleW = numpy.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], dtype='float32')
sampleOffsets = numpy.array([0,4], dtype='int32')
sampleGroupSizes = numpy.array([4,3], dtype='int32')
sampleCorrectAnswers = numpy.array([1,2], dtype='int32')

data = func (sampleGroupSizes, sampleOffsets, sampleCorrectAnswers, sampleW, sampleInput)
print(data)

# all three of these raise the same exception (see below)
gW1 = T.grad(cost=T.sum(values), wrt=W)
gW2 = T.grad(cost=T.sum(values), wrt=W, disconnected_inputs='warn')
gW3 = T.grad(cost=T.sum(values), wrt=W, consider_constant=[groupSizes,offsets])

This computes the output correctly, but when I try to take the gradient with respect to the parameter W, I get (paths abbreviated):

[traceback not preserved]

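Now, groupSizes is a constant, so there is no need to take any gradient with respect to it. Normally you can deal with this either by suppressing the DisconnectedInputError or by telling Theano to treat groupSizes as a constant in the T.grad call (see the last few lines of the sample script). But there does not seem to be any way to pass these options down to the internal T.grad calls made in the ScanOp's gradient computation.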

Am I missing something? Is there a way to get the gradient computation to work through the ScanOp?
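For reference, the full objective described at the top of the question (a softmax within each group, then the sum of the log-probabilities of the correct items) can be sketched in plain Python, which makes it easy to check the values a Theano graph should produce. This is only an illustrative sketch: the function name `grouped_log_likelihood` is hypothetical, and the max-subtraction step is a standard numerical-stability trick rather than something taken from the question.

```python
import math

def grouped_log_likelihood(scores, offsets, group_sizes, correct_answers):
    """Sum over groups of log(softmax(group scores)[correct item])."""
    total = 0.0
    for offset, size, correct in zip(offsets, group_sizes, correct_answers):
        group = scores[offset:offset + size]
        # Subtract the largest score before exponentiating (log-sum-exp trick).
        peak = max(group)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in group))
        # Log-probability of the correct item within its own group.
        total += group[correct] - log_norm
    return total

# Sample data from the question (x * W with W set to all ones).
scores = [0.1, 0.7, 0.3, 0.05, 0.3, 0.3, 0.3]
offsets = [0, 4]
group_sizes = [4, 3]
correct_answers = [1, 2]

log_likelihood = grouped_log_likelihood(scores, offsets, group_sizes, correct_answers)
print(log_likelihood)
```

The second group has three equal scores, so its contribution is -log(3), which is a quick sanity check on the result.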


1 Answer

User

#1 · Posted 2024-10-03 17:18:20

As of mid-February 2013 (0.6.0rc-2), this was a bug in Theano. As of the date of this post, it has been fixed in the development version on GitHub.
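Separately from the bug fix, note that the simplified example picks exactly one element per group, at flat position offset + correctAnswer. The expected output of the script can therefore be checked without Theano, as in this minimal plain-Python sketch (the function name is hypothetical). In Theano the analogous expression would presumably be an integer-vector index such as predictions[offsets + correctAnswers]; whether that sidesteps the error on the affected version is an assumption, not something the thread confirms.

```python
def scores_for_correct_answers(preds, offsets, correct_answers):
    # Each group's correct item sits at (group offset + index within group)
    # in the flat predictions vector, so no per-group slicing is needed.
    return [preds[offset + correct]
            for offset, correct in zip(offsets, correct_answers)]

# Sample data from the question (x * W with W set to all ones).
preds = [0.1, 0.7, 0.3, 0.05, 0.3, 0.3, 0.3]
picked = scores_for_correct_answers(preds, [0, 4], [1, 2])
print(picked)  # [0.7, 0.3]
```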
