Simulating a systolic array in Python

from collections import deque
from threading import Thread

import numpy as np

vals_deque = deque(maxlen=9 * 2)  # will hold the interconnections between nodes of the systolic array
dump = deque(maxlen=9)  # will be the output of the systolic array
prev_size = 0


def setupSystolicArray():
    global SystolicArray
    # 3x3 grid of nodes, indexed as SystolicArray[row][col]
    SystolicArray = [[NodeSystolic(i, j) for j in range(3)] for i in range(3)]


def spreadInputs(a, b):
    # needs some way to initially propagate the elements of a and b through
    # the top and leftmost parts of the systolic array

    # start all the nodes of the systolic array; they are waiting for an input
    for node_row in SystolicArray:
        for node in node_row:
            node.start()

    # even if I found a way to put these inputs into the array at the start,
    # I am not sure how to coordinate future inputs into the array in the
    # cascading fashion described in the slides
    while len(dump) != 9:
        if len(vals_deque) != prev_size:
            vals = vals_deque[-1]
            row = vals['t'][0]
            col = vals['l'][0]
            a = vals['t'][1]
            b = vals['l'][1]
            # these if/elif statements track whether the outputs are at the
            # edge of the systolic array and can be removed
            if row >= 3:
                dump.append(a)
            elif col >= 3:
                dump.append(b)
            else:
                # something is wrong with the logic here
                # (row and col are assumed to be the intended indices)
                SystolicArray[row][col - 1].update(a, b)
                SystolicArray[row - 1][col].update(a, b)


class NodeSystolic:
    def __init__(self, row, col):
        self.row = row
        self.col = col
        self.currval = 0
        self.up = False
        self.ain = 0  # coming in from the top
        self.bin = 0  # coming in from the side

    def start(self):
        Thread(target=self.continuous, args=()).start()

    def continuous(self):
        # busy-wait until update() flags that new inputs have arrived
        while True:
            if self.up:
                self.currval = self.ain * self.bin
                self.up = False
                self.passon(self.ain, self.bin)

    def update(self, left, top):
        self.up = True
        self.ain = top
        self.bin = left

    def passon(self, t, l):
        # this will pass on the inputs this node has received to the next nodes
        vals_deque.append({'t': [self.row + 1, self.ain],
                           'l': [self.col + 1, self.bin]})

    def returnValue(self):
        return self.currval


def main():
    a = np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ])
    b = np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ])
    setupSystolicArray()
    spreadInputs(a, b)

1 answer

Anonymous user

#1 · Posted 2024-10-01 17:26:34

Simulating a systolic array in Python is an interesting idea, but I think doing it along the lines sketched above runs into some significant difficulties.

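Most importantly, the Global Interpreter Lock (GIL) in CPython limits how much true parallelism threads can provide. For compute-bound tasks you will not see any significant parallel speedup, and threads are probably best suited to I/O-bound work such as web requests or filesystem access. The closest Python comes to true parallelism is probably the multiprocessing module, but that would require a separate process for each node.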

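Second, even if you did get the numerical operations in the systolic array running in parallel, you would need some locking mechanism so that different threads can exchange data (or messages) without corrupting each other's memory when they read and write at the same time.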
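As a rough illustration of the kind of thread-safe exchange meant here, the sketch below links two worker threads with `queue.Queue` objects from the standard library, which do their own locking internally. The names `mac_node` and `run_chain`, the `None` shutdown sentinel, and the one-dimensional chain layout are all assumptions made for this example; they are not taken from the question's code.

```python
import queue
import threading


def mac_node(inbox, outbox, weight):
    """Hypothetical node: scale each incoming value by `weight` and forward it."""
    while True:
        value = inbox.get()        # blocks until a message arrives
        if value is None:          # sentinel: pass the shutdown signal downstream
            outbox.put(None)
            return
        outbox.put(value * weight)


def run_chain(values, weights):
    """Push `values` through a chain of nodes, one thread per weight."""
    # One queue per link; the Queue handles all locking between threads.
    links = [queue.Queue() for _ in range(len(weights) + 1)]
    threads = [
        threading.Thread(target=mac_node, args=(links[i], links[i + 1], w))
        for i, w in enumerate(weights)
    ]
    for t in threads:
        t.start()
    for v in values:
        links[0].put(v)
    links[0].put(None)             # tell the first node that input has ended

    results = []
    while True:
        out = links[-1].get()
        if out is None:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


print(run_chain([1, 2, 3], [2, 5]))  # [10, 20, 30]
```

Because each link is a FIFO queue with a single producer and a single consumer, values come out in the order they went in, and no node ever touches another node's attributes directly.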

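As for the data structures in your example, I think it would be better for each node in the systolic array to hold references to its upstream nodes rather than to know that it sits at a particular position in an NxM grid. I don't think a systolic array has to be a rectangular grid; any directed acyclic graph (DAG) could still support efficient distributed computation.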
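A minimal sketch of that idea, assuming a hypothetical `Node` class whose only structural knowledge is a list of upstream nodes. The evaluation order comes from `graphlib.TopologicalSorter` (standard library, Python 3.9+); the `op` and `value` parameters and the `evaluate` helper are illustrative names, not part of the original code.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Node:
    """Hypothetical node that knows only its upstream neighbours."""

    def __init__(self, name, upstream=(), op=sum, value=None):
        self.name = name
        self.upstream = list(upstream)
        self.op = op          # combines the list of upstream values
        self.value = value    # preset for source nodes, computed otherwise

    def fire(self):
        if self.upstream:     # source nodes keep their preset value
            self.value = self.op([n.value for n in self.upstream])


def evaluate(nodes):
    """Fire every node after all of its upstream nodes have fired."""
    graph = {node: node.upstream for node in nodes}  # node -> predecessors
    for node in TopologicalSorter(graph).static_order():
        node.fire()


a = Node("a", value=2)
b = Node("b", value=3)
c = Node("c", value=4)
prod = Node("prod", upstream=[a, b], op=math.prod)
total = Node("total", upstream=[prod, c])   # default op is sum
evaluate([a, b, c, prod, total])
print(prod.value, total.value)  # 6 10
```

Nothing here depends on grid coordinates, so the same classes would describe a rectangular array, a chain, or any other acyclic wiring.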

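Overall, I would expect the computational overhead of this kind of simulation in Python to be huge relative to what a compiled language such as Scala or C++ can achieve. Even then, unless each node in the systolic array does a lot of computation (that is, far more than a few multiply-adds), the overhead of exchanging messages between nodes will be substantial. So I assume your simulation is mainly meant to understand the data flow and the high-level behaviour of the array, rather than to approach what custom DSP (digital signal processing) hardware can deliver. If so, I would be tempted to drop threads altogether and use a centralized message queue: every node submits its messages to that queue, and a global dispatch mechanism delivers them.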
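To make the centralized-queue suggestion concrete, here is a single-threaded sketch that multiplies two matrices this way. It is a dataflow-style variant rather than a clocked simulation: each node buffers its inputs and fires whenever it holds one value on each port, so no input skewing is needed. The function name `systolic_matmul`, the message tuple layout, and the choice to feed `a` from the left and `b` from the top are assumptions for this example.

```python
from collections import deque


def systolic_matmul(a, b):
    """Compute a @ b on an output-stationary grid driven by one message queue."""
    n, m, p = len(a), len(b), len(b[0])
    acc = [[0] * p for _ in range(n)]                        # node (i, j) accumulates c[i][j]
    a_in = [[deque() for _ in range(p)] for _ in range(n)]   # values arriving from the left
    b_in = [[deque() for _ in range(p)] for _ in range(n)]   # values arriving from the top

    # Central queue of (row, col, port, value) messages.
    messages = deque()
    for k in range(m):
        for i in range(n):
            messages.append((i, 0, "a", a[i][k]))   # row i of a enters at the left edge
        for j in range(p):
            messages.append((0, j, "b", b[k][j]))   # column j of b enters at the top edge

    # Global dispatcher: deliver one message at a time; a node fires when it
    # holds one value on each port.
    while messages:
        row, col, port, value = messages.popleft()
        (a_in if port == "a" else b_in)[row][col].append(value)
        if a_in[row][col] and b_in[row][col]:
            x = a_in[row][col].popleft()
            y = b_in[row][col].popleft()
            acc[row][col] += x * y
            if col + 1 < p:
                messages.append((row, col + 1, "a", x))   # pass a to the right
            if row + 1 < n:
                messages.append((row + 1, col, "b", y))   # pass b downward
    return acc


m3 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(systolic_matmul(m3, m3))
# [[30, 36, 42], [66, 81, 96], [102, 126, 150]]
```

Each port receives its values in increasing `k` order, so the k-th `a` value at node (i, j) always pairs with the k-th `b` value, which is what makes the accumulated sum equal to the matrix product. Since there are no threads, there is also nothing to lock.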
