I have implemented a neural network in C++, and I am trying to train it on the GPU using Python. The problem I am facing is that the inputs are very large (and sparse): there are about 50,000 input neurons, of which typically only about 30 are active.
My model looks like this:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 24576) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, 24576) 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, 256) 6291712 input_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 256) 6291712 input_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 256) 0 dense_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 256) 0 dense_2[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 512) 0 leaky_re_lu_1[0][0]
leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 32) 16416 concatenate_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 32) 0 dense_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 32) 1056 leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU) (None, 32) 0 dense_4[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 1) 33 leaky_re_lu_4[0][0]
==================================================================================================
Total params: 12,600,929
Trainable params: 12,600,929
Non-trainable params: 0
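For reference, the summary above corresponds to roughly the following Keras definition (a sketch reconstructed from the summary; `sparse=True` is an assumption that lets Keras accept `tf.SparseTensor` inputs directly):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Two sparse inputs of 6 * 64 * 64 = 24576 neurons each
inp1 = tf.keras.Input(shape=(24576,), sparse=True)
inp2 = tf.keras.Input(shape=(24576,), sparse=True)

# Dense + LeakyReLU branches for each input, as in the summary
d1 = layers.LeakyReLU()(layers.Dense(256)(inp1))
d2 = layers.LeakyReLU()(layers.Dense(256)(inp2))

# Concatenate the branches and reduce down to a single output
x = layers.Concatenate()([d1, d2])
x = layers.LeakyReLU()(layers.Dense(32)(x))
x = layers.LeakyReLU()(layers.Dense(32)(x))
out = layers.Dense(1)(x)

model = tf.keras.Model([inp1, inp2], out)
```

The parameter counts of this sketch match the summary (12,600,929 total).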
I also have around 300 million input/output pairs that I am trying to feed into my network. Needless to say, that is far too much data to fit on my GPU at once.
To speed things up, I generated sparse matrices, each representing about 100,000 inputs, and saved them to disk (about 50 GB in total). I can load them without losing much speed, like this:
# loads both the inputs and the outputs for the given chunk (100,000 inputs/outputs) from disk
trainX1, trainX2, trainY = readNumpyChunkAndCreateInput(chunk)
I use this to train my network like this:
for chunk in chunks:
    trainX1, trainX2, trainY = readNumpyChunkAndCreateInput(chunk)
    _res = model.fit([trainX1, trainX2], trainY, epochs=1, steps_per_epoch=1, verbose=0)
    loss = list(_res.history.values())[0]
    totalLoss += loss[0]
Obviously, this is not optimal in any way. I know there is something called data generators in Keras/TensorFlow, but sadly I don't know how to use them in my specific case, because all the tutorials deal with dense inputs.
I would be very happy if someone could help me out!
Greetings, Finn
This is how the data is loaded:
import os
import sys
import numpy as np
import tensorflow as tf

def readNumpyChunkAndCreateInput(name):
    filePath = os.path.abspath(os.path.dirname(sys.argv[0]))
    path = os.path.join(filePath, "data", name)
    # indices of the active (non-zero) input neurons for both inputs
    indices1 = np.load(os.path.join(path, 'indices1.npy'))
    indices2 = np.load(os.path.join(path, 'indices2.npy'))
    outputs = np.load(os.path.join(path, 'outputs.npy'))
    # meta.txt stores the number of active entries and the sample count
    with open(os.path.join(path, 'meta.txt'), "r") as meta:
        metaInf = meta.readlines()[0].split(" ")
    entry1Count = int(metaInf[0])
    entry2Count = int(metaInf[1])
    lineCount = int(metaInf[2])
    # every active neuron simply has the value 1
    values1 = tf.ones(entry1Count)
    values2 = tf.ones(entry2Count)
    shape = (lineCount, 6 * 64 * 64)
    trainX1 = tf.SparseTensor(indices=indices1, values=values1, dense_shape=shape)
    trainX2 = tf.SparseTensor(indices=indices2, values=values2, dense_shape=shape)
    return trainX1, trainX2, outputs
I have written a small generator function that you can adapt to your use case.
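A minimal sketch of such a generator (`sparse_chunk_generator` and `load_chunk` are illustrative names; `load_chunk` is assumed to behave like `readNumpyChunkAndCreateInput` from the question):

```python
def sparse_chunk_generator(chunks, load_chunk):
    """Yield training data one chunk at a time.

    `chunks` identifies the pre-saved chunks on disk; `load_chunk` is a
    callable like readNumpyChunkAndCreateInput that returns
    (trainX1, trainX2, trainY) for one chunk.
    """
    for chunk in chunks:
        trainX1, trainX2, trainY = load_chunk(chunk)
        # group the two sparse inputs into a tuple so Keras can map them
        # onto the model's two input layers
        yield (trainX1, trainX2), trainY
```

Because it is a generator, only one chunk lives in memory at a time instead of the full 50 GB.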
Code to use the generator with a tf.data.Dataset:
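A sketch of such a pipeline (`make_dataset` and `N_INPUTS` are illustrative names; the loader is assumed to return two `tf.SparseTensor` inputs and a `(n, 1)` float32 target array, as in the question):

```python
import tensorflow as tf

N_INPUTS = 6 * 64 * 64  # 24576, matching the model's two input layers

def make_dataset(chunks, load_chunk):
    """Wrap the chunk loader in a tf.data pipeline with prefetching."""
    def gen():
        for chunk in chunks:
            trainX1, trainX2, trainY = load_chunk(chunk)
            yield (trainX1, trainX2), trainY

    return tf.data.Dataset.from_generator(
        gen,
        # the signature tells tf.data the inputs are sparse
        output_signature=(
            (
                tf.SparseTensorSpec(shape=(None, N_INPUTS), dtype=tf.float32),
                tf.SparseTensorSpec(shape=(None, N_INPUTS), dtype=tf.float32),
            ),
            tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
        ),
    ).prefetch(1)  # load the next chunk while the current one trains
```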
Prefetching stores the next batch in advance, eliminating the loading latency. You can pass this dataset to the fit command, or use it with a custom training loop like this:
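A minimal custom-loop sketch (the `train` helper, loss, and optimizer are illustrative choices, not prescribed by the question):

```python
import tensorflow as tf

def train(model, dataset, epochs=1):
    """Train `model` over `dataset`, one (inputs, target) batch at a time."""
    loss_fn = tf.keras.losses.MeanSquaredError()  # illustrative loss
    optimizer = tf.keras.optimizers.Adam()
    total_loss = 0.0
    for _ in range(epochs):
        total_loss = 0.0
        for (x1, x2), y in dataset:
            with tf.GradientTape() as tape:
                pred = model((x1, x2), training=True)
                loss = loss_fn(y, pred)
            # backpropagate and update all trainable weights
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            total_loss += float(loss)
    return total_loss
```

This gives you the same running `totalLoss` as the loop in the question, while tf.data handles loading and prefetching the chunks.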