Using two tables as input to a multi-output regression, with a missing matrix product, in Keras

Published 2024-10-01 09:38:55


In the Keras MWE below, I am trying to train a multi-output regression model where 1000 samples with 20 features (X) are the input, and an output of size 50 (Y) is produced. However, I am missing a step that I can't wrap my head around, and I also lack the right term to describe it. Anyway, let me try; please excuse the mess:

Here, each of the 50 outputs has a set of 10 "feature filters" that interact with the 20 features (e.g. through a dot product) to produce the numeric output. The layer I am missing is one that trains a single weight matrix of size (20, 10), whose sum (or mean) then generates the numeric output Y. The idea is that the outputs react to the features in the way these feature filters specify, and that these interactions are consistent across outputs (e.g. a high value in one feature filter might cause a stronger reaction to one feature and a weaker reaction to another, and these positive/negative relationships are not output-specific but are determined for the whole dataset through a common weight matrix of size 10x20).
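If I understand this correctly, the intended computation can be written down directly in numpy. This is a sketch of my reading of it; the shared weight matrix `W` is hypothetical (it is the quantity to be learned, not part of the original code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))   ## samples x features
F = rng.normal(size=(10, 50))     ## feature filters x outputs
W = rng.normal(size=(20, 10))     ## one shared weight per feature/filter pair (hypothetical)

## Y[s, o] = mean over (f, k) of X[s, f] * W[f, k] * F[k, o]
Y = np.einsum('sf,fk,ko->so', X, W, F) / W.size
print(Y.shape)  ## (1000, 50)
```

The same `W` is applied to every one of the 50 outputs; only the filter column `F[:, o]` changes per output.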

How does the side input matrix (10, 50) of output-specific "feature filters" enter the network? My attempt below consists of: (1) a per-sample tensordot product with the side matrix (giving a 3D output per sample), (2) subsequent flattening to 1D to interact with a small dense layer. The dense layer is then (3) tiled/repeated so that it stays small and learns weights that apply to all outputs. The tiled dense output is then (4) reduced in dimensionality through averaging, to fit the output format of (n, 50).

The problem with this approach is that the dense layer is fully connected, when all that is needed here is a locally connected weight matrix (10*20) that is tiled 50 times. That is, each interaction between a feature and a feature filter gets one weight/bias that applies to every output. With one weight per interaction, we could also visualize which interactions are key to matching the outputs (which is impossible if fully connected).
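To make the difference concrete, a quick comparison of the parameter counts involved, using the dimensions from this post:

```python
## parameter counts for the dimensions used in this post
n_feat, n_filt, n_out = 20, 10, 50

## fully connected Dense(200) on the flattened (n, 10000) tensor
flat_in = n_feat * n_filt * n_out                    ## 10000 inputs after Flatten
dense_units = n_feat * n_filt                        ## 200 units
dense_params = flat_in * dense_units + dense_units   ## weights + biases

## desired: one shared weight per feature/filter interaction
shared_params = n_feat * n_filt

print(dense_params, shared_params)  ## 2000200 200
```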

I suspect I need to replace the dense layer with some locally connected, convolutional, separable, or other kind of layer that I don't fully understand. Any ideas?

import numpy as np
import tensorflow as tf
from tensorflow import keras

## create dummy input/output matrices
XData = np.ones((1000, 20)) ## 1000 samples, 20 features
YData = np.ones((1000, 50)) ## 1000 samples, 50 outputs
filterData = np.ones((10, 50)) ## 10 feature filters, 50 outputs
filterData = tf.cast(filterData, tf.float32) ## needed for tf.math.reduce_mean() below

## input of size (n, 20)
input = keras.Input(XData.shape[1])
## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)
# flatten for dense layers, out size = (n, 10000)
tflat = keras.layers.Flatten()(tdot)
## learning dense layer, out size = (n, 20*10),
tdense = keras.layers.Dense(XData.shape[1] * filterData.shape[0], activation="linear")(tflat)
## tiling layer that repeats the dense layer for every output
ttile = keras.layers.Lambda(lambda x: keras.backend.repeat(x, filterData.shape[1]))(tdense)
## reduce dimensions through averaging to fit YData, out size = (n, 50)
tmean = keras.layers.Lambda(lambda x: tf.math.reduce_mean(x, axis=(2)))(ttile)
## make the model
model = keras.Model(input, tmean)

model.compile(
    optimizer='adam',
    loss='mse'
)

history = model.fit(
    x = XData,
    y = YData,
    epochs = 3,
    validation_split = 0.3,
    verbose = 2,
    batch_size=10
  )

Edit

The code below implements singular connectivity, i.e. one weight per feature/feature-filter interaction (shared across outputs), which a dense layer does not allow. It consists of a collection of 20*10 = 200 single-unit dense layers that are subsequently concatenated and then tiled 50 times. But learning is very poor. It might help to put the concatenated collection inside a TimeDistributed layer, as suggested by @SoheilStar. However, the for loop prevents me from using it in the sequential API code provided by @SoheilStar. Any help?

## create dummy input/output matrices
XData = np.ones((1000, 20)) ## 1000 samples, 20 features
YData = np.zeros((1000, 50)) ## 1000 samples, 50 outputs
filterData = np.ones((10, 50)) ## 10 feature filters, 50 outputs
filterData = tf.cast(filterData, tf.float32) ## needed for tf.math.reduce_mean() below

## input of size (n, 20)
input = keras.Input(XData.shape[1])

## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)

# flatten for dense layers, out size = (n, 10000)
tflat = keras.layers.Flatten()(tdot)

## singular connection layer, i.e. a concatenated collection of single unit dense layer, out size = (n, 200)
dense_list = [None] * (filterData.shape[0] * XData.shape[1])
for i in range(filterData.shape[0] * XData.shape[1]):
    dense_list[i] = keras.layers.Dense(1, activation="linear")(tflat[:,i:(i+1)])
tdense = keras.layers.Concatenate()(dense_list)

## tiling layer that repeats the dense layer for every output
ttile = keras.layers.Lambda(lambda x: keras.backend.repeat(x, filterData.shape[1]))(tdense)

## reduce dimensions through averaging to fit YData, out size = (n, 50)
tmean = keras.layers.Lambda(lambda x: tf.math.reduce_mean(x, axis=(2)))(ttile)

## make the model
model = keras.Model(input, tmean)

Edit 2

To address the previous issue of using a for loop inside a TimeDistributed layer, I defined a custom layer to feed to the TimeDistributed layer:

## define a custom layer to be used in a time distributed layer with the sequential api
class customLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.input_dim = filterData.shape[0] * XData.shape[1]
        self.dense_list = [None] * (self.input_dim)
        for i in range(self.input_dim):
            self.dense_list[i] = keras.layers.Dense(1, activation="linear")
        self.concat = keras.layers.Concatenate()
        self.flat = keras.layers.Flatten()

    def call(self, inputs):
        flat_input = self.flat(inputs)
        ## one single-unit dense layer per flattened input column
        ## (renamed from "list", which shadowed the builtin)
        outputs = [None] * (self.input_dim)
        for i in range(self.input_dim):
            outputs[i] = self.dense_list[i](flat_input[:, i:(i+1)])
        return self.concat(outputs)

    def compute_output_shape(self, input_shape):
        return (None, self.input_dim)

## transpose and time distribute along the first dimension (now the output size)
tdot_ = tf.transpose(tdot, [0, 3, 1, 2])

## call the customLayer inside a time distributed layer
tcustom = tf.keras.layers.TimeDistributed(customLayer())(tdot_)
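The axis reordering above can be checked in isolation with plain numpy (dummy shapes matching this post):

```python
import numpy as np

## move the output axis next to the batch axis so TimeDistributed
## can iterate over the 50 outputs
tdot = np.ones((4, 20, 10, 50))            ## (batch, features, filters, outputs)
tdot_T = np.transpose(tdot, (0, 3, 1, 2))  ## (batch, outputs, features, filters)
print(tdot_T.shape)  ## (4, 50, 20, 10)
```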

Technically this works fine, but learning is very poor. @SoheilStar's proposal below works after changing the last line, so that we instead use:

## This layer would try to train its parameters according to each output
tdense_ = tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten())(tdot_)
tdense_ = [tf.keras.layers.TimeDistributed(keras.layers.Dense(1, activation="linear"))(tdense_[:, :, i][..., None]) for i in range(XData.shape[1] * filterData.shape[0])]
tdense_ = tf.keras.layers.Concatenate()(tdense_)

Again, learning is poor, but that may be expected given my real data and the small number of weights present.


1 Answer

Posted 2024-10-01 09:38:55

Updated

I am not sure whether I have understood the problem correctly. If you want to train the dense layers on the 50 outputs simultaneously, you can use a TimeDistributed layer like this:

import tensorflow as tf
from tensorflow import keras

## create dummy input/output matrices
XData = tf.ones((1000, 20)) ## 1000 samples, 20 features
YData = tf.ones((1000, 50)) ## 1000 samples, 50 outputs
filterData = tf.ones((10, 50))
TrData = tf.ones((10, 50), dtype=tf.float32) ## 10 feature filters, 50 outputs

## input of size (n, 20)
input = keras.Input(XData.shape[1])
## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)
## My modification
## change the order of dimensions in order to use the TimeDistributed layer
tdot_ = tf.transpose(tdot, [0, 3, 1, 2])
## This layer would try to train its parameters according to each output
tdense_ = tf.keras.layers.TimeDistributed(tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                                                      tf.keras.layers.Dense(XData.shape[1] * TrData.shape[0]),
                                                                      tf.keras.layers.Dense(1)]))(tdot_)
## I used the Flatten to squeeze the output and the final shape would be (Batch, 50)
final_output = tf.keras.layers.Flatten()(tdense_)

But if not, then why not just add another dense layer of size 50 after tdense? Like this:

tdense = keras.layers.Dense(XData.shape[1] * TrData.shape[0], activation="linear")(tflat)
final_output = keras.layers.Dense(50, activation="linear")(tdense)

Update

To address the issue you mentioned about the for loop, I made some modifications:

import numpy as np
import tensorflow as tf
from tensorflow import keras

## create dummy input/output matrices
XData = tf.ones((1000, 20)) ## 1000 samples, 20 features
YData = tf.ones((1000, 50)) ## 1000 samples, 50 outputs
filterData = tf.ones((10, 50))
TrData = tf.ones((10, 50), dtype=tf.float32) ## 10 feature filters, 50 outputs

## input of size (n, 20)
input = keras.Input(XData.shape[1])
## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)
## My modification
## change the order of dimension in order to use TimeDistributed
tdot_ = tf.transpose(tdot, [0, 3, 1, 2])
## This layer would try to train its parameters according to each output
tdense_ = tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten())(tdot_)
tdense_ = [tf.keras.layers.TimeDistributed(keras.layers.Dense(1, activation="linear"))(tdense_[:, :, i][..., None]) for i in range(XData.shape[1] * filterData.shape[0])]
## concatenate the per-interaction outputs directly; Concatenate already takes
## the list of tensors, so it must not be wrapped in TimeDistributed
tdense_ = tf.keras.layers.Concatenate()(tdense_)

## reduce dimensions through averaging to fit YData, out size = (n, 50)
tmean = keras.layers.Lambda(lambda x: tf.math.reduce_mean(x, axis=(2)))(tdense_)
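For completeness, the one-weight-per-interaction idea can also be expressed as a single custom layer that contracts the input, a trainable (20, 10) matrix, and the constant filter matrix with tf.einsum. This is a sketch rather than code from the original post; the layer name SharedFilterDot and its weight w are hypothetical:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

class SharedFilterDot(keras.layers.Layer):
    """Hypothetical layer: one trainable weight per (feature, filter)
    pair, shared across all outputs."""
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.filters = tf.cast(filters, tf.float32)   ## (10, 50) constant

    def build(self, input_shape):
        n_feat = input_shape[-1]            ## 20 features
        n_filt = self.filters.shape[0]      ## 10 feature filters
        self.w = self.add_weight(shape=(n_feat, n_filt),
                                 initializer="glorot_uniform", name="w")

    def call(self, x):
        ## (n,20) x (20,10) x (10,50) -> (n,50), averaged over interactions
        return tf.einsum("sf,fk,ko->so", x, self.w, self.filters) / tf.cast(
            tf.size(self.w), tf.float32)

filterData = np.ones((10, 50), dtype="float32")
inp = keras.Input((20,))
out = SharedFilterDot(filterData)(inp)
model = keras.Model(inp, out)
print(model.count_params())  ## 200 trainable weights
```

This gives exactly 200 trainable weights shared across all 50 outputs, and the learned w can be inspected directly to visualize which feature/filter interactions matter.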
