How do I pass multiple images through a single SeparableConv2D?

Posted 2024-09-30 20:33:01


I have several images as input. I know I could use a 3D convolution layer, but I don't want to. Instead, I want to find patterns within the 2D images.

What I mean is that each image should be passed through the SeparableConv2D, like this:

# this code raises ValueError: Input 0 of layer <name> is
# incompatible with the layer: expected ndim=4, found ndim=5.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, SeparableConv2D, GlobalAvgPool3D

model = Sequential([
    Input(shape=(16, 128, 128, 1)),  # (frames, rows, cols, channels)
    SeparableConv2D(32, 3),
    GlobalAvgPool3D(),
])

I know that I could use a Conv3D here to behave like a Conv2D:

from tensorflow.keras.layers import Conv3D

model = Sequential([
    Input(shape=(16, 128, 128, 1)),
    Conv3D(32, [1, 3, 3]),  # a 1x3x3 kernel convolves each frame independently
    GlobalAvgPool3D(),
])

But what I need is specifically a SeparableConv2D.

Maybe I could do this with a custom layer or in some other way? I can't come up with a solution myself.


Note: each input contains multiple images.


2 Answers

Just to make sure: your input shape should be 4D. SeparableConv2D expects a 4D tensor of shape (batch_size, channels, rows, cols) when data_format is "channels_first", or a 4D tensor of shape (batch_size, rows, cols, channels) when data_format is "channels_last".

Working example

import tensorflow as tf

input_shape = (16, 128, 128, 1)  # (batch_size, rows, cols, channels)
x = tf.random.normal(input_shape)
y = tf.keras.layers.SeparableConv2D(2, 3, activation='relu',
                                    input_shape=input_shape[1:])(x)
print(y.shape)

Output

(16, 126, 126, 2)
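
If your input really is 5D as in the question, one option (my addition, not part of the original answer) is Keras's TimeDistributed wrapper, which applies the wrapped layer to every frame independently, so SeparableConv2D itself only ever sees 4D slices. A minimal sketch:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, TimeDistributed, SeparableConv2D, GlobalAvgPool3D

model = Sequential([
    Input(shape=(16, 128, 128, 1)),           # (frames, rows, cols, channels)
    TimeDistributed(SeparableConv2D(32, 3)),  # output: (batch, 16, 126, 126, 32)
    GlobalAvgPool3D(),                        # pools over frames, rows and cols
])
model.summary()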

If you have images that are connected to each other in some way, like video frames, you can loop over the Conv2D and pass them through one by one.
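
In Keras this per-frame loop can be expressed without an explicit Python loop by folding the frame axis into the batch axis, which is exactly what the PyTorch example below does. A minimal sketch of that idea (the shapes are my assumption):

import tensorflow as tf

x = tf.random.normal((8, 16, 128, 128, 1))         # (batch, frames, rows, cols, channels)
b, t, h, w, c = x.shape
x2d = tf.reshape(x, (b * t, h, w, c))              # fold frames into the batch axis
y2d = tf.keras.layers.SeparableConv2D(32, 3)(x2d)  # plain 2D separable convolution
y = tf.reshape(y2d, (b, t, 126, 126, 32))          # restore the frame axis
print(y.shape)  # (8, 16, 126, 126, 32)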

Check the video action recognition example from my git repo:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        num_classes = 1
        dr_rate = 0.2
        pretrained = True
        rnn_hidden_size = 30
        rnn_num_layers = 2
        # get a pretrained VGG19 model (taking only the cnn layers and fine-tuning them)
        baseModel = models.vgg19(pretrained=pretrained).features
        # freeze all but the last few convolutional layers
        i = 0
        for child in baseModel.children():
            if i < 28:
                for param in child.parameters():
                    param.requires_grad = False
            else:
                for param in child.parameters():
                    param.requires_grad = True
            i += 1

        num_features = 12800  # flattened size of the VGG19 feature map for one frame
        self.baseModel = baseModel
        self.dropout = nn.Dropout(dr_rate)
        self.rnn = nn.LSTM(num_features, rnn_hidden_size, rnn_num_layers, batch_first=True)
        self.fc2 = nn.Linear(rnn_hidden_size, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        batch_size, time_steps, C, H, W = x.size()
        # reshape input to (batch_size * time_steps, C, H, W)
        x = x.contiguous().view(batch_size * time_steps, C, H, W)
        x = self.baseModel(x)
        x = x.view(x.size(0), -1)
        # reshape back to (batch_size, time_steps, output_size)
        x = x.contiguous().view(batch_size, time_steps, x.size(-1))
        x, (hn, cn) = self.rnn(x)
        x = F.relu(self.fc2(x[:, -1, :]))  # use only the last LSTM output, not the full sequence
        x = self.dropout(x)
        x = self.fc3(x)
        return x

The main idea is that in this module we send every frame (image) through the conv network, then reshape the output so it can be fed into the next network:

# reshape input to (batch_size * time_steps, input_size)
x = x.contiguous().view(batch_size * time_steps, C, H, W)
# feed it to the pretrained conv model
x = self.baseModel(x)
# flatten the output
x = x.view(x.size(0), -1)
# restore the correct shape (batch_size, time_steps, output_size);
# this x is now ready to be fed into the LSTM layer
x = x.contiguous().view(batch_size, time_steps, x.size(-1))
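
A hypothetical usage sketch (my addition; note that the hard-coded num_features = 12800 implies 160x160 frames, since VGG19's feature extractor downsamples by a factor of 32 and 5 * 5 * 512 = 12800):

import torch

net = Net()  # downloads the pretrained VGG19 weights on first run
clip = torch.randn(2, 16, 3, 160, 160)  # (batch, time_steps, C, H, W)
out = net(clip)
print(out.shape)  # torch.Size([2, 1]) -- one score per clip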
