How do I pass multiple images through a single SeparableConv2D?

Posted 2024-09-30 20:33:01


I have several images as input. I know I could use a 3D convolution layer, but I don't want to. Instead, I want to find patterns within the 2D images.

What I mean is that each image should be passed through the SeparableConv2D, like this:

# this code raises ValueError: Input 0 of layer <name> is
# incompatible with the layer: expected ndim=4, found ndim=5.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, SeparableConv2D, GlobalAvgPool3D

model = Sequential([
    Input(shape=(16, 128, 128, 1)),  # (frames, rows, cols, channels)
    SeparableConv2D(32, 3),
    GlobalAvgPool3D(),
])

I know that I could use a Conv3D here to behave like a Conv2D:

from tensorflow.keras.layers import Conv3D

model = Sequential([
    Input(shape=(16, 128, 128, 1)),
    Conv3D(32, [1, 3, 3]),  # a 1x3x3 kernel convolves each frame independently
    GlobalAvgPool3D(),
])

But what I need is specifically a SeparableConv2D.

Maybe I could do this with a custom layer or in some other way? I can't come up with a solution myself.


Note: each input contains multiple images.


2 Answers

Just to make sure: your input shape should be 4D. SeparableConv2D expects a 4D tensor of shape (batch_size, channels, rows, cols) when data_format is "channels_first", or a 4D tensor of shape (batch_size, rows, cols, channels) when data_format is "channels_last".

Working example

import tensorflow as tf

input_shape = (16, 128, 128, 1)  # (batch_size, rows, cols, channels)
x = tf.random.normal(input_shape)
y = tf.keras.layers.SeparableConv2D(2, 3, activation='relu',
                                    input_shape=input_shape[1:])(x)
print(y.shape)

Output

(16, 126, 126, 2)
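
If your input really is 5D as in the question, one option (my addition, not part of the original answer) is Keras's TimeDistributed wrapper, which applies the wrapped layer to every frame independently, so SeparableConv2D itself only ever sees 4D slices. A minimal sketch:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, TimeDistributed, SeparableConv2D, GlobalAvgPool3D

model = Sequential([
    Input(shape=(16, 128, 128, 1)),           # (frames, rows, cols, channels)
    TimeDistributed(SeparableConv2D(32, 3)),  # output: (batch, 16, 126, 126, 32)
    GlobalAvgPool3D(),                        # pools over frames, rows and cols
])
model.summary()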

If you have images that are connected to each other in some way, like video frames, you can loop over the Conv2D and pass them through one by one.
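
In Keras this per-frame loop can be expressed without an explicit Python loop by folding the frame axis into the batch axis, which is exactly what the PyTorch example below does. A minimal sketch of that idea (the shapes are my assumption):

import tensorflow as tf

x = tf.random.normal((8, 16, 128, 128, 1))         # (batch, frames, rows, cols, channels)
b, t, h, w, c = x.shape
x2d = tf.reshape(x, (b * t, h, w, c))              # fold frames into the batch axis
y2d = tf.keras.layers.SeparableConv2D(32, 3)(x2d)  # plain 2D separable convolution
y = tf.reshape(y2d, (b, t, 126, 126, 32))          # restore the frame axis
print(y.shape)  # (8, 16, 126, 126, 32)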

Check the video action recognition example from my git repo:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        num_classes = 1
        dr_rate = 0.2
        pretrained = True
        rnn_hidden_size = 30
        rnn_num_layers = 2
        # get a pretrained VGG19 model (taking only the cnn layers and fine-tuning them)
        baseModel = models.vgg19(pretrained=pretrained).features
        # freeze all but the last few convolutional layers
        i = 0
        for child in baseModel.children():
            if i < 28:
                for param in child.parameters():
                    param.requires_grad = False
            else:
                for param in child.parameters():
                    param.requires_grad = True
            i += 1

        num_features = 12800  # flattened size of the VGG19 feature map for one frame
        self.baseModel = baseModel
        self.dropout = nn.Dropout(dr_rate)
        self.rnn = nn.LSTM(num_features, rnn_hidden_size, rnn_num_layers, batch_first=True)
        self.fc2 = nn.Linear(rnn_hidden_size, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        batch_size, time_steps, C, H, W = x.size()
        # reshape input to (batch_size * time_steps, C, H, W)
        x = x.contiguous().view(batch_size * time_steps, C, H, W)
        x = self.baseModel(x)
        x = x.view(x.size(0), -1)
        # reshape back to (batch_size, time_steps, output_size)
        x = x.contiguous().view(batch_size, time_steps, x.size(-1))
        x, (hn, cn) = self.rnn(x)
        x = F.relu(self.fc2(x[:, -1, :]))  # use only the last LSTM output, not the full sequence
        x = self.dropout(x)
        x = self.fc3(x)
        return x

The main idea is that in this module we send every frame (image) through the conv network, then reshape the output so it can be fed into the next network:

# reshape input to (batch_size * time_steps, input_size)
x = x.contiguous().view(batch_size * time_steps, C, H, W)
# feed it to the pretrained conv model
x = self.baseModel(x)
# flatten the output
x = x.view(x.size(0), -1)
# restore the correct shape (batch_size, time_steps, output_size);
# this x is now ready to be fed into the LSTM layer
x = x.contiguous().view(batch_size, time_steps, x.size(-1))
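
A hypothetical usage sketch (my addition; note that the hard-coded num_features = 12800 implies 160x160 frames, since VGG19's feature extractor downsamples by a factor of 32 and 5 * 5 * 512 = 12800):

import torch

net = Net()  # downloads the pretrained VGG19 weights on first run
clip = torch.randn(2, 16, 3, 160, 160)  # (batch, time_steps, C, H, W)
out = net(clip)
print(out.shape)  # torch.Size([2, 1]) -- one score per clip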
