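<p>My dataset consists of binary files visualized as grayscale images. Each binary belongs to either <code>malware family 1</code> or <code>malware family 2</code>. The images have very distinctive features. Some examples (top row: family 1, bottom row: family 2):</p>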
<p><a href="https://i.stack.imgur.com/3nih4.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/3nih4.png" alt="malware family 1"/></a><a href="https://i.stack.imgur.com/w3Y4k.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/w3Y4k.png" alt="malware family 1"/></a><a href="https://i.stack.imgur.com/ZW2my.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/ZW2my.png" alt="malware family 1"/></a></p>
<p><a href="https://i.stack.imgur.com/AE4nl.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/AE4nl.png" alt="malware family 2"/></a><a href="https://i.stack.imgur.com/tGzSm.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/tGzSm.png" alt="malware family 2"/></a><a href="https://i.stack.imgur.com/Opj5T.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/Opj5T.png" alt="malware family 2"/></a></p>
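<p>There are 2474 samples of <code>malware family 1</code> and 2930 samples of <code>malware family 2</code>.
As the examples show, samples from the same family look very similar, so a CNN should not have much trouble classifying them.</p>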
<p>Nevertheless, the CNN I am using only reaches about 50% accuracy (loss of 0.25). I also tried an <code>InceptionV3</code> model, but it too only reaches 50% accuracy (loss of 0.50). What could be wrong here?</p>
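<p>For reference, a small NumPy sketch of what a model that has learned nothing would score under <code>mse</code>. It assumes one-hot targets and a constant output of <code>[0.5, 0.5]</code>; the helper name <code>constant_prediction_baseline</code> is hypothetical and only the class counts come from the question. It is meant as a sanity check, not as a diagnosis.</p>

```python
import numpy as np

def constant_prediction_baseline(labels, num_classes=2):
    """MSE and accuracy of a model that outputs equal probabilities for every sample."""
    labels = np.asarray(labels)
    one_hot = np.eye(num_classes)[labels]               # shape (N, num_classes)
    preds = np.full(one_hot.shape, 1.0 / num_classes)   # [0.5, 0.5] for each sample
    mse = float(np.mean((one_hot - preds) ** 2))
    # argmax resolves the tie in favour of class 0, so accuracy is the share of class 0
    accuracy = float(np.mean(np.argmax(preds, axis=1) == labels))
    return mse, accuracy

# Class counts taken from the question: 2474 and 2930 samples
labels = np.array([0] * 2474 + [1] * 2930)
mse, acc = constant_prediction_baseline(labels)
print(f"mse={mse:.2f}, accuracy={acc:.3f}")
```

<p>With two classes, every squared error is 0.5 squared, so the MSE of such a constant predictor is 0.25 regardless of the class balance. A loss stuck near that value may indicate that the network is not learning at all.</p>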
<p>Loading the images:</p>
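<pre><code>import os
import cv2
import numpy as np

# directory, train_index and the dataset4_* lists are defined earlier (not shown)
for idx, elem in enumerate(os.listdir(directory)):
    # assumed: full_path was not defined in the original snippet
    full_path = os.path.join(directory, elem)
    img = cv2.imread(full_path, cv2.IMREAD_UNCHANGED)
    if idx in train_index:
        dataset4_x_train.append(img)
        dataset4_y_train.append(0)
    else:
        dataset4_x_test.append(img)
        dataset4_y_test.append(0)

dataset4_x_train = np.array(dataset4_x_train)
dataset4_x_test = np.array(dataset4_x_test)
# add a channel axis: (N, 192, 192) -> (N, 192, 192, 1)
dataset4_x_train = dataset4_x_train.reshape(-1, 192, 192, 1)
dataset4_x_test = dataset4_x_test.reshape(-1, 192, 192, 1)
</code></pre>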
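<p>Since the loop shown appends the label <code>0</code> in both branches, it may be worth confirming that both classes actually end up in each split. A minimal sketch, assuming the arrays are built as above; <code>summarize_split</code> and the demo data are hypothetical and not part of my code:</p>

```python
import numpy as np

def summarize_split(x, y):
    """Return array shape, dtype and per-class sample counts for one split."""
    x = np.asarray(x)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    return {
        "shape": x.shape,
        "dtype": str(x.dtype),
        "class_counts": {int(c): int(n) for c, n in zip(classes, counts)},
    }

# Hypothetical stand-in data: four 192x192 grayscale images, two per class
x_demo = np.zeros((4, 192, 192, 1), dtype=np.uint8)
y_demo = [0, 0, 1, 1]
print(summarize_split(x_demo, y_demo))
```

<p>Calling it on <code>dataset4_x_train, dataset4_y_train</code> and on the test split should list two classes with counts that add up to the dataset sizes; a single class in <code>class_counts</code> would point at the labelling step.</p>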
<p>Custom CNN:</p>
<pre><code>import tensorflow as tf

model = tf.keras.Sequential()
# five conv + max-pool stages on 192x192x1 grayscale input
model.add(tf.keras.layers.Conv2D(8, 5, activation="relu", input_shape=(192,192,1)))
model.add(tf.keras.layers.MaxPool2D(2))
model.add(tf.keras.layers.Conv2D(8, 3, activation="relu"))
model.add(tf.keras.layers.MaxPool2D(2))
model.add(tf.keras.layers.Conv2D(8, 3, activation="relu"))
model.add(tf.keras.layers.MaxPool2D(2))
model.add(tf.keras.layers.Conv2D(8, 3, activation="relu"))
model.add(tf.keras.layers.MaxPool2D(2))
model.add(tf.keras.layers.Conv2D(16, 3, activation="relu"))
model.add(tf.keras.layers.MaxPool2D(2))
model.add(tf.keras.layers.Conv2D(80, 4, activation="relu"))
model.add(tf.keras.layers.Flatten())
# two-unit softmax output, one unit per malware family
model.add(tf.keras.layers.Dense(2, activation='softmax'))
# learning_rate replaces the deprecated lr argument
opt = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(opt, loss="mse", metrics=['accuracy'])
model.fit(dataset4_x_train, dataset4_y_train, epochs=100, batch_size=50)
model.evaluate(dataset4_x_test, dataset4_y_test)
</code></pre>
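<p>The model ends in a two-unit softmax and is compiled with <code>mse</code>, so the targets presumably need to line up element by element with an output of shape <code>(N, 2)</code>. A sketch of one way to prepare the arrays for that, assuming 8-bit images and integer labels 0 and 1 (both are assumptions, since <code>IMREAD_UNCHANGED</code> keeps whatever bit depth the files have); <code>prepare_for_softmax</code> is a hypothetical helper:</p>

```python
import numpy as np

def prepare_for_softmax(images, labels, num_classes=2):
    """Scale 8-bit images to [0, 1] and one-hot encode integer labels."""
    # Assumption: pixel values are uint8 in the range 0..255
    images = np.asarray(images, dtype=np.float32) / 255.0
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    return images, one_hot

# Hypothetical stand-in data: two all-white 192x192 images, one per class
x_demo = np.full((2, 192, 192, 1), 255, dtype=np.uint8)
x_scaled, y_one_hot = prepare_for_softmax(x_demo, [0, 1])
print(x_scaled.shape, y_one_hot.shape)
```

<p>The returned label array has one column per class, which matches the <code>Dense(2, activation='softmax')</code> output. Whether this changes the training result would still have to be verified on the real data.</p>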
<p>InceptionV3:</p>
<pre><code># InceptionV3 trained from scratch (weights=None) on single-channel input
incept_v3 = tf.keras.applications.inception_v3.InceptionV3(input_shape=(192,192,1), include_top=False, weights=None)
incept_v3.summary()
last_output = incept_v3.get_layer("mixed10").output
x = tf.keras.layers.Flatten()(last_output)
x = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(incept_v3.input, x)
# learning_rate replaces the deprecated lr argument
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(opt, loss="mse", metrics=['accuracy'])
model.fit(dataset4_x_train, dataset4_y_train, epochs=100, batch_size=50)
model.evaluate(dataset4_x_test, dataset4_y_test)
</code></pre>