Python mini-batch dictionary learning

Published 2024-09-28 23:50:24


I want to implement error tracking for dictionary learning in Python, using sklearn's MiniBatchDictionaryLearning, so that I can record how the error decreases over the iterations. I have two ways of doing this, but neither works. Setup:

  • Input data X, a numpy array of shape (n_samples, n_features) = (298143, 300). These are patches of shape (10, 10), generated from an image of shape (642, 480, 3) (see the sketch after this list).
  • Dictionary learning parameters: number of columns (atoms) = 100, alpha = 2, transform algorithm = OMP, total number of iterations = 500 (kept small at first, as a test case).
  • Computing the error: after learning the dictionary, I re-encode the original image with the learned dictionary. Since both the encoding and the original are numpy arrays of the same shape (642, 480, 3), I simply take the element-wise Euclidean distance:

    err = np.sqrt(np.sum((reconstruction - original)**2))
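For concreteness, here is a minimal sketch of how patches of this shape can be produced; the post does not show the actual extraction code, so the use of sklearn's extract_patches_2d is an assumption:

import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

# img: the original image, shape (642, 480, 3)
patches = extract_patches_2d(img, (10, 10))    # (298143, 10, 10, 3)
# Flatten each 10x10x3 patch into a 300-dimensional row
patches = patches.reshape(len(patches), -1)    # (298143, 300)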

I ran a test with these parameters, and a full fit produces a good reconstruction with a small error, so that part works. Now on to the two methods:

Method 1: slice the learning into runs of 100 iterations each and save the error after every run. For 500 total iterations this gives 5 runs of 100 iterations each. After each run I compute the error, then use the currently learned dictionary as the initialization for the next run.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Fit an initial dictionary, V, as a first run
dico = MiniBatchDictionaryLearning(n_components = 100,
                                   alpha = 2,
                                   n_iter = 100,
                                   transform_algorithm='omp')
dl = dico.fit(patches)
V = dl.components_

# Now do another 4 runs.
# Note the warm restart parameter, dict_init = V.
n_runs = 4
n_iterations = 100  # iterations per run
for i in range(n_runs):
    print("Run %s..." % i, end = "")
    dico = MiniBatchDictionaryLearning(n_components = 100,
                                       alpha = 2,
                                       n_iter = n_iterations,
                                       transform_algorithm='omp',
                                       dict_init = V)
    dl = dico.fit(patches)
    V = dl.components_

    img_r = reconstruct_image(dico, V, patches)
    err = np.sqrt(np.sum((img - img_r)**2))
    print("Err = %s" % err)

Problem: the error barely decreases, and the dictionary is not learned very well either.

Method 2: slice the input data X into 500 batches and fit them one batch at a time with the partial_fit() method, as sketched below.

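In outline, that looks something like the following (a minimal sketch; the exact slicing and loop are assumptions):

n_batches = 500
batches = np.array_split(patches, n_batches)

for i, batch in enumerate(batches):
    dico.partial_fit(batch)
    # Recompute the full reconstruction error after every batch
    img_r = reconstruct_image(dico, dico.components_, patches)
    err = np.sqrt(np.sum((img - img_r)**2))
    print("Batch %s: Err = %s" % (i, err))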

Problem: this seems to take about 5000 times as long.

What I would like to know is: is there a way to retrieve the error during fitting?


2 Answers

Each call to fit re-initializes the model and forgets any previous call to fit: this is the expected behavior of all estimators in scikit-learn.

I think calling partial_fit in a loop is the right solution, but you should call it on mini-batches (as is done inside the fit method, where the default batch_size is only 3) and then only pay the cost of computing the error every 100 or 1000 calls to partial_fit, for example:

# dico, patches, X and img are defined as in the question above
batch_size = 3
n_epochs = 20
n_batches = X.shape[0] // batch_size
print(n_batches)  # 99381 for the (298143, 300) input above


n_updates = 0
for epoch in range(n_epochs):
    for i in range(n_batches):
        batch = patches[i * batch_size:(i + 1) * batch_size]
        dico.partial_fit(batch)
        n_updates += 1
        if n_updates % 100 == 0:
            img_r = reconstruct_image(dico, dico.components_, patches)
            err = np.sqrt(np.sum((img - img_r)**2))
            print("[epoch #%02d] Err = %s" % (epoch, err))

I ran into the same problem and was eventually able to make the code faster. Adding the solution here in case it is still useful to someone. The key is that when constructing the MiniBatchDictionaryLearning object, n_iter needs to be set to a low value (e.g., 1) so that each partial_fit does not run over its batch for too many epochs.

# Construct the dictionary object up front; partial_fit will be called later
# inside the loop. n_iter=1 means each partial_fit() runs just one epoch
# (with batch_size=batch_size) over the batch it is given. Otherwise, by
# default, a single partial_fit() can run up to 1000 iterations with
# batch_size=3 on each batch, which makes a single call very slow. Since we
# control the epochs ourselves and restart once all the batches are done,
# one iteration per call is enough, and this makes the code execute fast.

batch_size = 128  # for example
dico = MiniBatchDictionaryLearning(n_components = 100,
                                   alpha = 2,
                                   n_iter = 1,  # one epoch per partial_fit()
                                   batch_size = batch_size,
                                   transform_algorithm='omp')

This is then followed by @ogrisel's epoch/batch loop from the answer above, run with this dico and batch_size.

