Python mini-batch dictionary learning

Published 2024-09-28 23:50:24


I want to implement error tracking for dictionary learning in Python, using sklearn's MiniBatchDictionaryLearning, so that I can record how the error decreases over the iterations. I have two ways of doing this, but neither works. Setup:

  • Input data X, a numpy array of shape (n_samples, n_features) = (298143, 300). These are patches of shape (10, 10), generated from an image of shape (642, 480, 3) (see the sketch after this list).
  • Dictionary learning parameters: number of columns (atoms) = 100, alpha = 2, transform algorithm = OMP, total number of iterations = 500 (kept small at first, as a test case).
  • Computing the error: after learning the dictionary, I re-encode the original image with the learned dictionary. Since both the encoding and the original are numpy arrays of the same shape (642, 480, 3), I simply take the element-wise Euclidean distance:

    err = np.sqrt(np.sum((reconstruction - original)**2))
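For concreteness, here is a minimal sketch of how patches of this shape can be produced; the post does not show the actual extraction code, so the use of sklearn's extract_patches_2d is an assumption:

import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

# img: the original image, shape (642, 480, 3)
patches = extract_patches_2d(img, (10, 10))    # (298143, 10, 10, 3)
# Flatten each 10x10x3 patch into a 300-dimensional row
patches = patches.reshape(len(patches), -1)    # (298143, 300)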

I ran a test with these parameters, and a full fit produces a good reconstruction with a small error, so that part works. Now on to the two methods:

Method 1: slice the learning into runs of 100 iterations each and save the error after every run. For 500 total iterations this gives 5 runs of 100 iterations each. After each run I compute the error, then use the currently learned dictionary as the initialization for the next run.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Fit an initial dictionary, V, as a first run
dico = MiniBatchDictionaryLearning(n_components = 100,
                                   alpha = 2,
                                   n_iter = 100,
                                   transform_algorithm='omp')
dl = dico.fit(patches)
V = dl.components_

# Now do another 4 runs.
# Note the warm restart parameter, dict_init = V.
n_runs = 4
n_iterations = 100  # iterations per run
for i in range(n_runs):
    print("Run %s..." % i, end = "")
    dico = MiniBatchDictionaryLearning(n_components = 100,
                                       alpha = 2,
                                       n_iter = n_iterations,
                                       transform_algorithm='omp',
                                       dict_init = V)
    dl = dico.fit(patches)
    V = dl.components_

    img_r = reconstruct_image(dico, V, patches)
    err = np.sqrt(np.sum((img - img_r)**2))
    print("Err = %s" % err)

Problem: the error barely decreases, and the dictionary is not learned very well either.

Method 2: slice the input data X into 500 batches and fit them one batch at a time with the partial_fit() method, as sketched below.

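In outline, that looks something like the following (a minimal sketch; the exact slicing and loop are assumptions):

n_batches = 500
batches = np.array_split(patches, n_batches)

for i, batch in enumerate(batches):
    dico.partial_fit(batch)
    # Recompute the full reconstruction error after every batch
    img_r = reconstruct_image(dico, dico.components_, patches)
    err = np.sqrt(np.sum((img - img_r)**2))
    print("Batch %s: Err = %s" % (i, err))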

Problem: this seems to take about 5000 times as long.

What I would like to know is: is there a way to retrieve the error during fitting?


2 Answers

Each call to fit re-initializes the model and forgets any previous call to fit: this is the expected behavior of all estimators in scikit-learn.

I think calling partial_fit in a loop is the right solution, but you should call it on mini-batches (as is done inside the fit method, where the default batch_size is only 3) and then only pay the cost of computing the error every 100 or 1000 calls to partial_fit, for example:

# dico, patches, X and img are defined as in the question above
batch_size = 3
n_epochs = 20
n_batches = X.shape[0] // batch_size
print(n_batches)  # 99381 for the (298143, 300) input above


n_updates = 0
for epoch in range(n_epochs):
    for i in range(n_batches):
        batch = patches[i * batch_size:(i + 1) * batch_size]
        dico.partial_fit(batch)
        n_updates += 1
        if n_updates % 100 == 0:
            img_r = reconstruct_image(dico, dico.components_, patches)
            err = np.sqrt(np.sum((img - img_r)**2))
            print("[epoch #%02d] Err = %s" % (epoch, err))

I ran into the same problem and was eventually able to make the code faster. Adding the solution here in case it is still useful to someone. The key is that when constructing the MiniBatchDictionaryLearning object, n_iter needs to be set to a low value (e.g., 1) so that each partial_fit does not run over its batch for too many epochs.

# Construct the dictionary object up front; partial_fit will be called later
# inside the loop. n_iter=1 means each partial_fit() runs just one epoch
# (with batch_size=batch_size) over the batch it is given. Otherwise, by
# default, a single partial_fit() can run up to 1000 iterations with
# batch_size=3 on each batch, which makes a single call very slow. Since we
# control the epochs ourselves and restart once all the batches are done,
# one iteration per call is enough, and this makes the code execute fast.

batch_size = 128  # for example
dico = MiniBatchDictionaryLearning(n_components = 100,
                                   alpha = 2,
                                   n_iter = 1,  # one epoch per partial_fit()
                                   batch_size = batch_size,
                                   transform_algorithm='omp')

This is then followed by @ogrisel's epoch/batch loop from the answer above, run with this dico and batch_size.

