How do I correctly remove redundant components from scikit-learn's DPGMM?

>>> import pandas as pd
>>> import numpy as np
>>> import random
>>> from sklearn import mixture
>>> X = pd.read_csv(....)  # my matrix
>>> X.shape
(20000, 48)
>>> dpgmm3 = mixture.BayesianGaussianMixture(n_components=20, weight_concentration_prior_type='dirichlet_process', max_iter=1000, verbose=2)
>>> dpgmm3.fit(X)  # Fitting the DPGMM model
>>> labels = dpgmm3.predict(X)  # Generating labels after model is fitted
>>> max(labels)
>>> np.unique(labels)  # Number of labels == n_components specified above
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> # Trying with a different n_components
>>> dpgmm3_1 = mixture.BayesianGaussianMixture(weight_concentration_prior_type='dirichlet_process', max_iter=1000)  # not specifying n_components
>>> dpgmm3_1.fit(X)
>>> labels_1 = dpgmm3_1.predict(X)
>>> labels_1
array([0, 0, 0, ..., 0, 0, 0])  # All were classified under the same label
>>> # Trying with n_components = 7
>>> dpgmm3_2 = mixture.BayesianGaussianMixture(n_components=7, weight_concentration_prior_type='dirichlet_process', max_iter=1000)
>>> dpgmm3_2.fit(X)
>>> labels_2 = dpgmm3_2.predict(X)
>>> np.unique(labels_2)
array([0, 1, 2, 3, 4, 5, 6])  # number of labels == n_components

2 Answers

User

#1 · Edited 2024-05-06 14:23:49

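There is currently no automated way to do this, but you can inspect the estimated weights_ attribute and prune the components whose weight is small (for example, below 0.01).

Edit: to count the number of components the model effectively uses: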

import numpy as np
from sklearn.mixture import BayesianGaussianMixture
model = BayesianGaussianMixture(n_components=30).fit(X)
print("active components: %d" % np.sum(model.weights_ > 0.01))

This should print a number of active components that is lower than the upper bound you provided (30 in this example).
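The pruning step itself could be sketched roughly as follows. This is only an illustration, not a scikit-learn feature: prune_labels is a hypothetical helper, the 0.01 threshold is the arbitrary cutoff mentioned above, and X_demo is synthetic stand-in data rather than the asker's matrix. It keeps the components whose weight exceeds the threshold and assigns each sample to the most likely component among those.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def prune_labels(model, X, threshold=0.01):
    """Hypothetical helper: label samples using only components with weight > threshold."""
    # Indices of the components that are kept.
    active = np.flatnonzero(model.weights_ > threshold)
    # Posterior responsibilities, restricted to the kept components.
    resp = model.predict_proba(X)[:, active]
    # Map the argmax back to the original component indices.
    return active, active[np.argmax(resp, axis=1)]


# Synthetic stand-in data (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
X_demo = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(200, 2)) for c in (-5.0, 0.0, 5.0)]
)

model = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=1000,
    random_state=0,
).fit(X_demo)

active, labels = prune_labels(model, X_demo)
print("active components:", active)
print("labels in use:", np.unique(labels))
```

Note that the fitted model is left unchanged; only the label assignment ignores the low-weight components. Plain predict can still return any of the n_components indices, which matches what the question observed.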

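Edit 2: the n_components parameter specifies the maximum number of components the model can use. The effective number of components actually used by the model can be retrieved by introspecting the weights_ attribute at the end of fit. It depends mostly on the structure of the data and on the value of weight_concentration_prior (especially when the number of samples is small).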
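To see how weight_concentration_prior influences the effective count, a small experiment along these lines may help. It is a sketch under assumptions: X_demo is synthetic data, count_active is a hypothetical helper, and the three prior values and the 0.01 threshold are arbitrary choices for illustration. The counts you get will depend on your own data.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in data (assumption): a small sample from three 2-D blobs.
rng = np.random.default_rng(0)
X_demo = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (-5.0, 0.0, 5.0)]
)


def count_active(prior, threshold=0.01):
    """Hypothetical helper: fit a DP mixture and count components above threshold."""
    model = BayesianGaussianMixture(
        n_components=10,  # upper bound, not the number actually used
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=prior,
        max_iter=1000,
        random_state=0,
    ).fit(X_demo)
    return int(np.sum(model.weights_ > threshold))


for prior in (1e-3, 1.0, 1e3):
    print("weight_concentration_prior=%g -> active components: %d"
          % (prior, count_active(prior)))
```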

User

#2 · Edited 2024-05-06 14:23:49

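Take a look at the repulsive Gaussian mixtures described in [1]. They try to fit a mixture of Gaussians that overlap less and are therefore typically less redundant.

I have not found source code for it yet.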

{1}
