<p>I'm fitting a Poisson regression model with sklearn. However, the code seems to apply some regularization to my model that I did not ask for, even though I have set the regularization parameter to 0. Any ideas on how to stop this would be much appreciated.</p>
<p>I have a time-varying predictor x, which is described by a basis set to produce the design matrix X. I use X to predict a (sparse) count vector Y. My code looks like this:</p>
<pre><code>from sklearn.linear_model import PoissonRegressor
PR = PoissonRegressor(alpha=0.0)
PR.fit(X, Y)
</code></pre>
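<p>For context, here is a minimal self-contained version of the same call pattern. The Gaussian-bump basis and the simulated counts below are illustrative stand-ins, not my actual data:</p>

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Illustrative basis expansion: 10 Gaussian bumps over a time axis
t = np.linspace(0, 1, 200)
centers = np.linspace(0, 1, 10)
X = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * 0.05**2))

# Sparse Poisson counts generated from a known rate
true_coef = rng.normal(0, 1, size=10)
Y = rng.poisson(np.exp(X @ true_coef - 2.0))

PR = PoissonRegressor(alpha=0.0)  # alpha=0 should mean: no penalty
PR.fit(X, Y)
print(PR.coef_)
```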
<p>However, even though <code>alpha = 0</code> should turn regularization off, the resulting fit still looks smoothed/regularized.</p>
<p>To test this, I copied the minimization routine used inside sklearn's GeneralizedLinearRegressor into my own code and ran it with alpha = 0 (to avoid a big wall of code, I've put it at the bottom of the question). Running the solver outside the regressor object gives a different answer from <code>PR.fit()</code>, but one that is almost identical to the solution obtained with statsmodels. The difference is illustrated here:</p>
<p><a href="https://i.stack.imgur.com/ybrh5.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/ybrh5.png" alt="enter image description here"/></a></p>
<p>Crucially, the sklearn line is much smoother than the versions produced by my code and by statsmodels, which suggests that some form of regularization is still active in sklearn's Poisson regression.</p>
<p>So, my question is:
how do I disable this (unwanted) regularization?</p>
<p>Thanks.</p>
<p>My code:</p>
<pre><code>import numpy as np
from scipy.optimize import minimize
from sklearn._loss.glm_distribution import PoissonDistribution
from sklearn.linear_model._glm.link import LogLink

alpha = 0

def _safe_lin_pred(X, coef):
    """Compute the linear predictor, taking care if an intercept is present."""
    if coef.size == X.shape[1] + 1:
        return X @ coef[1:] + coef[0]
    else:
        return X @ coef

def _y_pred_deviance_derivative(coef, X, y, family, link):
    """Compute y_pred and the derivative of the deviance w.r.t. coef."""
    lin_pred = _safe_lin_pred(X, coef)
    y_pred = link.inverse(lin_pred)
    d1 = link.inverse_derivative(lin_pred)
    temp = d1 * family.deviance_derivative(y, y_pred)
    if coef.size == X.shape[1] + 1:
        devp = np.concatenate(([temp.sum()], temp @ X))
    else:
        devp = temp @ X  # same as X.T @ temp
    return y_pred, devp

# Same objective as PoissonRegressor: 0.5 * deviance + 0.5 * alpha * ||coef||^2.
# With alpha = 0 the penalty term vanishes.
def func(coef, X, y, alpha, family, link):
    y_pred, devp = _y_pred_deviance_derivative(coef, X, y, family, link)
    coef_scaled = alpha * coef
    dev = family.deviance(y, y_pred)
    obj = 0.5 * dev + 0.5 * (coef @ coef_scaled)
    objp = 0.5 * devp
    objp += coef_scaled
    return obj, objp

# X, Y: the design matrix and count vector from above.
args = (X, Y, alpha, PoissonDistribution(), LogLink())
coef0 = np.ones(X.shape[1])
opt_res = minimize(
    func, coef0, method="L-BFGS-B", jac=True,
    options={
        "maxiter": 100,   # sklearn's default max_iter
        "iprint": -1,     # no solver output
        "gtol": 1e-4,     # sklearn's default tol
        "ftol": 1e3 * np.finfo(float).eps,
    },
    args=args)
</code></pre>
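<p>For completeness: the in-object equivalent of the solver settings above would be passing <code>max_iter</code> and <code>tol</code> to the constructor. I have not confirmed whether this changes the fit, but it would rule out early stopping of the solver masquerading as regularization (synthetic data again):</p>

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Y = rng.poisson(np.exp(X @ np.array([0.2, -0.1, 0.3, 0.0])))

# Tighter settings than the defaults (max_iter=100, tol=1e-4)
PR = PoissonRegressor(alpha=0.0, max_iter=10_000, tol=1e-8)
PR.fit(X, Y)
print(PR.n_iter_, PR.coef_)
```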