Pythonic linear regression

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from scipy import stats
from random import randint
import numpy as np

def regress(y, x):
    reg = slope,intercept,r_value,p_value,std_err = stats.linregress(x,y) ## generate regression elements
    yhat = x*reg.slope + intercept ## predict y using with slope(coefficient) and intercept

if __name__=="__main__":
    x= np.array([randint(0,1000) for n in range(0,100)]) ## generate 100 random integers between 1 and 1000 for x
    y= np.array([randint(0,1000) for n in range(0,100)]) ## generate 100 random integers between 1 and 1000 for y
    regress(y,x) ## run function using the 100 random integers for x & y

1 Answer

User

#1 · Posted 2024-09-28 20:51:30

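This question belongs on Code Review rather than Stack Overflow.
Use comments to explain why, not what. Your code should be clear enough that what it does needs no comment. Sometimes (not in this case) you do need a comment to explain why you did something non-obvious.
Loops and list comprehensions combined with numpy can be considered a code smell. Look for a built-in function first, then try to find a vectorized approach. If neither exists you may have to fall back on a loop or list comprehension, but that is rarely necessary. In this case, for example, numpy ships with np.random.randint.
Use variables instead of passing constant values into functions, especially when you use them twice! If you want the value 1000 in both the x and y arrays, put it in a variable.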
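The points above can be sketched together as follows. This is only an illustration: the constant names N_POINTS, DATA_MIN and DATA_MAX are made up for this example and do not come from the question. It also shows the one kind of comment worth keeping, a "why" comment, here explaining an off-by-one difference between random.randint and np.random.randint.

```python
import random

import numpy as np

# Hypothetical names: each value is defined once instead of being repeated
# as a literal at every call site.
N_POINTS = 100
DATA_MIN = 0
DATA_MAX = 1000

# List-comprehension version, in the style of the question.
x_loop = np.array([random.randint(DATA_MIN, DATA_MAX) for _ in range(N_POINTS)])

# Vectorized version. The "+ 1" is needed because np.random.randint excludes
# the upper bound, whereas random.randint includes it.
x_vec = np.random.randint(DATA_MIN, DATA_MAX + 1, size=N_POINTS)
y_vec = np.random.randint(DATA_MIN, DATA_MAX + 1, size=N_POINTS)

print(x_loop.shape, x_vec.shape, y_vec.shape)
```

Both versions produce a one-dimensional array of N_POINTS integers; the vectorized one avoids the Python-level loop.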

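Every call to regress refits the regression, which is computationally wasteful. Look at how interpolation works in scipy: its output is a function that you can reuse for interpolation. That is a good pattern in your case too, and you can implement it with a concept from functional programming called a closure. This is easier to explain in code: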

from scipy import stats


def regress(x, y):
    """
    A docstring is more pythonic than unneeded inline comments: https://www.python.org/dev/peps/pep-0257/
    """
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

    # This bit is the closure. Notice how it has access to the variables in the
    # parent function's scope. The closure remembers those values even after
    # the parent function has returned and they have gone out of scope.
    def regression_function(xi):
        return xi * slope + intercept

    # Return the function itself so that you can reuse it later.
    return regression_function

Using it:

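import numpy as np

n = 1000
data_min = 0
data_max = 100
x = np.random.randint(data_min, data_max, n)  # n integers in [data_min, data_max)
y = np.random.randint(data_min, data_max, n)
f_reg = regress(x, y)  # fit the regression once
xi = np.arange(1000)
yi = f_reg(xi)  # reuse the fitted function without refitting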

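Another option is scikit-learn. Instead of closures, scikit-learn uses an object-oriented approach to remember state. In this case you first call the fit method to learn the state, then call the predict method to reuse what was learned.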
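A minimal sketch of that fit/predict pattern, assuming scikit-learn is installed and using its LinearRegression estimator. The synthetic data (a noisy line with slope 2 and intercept 1) and the seed are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.integers(0, 100, size=1000)
# Synthetic, roughly linear data: an assumption for this example only.
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=1000)

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)  # learn slope and intercept once; they are stored on the object

xi = np.arange(1000).reshape(-1, 1)
yi = model.predict(xi)  # reuse the learned state as often as needed

print(model.coef_[0], model.intercept_)
```

The fitted object plays the same role as the closure: the state lives in model.coef_ and model.intercept_ rather than in captured variables.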
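Finally, and this is very important: there is no good reason to be using 2.7 in 2019. Switch to Python 3. Support for Python 2 is being dropped next year, and some major libraries, such as pandas, have already dropped support for it. Don't learn a language version that is already obsolete!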
