Python: strange value when trying to compute the error of a linear regression

Posted 2024-09-30 20:35:34


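I'm new to Python, and I've been assigned to write my own algorithm to solve a linear regression problem without using any imports. The problem is that when I run my program to compute the error, it gives a strange value (I'm comparing it with the result calculated by Microsoft Excel). Here is my program: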

x=[1.,1.,2.,2.,2.,2.,2.,2.,2.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,3.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,4.,5.,5.,5.,5.,5.,5.,5.,5.,6.]
y=[67.,62.,109.,83.,91.,88.,123.,100.,109.,137.,131.,122.,122.,118.,115.,131.,143.,142.,122.,140.,150.,140.,150,150.,140.,150.,130.,130.,138.,135.,146.,146.,145.,145.,144.,140.,150.,152.,157.,155.,153.,154.,158.,162.,161.,162.,165.,171.,162.,169.,167.,150.,170.,140.,140.,150.,150.,150.,160.,150.,150.,150.,150.,140.,160.,170.,160.,160.,170.,171.,188.,170.,150.,150.,160.,160.,180.,170.]
sumx = 0
sumxdoubled = 0
sumxsquare = 0
sumxy = 0
meanx = 0
sumy = 0
sumerror = 0
n= 78

for i in range(78):
   sumxy = sumxy + (x[i] * y[i])
print("Total (x.y) : ",sumxy)

for i in range(78):
   sumx = sumx + x[i]
print("Total x : ",sumx)

for i in range(78):
   sumxsquare = sumxsquare + (x[i] ** 2)
print("Total (x^2) : ",sumxsquare)

sumxdoubled = sumx ** 2
print("(Total x)^2 : ",sumxdoubled)

meanx = sumx / n
print("Average x : ",meanx)

for i in range(78):
   sumy = sumy + y[i]
print("Total y : ",sumy)

meany = sumy / n
print("Average y : ",meany)

a1 = ((n*sumxy) - (sumx * sumy)) / ((n*sumxsquare) - sumxdoubled)
print("a1 = ",a1)

a0 = meany - a1 * meanx
print("a0 = ",a0)
for i in range (78):
   sumerror = sumerror + (y[i] - a0 - (a1 * x[i]))
print("Total error = ",sumerror)

The output is:

Total (x.y) :  42117.0
Total x :  283.0
Total (x^2) :  1093.0
(Total x)^2 :  80089.0
Average x :  3.628205128205128
Total y :  11201.0
Average y :  143.60256410256412
a1 =  22.312294288480153
a0 =  62.64898354307843
Total error =  -7.673861546209082e-13

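Trying the same data in Microsoft Excel gives an error value of -14.25.

Why does Python give a value that isn't even close to Excel's -14.25? I can't figure out what's wrong with the program, because I'm sure I'm using the correct algorithm to compute the error.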


1 answer
Anonymous user
Answer 1 · posted 2024-09-30 20:35:34

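Your problem isn't with Python, it's with your math. When computing the error, you first have to add brackets to make sure the calculation is done correctly: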

sumerror = sumerror + (y[i] - a0 - (a1 * x[i])) # <  missing brackets
sumerror = sumerror + (y[i] - (a0 - (a1 * x[i])))

But you're not done yet: you need to divide this result by n and then take the square root.

>>> sumerror = (sumerror / n)**0.5
>>> print("Total error = ",sumerror)
Total error = 12.724274483009689
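
One caveat on the bracketed line above, offered as a sketch rather than a definitive correction: y[i] - (a0 - (a1 * x[i])) expands to y[i] - a0 + a1 * x[i], whereas the residual of the fitted line is y[i] - (a0 + a1 * x[i]), which is what the question's original expression already computes. Because a0 = meany - a1 * meanx, those signed residuals add up to n * (meany - a0 - a1 * meanx) = 0, so a total like -7.67e-13 is just floating-point rounding around zero. A root-mean-square error squares each residual before averaging. The example below assumes a small made-up data set (not the question's lists) and uses the hypothetical helper names residuals and rmse.

```python
# Made-up data for illustration (not the question's lists).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least-squares intercept and slope for these five points,
# obtained from the same a1/a0 formulas the question uses.
a0, a1 = 2.2, 0.6

def residuals(xs, ys, a0, a1):
    """Signed residuals y - (a0 + a1 * x), one per data point."""
    return [yi - (a0 + a1 * xi) for xi, yi in zip(xs, ys)]

def rmse(xs, ys, a0, a1):
    """Root-mean-square error: square each residual, average, take the root."""
    res = residuals(xs, ys, a0, a1)
    return (sum(r ** 2 for r in res) / len(res)) ** 0.5

# The signed residuals cancel out (up to rounding) for a least-squares fit.
print("Sum of residuals =", sum(residuals(xs, ys, a0, a1)))
# The squared residuals do not cancel, so this is a usable error measure.
print("RMSE =", rmse(xs, ys, a0, a1))
```

Whatever Excel reported as -14.25 is not identified in the question, so this sketch does not try to reproduce that number.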

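Since this is a question on a programming forum, I'll point out that, while you're at it, there are plenty of built-in functions you can use to make things easier for yourself.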

for i in range(78):
   sumxy = sumxy + (x[i] * y[i])

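This is bad practice: you've hard-coded the length of the list, which has to be updated every time you use a new list. There's a built-in function, len(), that gets it for you. In this case you don't even need it if you use sum() and the slightly more advanced zip() to pair the two lists up.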

# zip(x, y) returns an iterator like [(x0, y0), (x1, y1), ..., (xn, yn)]
>>> sumxy = sum(xi * yi for xi, yi in zip(x, y))
>>> print("Total (x.y) : ",sumxy)
Total (x.y) :  42117.0
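
Putting those built-ins together, the whole fit can be written without any hard-coded length. This is a minimal sketch, not the original poster's code: fit_line is a hypothetical name and the five data points are made up, while the formulas for a1 and a0 are the ones from the question.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1 * x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)                                      # no hard-coded 78
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))  # pairs (x0, y0), (x1, y1), ...
    sum_x_sq = sum(xi ** 2 for xi in xs)
    # Same slope and intercept formulas as in the question.
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x_sq - sum_x ** 2)
    a0 = sum_y / n - a1 * (sum_x / n)
    return a0, a1

# Made-up data for illustration (not the question's lists).
a0, a1 = fit_line([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print("a1 = ", a1)
print("a0 = ", a0)
```

Passing the question's own x and y lists to the same function should work unchanged, since nothing in it depends on the number of points.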
