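In general, the goal is to maximize the log-likelihood (equivalently, minimize the negative log-likelihood) by optimizing the parameters until they converge. In this context, the parameters are the attack ratings, the defence ratings, the standard deviations and a general home advantage. The first three are team-specific vectors (their length is the number of teams in the competition), while the home advantage is just a scalar.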
import numpy as np
import pandas as pd
import scipy.optimize
import scipy.stats  # scipy.stats.norm is used further down
# Reads the game data
game = pd.read_csv('Games.csv')
numGames = len(game) # Number of Games
homeadv = 1.1 # Home Advantage
The first two rows of the raw DataFrame read above look like this:
Game  HomeID  AwayID  HomePts  AwayPts
1     1       2       62       59
2     3       4       81       82
Collecting the team IDs and the initial parameter guesses:
id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))
# Attack Parameters, Defence Parameters, Standard Deviation Parameters, and Home Advantage set to an arbitrary value
attackratings = [5 for _ in id_list]
defenceratings = [5 for _ in id_list]
stdevratings = [2 for _ in id_list]
homeadv = 1.1 # Home Advantage for the Team playing at home
# Put into a tuple for the scipy.optimize.minimize
init_params = tuple(attackratings + defenceratings + stdevratings + [homeadv])
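Because everything is packed into one flat tuple, any function handed to scipy.optimize.minimize receives a single flat array and has to split it back up. A minimal sketch of that step follows; unpack_params is a hypothetical helper name (not from the code above), and it assumes the same ordering as init_params: attack, defence, standard deviation, then home advantage.

```python
import numpy as np

def unpack_params(params, n_teams):
    """Split a flat parameter vector into its four parts (hypothetical helper)."""
    params = np.asarray(params, dtype=float)
    attack = params[:n_teams]                   # first block: attack ratings
    defence = params[n_teams:2 * n_teams]       # second block: defence ratings
    stdev = params[2 * n_teams:3 * n_teams]     # third block: standard deviations
    homeadv = params[3 * n_teams]               # last entry: scalar home advantage
    return attack, defence, stdev, homeadv

# Example with 4 teams and the same starting values as above
n_teams = 4
flat = [5.0] * n_teams + [5.0] * n_teams + [2.0] * n_teams + [1.1]
attack, defence, stdev, homeadv = unpack_params(flat, n_teams)
```

Inside a loss function this split would be the first thing done with the params argument, so that the ratings used in the likelihood are the ones the optimizer is currently trying.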
Lists for each parameter, where the suffix _h denotes Home and _a denotes Away:
attack_h = []
defence_a = []
st_dev_h = []
st_dev_a = []
attack_a = []
defence_h = []
for i in range(len(game)):
    # Home Att and Away Def
    x = attackratings[id_list.index(game.HomeID[i])]
    attack_h.append(x)
    x = defenceratings[id_list.index(game.AwayID[i])]
    defence_a.append(x)
    x = stdevratings[id_list.index(game.HomeID[i])]
    st_dev_h.append(x)
    x = stdevratings[id_list.index(game.AwayID[i])]
    st_dev_a.append(x)
    # Home Def and Away Att
    x = attackratings[id_list.index(game.AwayID[i])]
    attack_a.append(x)
    x = defenceratings[id_list.index(game.HomeID[i])]
    defence_h.append(x)
game['attack_h'] = attack_h
game['defence_a'] = defence_a
game['attack_a'] = attack_a
game['defence_h'] = defence_h
game['st_dev_h'] = st_dev_h
game['st_dev_a'] = st_dev_a
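The row-by-row loop above calls id_list.index once per lookup, which may get slow with many games. A possible vectorized alternative is sketched here. The small DataFrame is a made-up stand-in for Games.csv built from the two rows shown earlier, and home_idx, away_idx and team_index are names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Games.csv (the two rows shown earlier)
game = pd.DataFrame({
    'HomeID': [1, 3], 'AwayID': [2, 4],
    'HomePts': [62, 81], 'AwayPts': [59, 82],
})
id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))

# Position of each game's home and away team within id_list
team_index = pd.Index(id_list)
home_idx = team_index.get_indexer(game['HomeID'])
away_idx = team_index.get_indexer(game['AwayID'])

# Ratings as NumPy arrays, so they can be indexed by a whole array at once
attackratings = np.full(len(id_list), 5.0)
defenceratings = np.full(len(id_list), 5.0)
stdevratings = np.full(len(id_list), 2.0)

# One fancy-indexing step per column instead of one list lookup per row
attack_h = attackratings[home_idx]
defence_a = defenceratings[away_idx]
attack_a = attackratings[away_idx]
defence_h = defenceratings[home_idx]
st_dev_h = stdevratings[home_idx]
st_dev_a = stdevratings[away_idx]
```

The index arrays depend only on the schedule, so they could be computed once and reused every time the ratings change.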
Given the parameters, compute the probability density of each team scoring the points it did:
game['exp_home'] = scipy.stats.norm.pdf(game.HomePts,game.attack_h*game.defence_a*homeadv,game.st_dev_h*game.st_dev_a)
game['exp_away'] = scipy.stats.norm.pdf(game.AwayPts,game.attack_a*game.defence_h,game.st_dev_h*game.st_dev_a)
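Written out, the two lines above amount to the following model. The symbols are notation introduced here for illustration and do not appear in the code: $a_t$, $d_t$ and $\sigma_t$ are the attack, defence and standard deviation ratings of team $t$, $\gamma$ is the home advantage, and $h$ and $w$ are the home and away teams of a given game.

```latex
\begin{aligned}
\text{HomePts} &\sim \mathcal{N}\!\left(a_h \, d_w \, \gamma,\; (\sigma_h \sigma_w)^2\right) \\
\text{AwayPts} &\sim \mathcal{N}\!\left(a_w \, d_h,\; (\sigma_h \sigma_w)^2\right)
\end{aligned}
```

Note that scipy.stats.norm.pdf takes the standard deviation as its scale argument, so the product of the two standard deviation ratings acts as the standard deviation and its square as the variance.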
The next step is to find the log-likelihood of each match, which is just the log of the product of exp_home and exp_away:
game['loglik'] = np.log(game['exp_home']*game['exp_away'])
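Taking the log of a product of densities can underflow to zero when a score is far from its mean relative to the standard deviation. A numerically safer route that gives the same quantity is to add the log-densities from norm.logpdf. The sketch below uses made-up arrays in place of the DataFrame columns; the means 27.5 and 25.0 and the standard deviation 4.0 correspond to the initial guesses above (5 * 5 * 1.1, 5 * 5 and 2 * 2).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical values standing in for the DataFrame columns
home_pts = np.array([62.0, 81.0])
away_pts = np.array([59.0, 82.0])
mu_home = np.array([27.5, 27.5])   # attack_h * defence_a * homeadv
mu_away = np.array([25.0, 25.0])   # attack_a * defence_h
sigma = np.array([4.0, 4.0])       # st_dev_h * st_dev_a

# Direct form, as in the line above: log of the product of the two densities
loglik_direct = np.log(norm.pdf(home_pts, mu_home, sigma)
                       * norm.pdf(away_pts, mu_away, sigma))

# Equivalent form: sum of the two log-densities, which avoids tiny products
loglik_stable = (norm.logpdf(home_pts, mu_home, sigma)
                 + norm.logpdf(away_pts, mu_away, sigma))
```

For these inputs both forms agree; the second one is simply less likely to hit log(0) while an optimizer is still exploring poor parameter values.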
So the sum of game['loglik'] is what needs to be maximized (that is, its negative minimized), but I don't know how to go about it.
My attempts so far have failed badly, but the code below is basically the loss function I'm after:
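def logsum(params, game, id_list):
    # NOTE: params is never used here, so the optimizer only ever sees a constant
    lt = -np.sum(game['loglik'])  # negative sum of the per-match log-likelihoods
    return lt

W = scipy.optimize.minimize(logsum, x0=init_params, args=(game, id_list))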
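For the optimizer to make progress, the loss has to recompute the means and standard deviations from params on every call, rather than summing a column that was filled in beforehand. The following is one possible sketch under stated assumptions, not a definitive implementation: the 12-game schedule and scores are invented stand-ins for Games.csv, neg_log_likelihood and the index arrays are names introduced here, and the L-BFGS-B method with a small positive lower bound on every parameter is just one reasonable choice for keeping the standard deviations positive.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical stand-in for Games.csv: 4 teams, each pair meeting home and away
game = pd.DataFrame({
    'HomeID':  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'AwayID':  [2, 3, 4, 1, 3, 4, 1, 2, 4, 1, 2, 3],
    'HomePts': [62, 75, 70, 68, 71, 66, 81, 77, 81, 64, 69, 72],
    'AwayPts': [59, 69, 73, 65, 74, 60, 70, 72, 82, 71, 66, 78],
})
id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))
n_teams = len(id_list)

# Team positions per game; these depend only on the schedule, not on params
home_idx = pd.Index(id_list).get_indexer(game['HomeID'])
away_idx = pd.Index(id_list).get_indexer(game['AwayID'])
home_pts = game['HomePts'].to_numpy(dtype=float)
away_pts = game['AwayPts'].to_numpy(dtype=float)

def neg_log_likelihood(params, home_idx, away_idx, home_pts, away_pts, n_teams):
    """Negative log-likelihood, rebuilt from params on every call."""
    params = np.asarray(params, dtype=float)
    attack = params[:n_teams]
    defence = params[n_teams:2 * n_teams]
    stdev = params[2 * n_teams:3 * n_teams]
    homeadv = params[3 * n_teams]
    mu_home = attack[home_idx] * defence[away_idx] * homeadv
    mu_away = attack[away_idx] * defence[home_idx]
    sigma = stdev[home_idx] * stdev[away_idx]
    loglik = (norm.logpdf(home_pts, mu_home, sigma)
              + norm.logpdf(away_pts, mu_away, sigma))
    return -np.sum(loglik)

# Same starting values as in the question
init_params = np.array([5.0] * n_teams + [5.0] * n_teams + [2.0] * n_teams + [1.1])
bounds = [(1e-3, None)] * len(init_params)   # assumption: keep every parameter positive
args = (home_idx, away_idx, home_pts, away_pts, n_teams)

start_value = neg_log_likelihood(init_params, *args)
result = minimize(neg_log_likelihood, x0=init_params, args=args,
                  method='L-BFGS-B', bounds=bounds)
```

The fitted values are in result.x, in the same order as init_params, and result.fun is the negative log-likelihood at that point. One caveat with this parameterization: multiplying every attack rating by a constant and dividing every defence rating by the same constant leaves all the means unchanged, so the individual ratings are not uniquely determined unless one of them is pinned to a fixed value.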
I'm still fairly new to Python, so any help would be greatly appreciated! If the explanation above is unclear, please ask any questions.