Optimizing team ratings in Python

Posted 2024-10-01 00:24:53

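Generally, the goal is to maximize the log-likelihood (or, equivalently, minimize the negative log-likelihood) by optimizing the parameters until they converge. In this context, the parameters are the attack ratings, defence ratings, standard deviations, and a general home advantage. The first three are team-specific vectors (with length equal to the number of teams in the competition), while the home advantage is a single scalar.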

import numpy as np
import pandas as pd
import scipy.optimize
import scipy.stats  # needed for scipy.stats.norm.pdf further down


# Reads the game data
game = pd.read_csv('Games.csv') 
numGames = len(game) # Number of Games
homeadv = 1.1 # Home Advantage 

The first two rows of the raw DataFrame read above look like this:

   Game  HomeID  AwayID  HomePts  AwayPts
      1       1       2       62       59
      2       3       4       81       82

Collecting the team IDs and the initial parameter guesses:

id_list = sorted(pd.unique(pd.concat([game['HomeID'], game['AwayID']], axis=0)))

# Attack Parameters, Defence Parameters, Standard Deviation Parameters, and Home Advantage set to an arbitrary value
attackratings = [5 for _ in id_list]
defenceratings = [5 for _ in id_list]
stdevratings = [2 for _ in id_list]
homeadv = 1.1 # Home Advantage for the Team playing at home

# Put into a tuple for the scipy.optimize.minimize
init_params = tuple(attackratings + defenceratings + stdevratings + [homeadv])
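Because scipy.optimize.minimize only ever passes a single flat vector to the objective, that vector has to be split back into its pieces inside the objective. Below is a minimal sketch of such a split, assuming the same ordering as init_params (attack, defence, standard deviation, then home advantage). The helper name unpack_params and the three-team example are hypothetical and not part of the code above.

```python
import numpy as np

def unpack_params(params, n_teams):
    """Split a flat parameter vector into (attack, defence, stdev, homeadv)."""
    params = np.asarray(params, dtype=float)
    attack = params[:n_teams]
    defence = params[n_teams:2 * n_teams]
    stdev = params[2 * n_teams:3 * n_teams]
    homeadv = params[3 * n_teams]  # single scalar at the end of the vector
    return attack, defence, stdev, homeadv

# Hypothetical example with three teams, mirroring the layout of init_params
n_teams = 3
flat = np.array([5.0] * n_teams + [5.0] * n_teams + [2.0] * n_teams + [1.1])
attack, defence, stdev, homeadv = unpack_params(flat, n_teams)
```

The slices are views into the flat vector, so no data is copied on each objective evaluation.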

Lists for each parameter, where the suffix _h denotes Home and _a denotes Away:

attack_h = []
defence_a = []
st_dev_h = []
st_dev_a = []
attack_a = []
defence_h = []

for i in range(0,len(game)):
    x = attackratings[id_list.index(game.HomeID[i])]
    attack_h.append(x)
    x = defenceratings[id_list.index(game.AwayID[i])]
    defence_a.append(x)
    x = stdevratings[id_list.index(game.HomeID[i])]
    st_dev_h.append(x)
    x = stdevratings[id_list.index(game.AwayID[i])]
    st_dev_a.append(x)
    # Home Def and Away Att
    x = attackratings[id_list.index(game.AwayID[i])]
    attack_a.append(x)
    x = defenceratings[id_list.index(game.HomeID[i])]
    defence_h.append(x)

game['attack_h'] = attack_h
game['defence_a'] = defence_a
game['attack_a'] = attack_a
game['defence_h'] = defence_h
game['st_dev_h'] = st_dev_h
game['st_dev_a'] = st_dev_a
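The per-game lookup can also be done without a Python loop by computing each team's position once and then indexing a NumPy array with it. This is only a sketch on a hypothetical three-game table that reuses the HomeID and AwayID column names; every name ending in _demo is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the game table, using the question's column names
game_demo = pd.DataFrame({
    'HomeID': [1, 3, 2],
    'AwayID': [2, 4, 1],
})
id_list_demo = sorted(pd.unique(pd.concat([game_demo['HomeID'], game_demo['AwayID']])))

# Position of each team in id_list_demo, computed once
pos = {team_id: k for k, team_id in enumerate(id_list_demo)}
home_idx = game_demo['HomeID'].map(pos).to_numpy()
away_idx = game_demo['AwayID'].map(pos).to_numpy()

# One rating per team, in the same order as id_list_demo
attack_demo = np.array([5.0, 6.0, 7.0, 8.0])
attack_h_demo = attack_demo[home_idx]  # attack rating of the home team in each game
attack_a_demo = attack_demo[away_idx]  # attack rating of the away team in each game
```

The index arrays depend only on the schedule, so they can be built once outside the objective and reused for every parameter vector the optimizer tries.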

Given the parameters, compute the probability density of each team scoring the points it did:

game['exp_home'] = scipy.stats.norm.pdf(game.HomePts,game.attack_h*game.defence_a*homeadv,game.st_dev_h*game.st_dev_a)
game['exp_away'] = scipy.stats.norm.pdf(game.AwayPts,game.attack_a*game.defence_h,game.st_dev_h*game.st_dev_a)

The next step is to find the log-likelihood of each match, which is just the log of the product of exp_home and exp_away:

game['loglik'] = np.log(game['exp_home']*game['exp_away'])
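Since the log of a product equals the sum of the logs, the same quantity can be obtained by adding two scipy.stats.norm.logpdf values, which stays finite where the density itself underflows. The sketch below uses hypothetical numbers for a single game (the initial guesses of 5, 5, 2 and 1.1 from above), not values taken from Games.csv.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical values for a single game
home_pts, away_pts = 62.0, 59.0
mean_home, mean_away = 5.0 * 5.0 * 1.1, 5.0 * 5.0
scale = 2.0 * 2.0

# Log of a product of densities equals the sum of the log-densities
loglik_sum = norm.logpdf(home_pts, mean_home, scale) + norm.logpdf(away_pts, mean_away, scale)
loglik_prod = np.log(norm.pdf(home_pts, mean_home, scale) * norm.pdf(away_pts, mean_away, scale))

# Far from the mean, pdf underflows to 0.0 while logpdf stays finite
tail_pdf = norm.pdf(200.0, 25.0, 4.0)
tail_logpdf = norm.logpdf(200.0, 25.0, 4.0)
```

This matters during optimization because a poor parameter guess can push the density to exactly zero, and the log of zero would turn the objective into negative infinity.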

So the negative of the sum of game['loglik'] is what needs to be minimized, but I don't know how to go about it.

My attempts so far have failed miserably, but the code below is essentially the loss function I'm after:

def logsum(params,game,id_list):
    lt = -np.sum(game.loglik)
    return lt

W = scipy.optimize.minimize(logsum, x0=init_params, args=(game, id_list))
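One likely reason the attempt above makes no progress is that logsum never reads params, so the optimizer sees a constant function. A minimal sketch of an objective that recomputes the log-likelihood from params on every call is shown below. It assumes the same multiplicative model as above; the toy data, the starting values, the bounds and the choice of L-BFGS-B are all assumptions made for illustration and are not from the original data.

```python
import numpy as np
import pandas as pd
import scipy.optimize
from scipy.stats import norm

def neg_loglik(params, home_idx, away_idx, home_pts, away_pts, n_teams):
    """Negative log-likelihood recomputed from the current parameter vector."""
    attack = params[:n_teams]
    defence = params[n_teams:2 * n_teams]
    stdev = params[2 * n_teams:3 * n_teams]
    homeadv = params[3 * n_teams]
    scale = stdev[home_idx] * stdev[away_idx]
    mean_home = attack[home_idx] * defence[away_idx] * homeadv
    mean_away = attack[away_idx] * defence[home_idx]
    ll = norm.logpdf(home_pts, mean_home, scale) + norm.logpdf(away_pts, mean_away, scale)
    return -np.sum(ll)

# Hypothetical toy data (not the contents of Games.csv)
toy = pd.DataFrame({
    'HomeID': [1, 3, 2, 4, 1, 2],
    'AwayID': [2, 4, 3, 1, 3, 4],
    'HomePts': [62, 81, 70, 66, 75, 68],
    'AwayPts': [59, 82, 64, 71, 73, 60],
})
ids = sorted(set(toy['HomeID']) | set(toy['AwayID']))
pos = {t: k for k, t in enumerate(ids)}
home_idx = toy['HomeID'].map(pos).to_numpy()
away_idx = toy['AwayID'].map(pos).to_numpy()
n = len(ids)

# Assumed starting point: attack, defence, stdev per team, then home advantage
x0 = np.array([8.0] * n + [8.0] * n + [3.0] * n + [1.1])
# Assumed bounds that keep ratings and standard deviations strictly positive
bounds = [(0.1, None)] * (2 * n) + [(0.5, None)] * n + [(0.5, 2.0)]
args = (home_idx, away_idx,
        toy['HomePts'].to_numpy(dtype=float),
        toy['AwayPts'].to_numpy(dtype=float), n)
res = scipy.optimize.minimize(neg_loglik, x0, args=args, method='L-BFGS-B', bounds=bounds)
```

The fitted vector is in res.x and the minimized value in res.fun. Note that with a mean of the form attack * defence, multiplying every attack rating by a constant and dividing every defence rating by the same constant leaves the likelihood unchanged, so the individual ratings are only determined up to that scale unless an extra constraint is added.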

I'm fairly new to Python, so any help would be greatly appreciated! If the explanation above is unclear, please feel free to ask questions.

