Assigning a week value to NFL games

Posted 2024-06-25 22:43:44


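I am trying to assign each NFL game a numeric value for the week in which it was played. Taking the 2008 season as an example, all games played between September 4 and September 10 belong to week 1.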

i = 0
week = 1
start_date = df2008['date'].iloc[0]
end_date = df2008['date'].iloc[-1]
week_range = pd.interval_range(start=start_date, end=end_date, freq='7D', closed='left')
for row in df2008['date']:
    row = row.date()
    if row in week_range[i]:
        df2008['week'] = week
    else:
        week += 1

However, this sets every game to week 1:

           date  week
1601 2008-09-04     1
1602 2008-09-07     1
1603 2008-09-07     1
1604 2008-09-07     1
1605 2008-09-07     1
...         ...   ...
1863 2009-01-11     1
1864 2009-01-11     1
1865 2009-01-18     1
1866 2009-01-18     1
1867 2009-02-01     1

I tried debugging with print statements, and these are my results. "In Range" marks the games played in week 1, and those are reported as expected.

In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
In Range
Not In Range
Not In Range
Not In Range
Not In Range
Not In Range
Not In Range

Sample of the df:

    display(df2008[['date', 'home', 'away', 'week']])

            date       home      away  week
1601  2008-09-04     Giants  Redskins     1
1602  2008-09-07    Falcons     Lions     1
1603  2008-09-07      Bills  Seahawks     1
1604  2008-09-07     Titans   Jaguars     1
1605  2008-09-07   Dolphins      Jets     1
...          ...        ...       ...   ...
1863  2009-01-11     Giants    Eagles     1
1864  2009-01-11   Steelers  Chargers     1
1865  2009-01-18  Cardinals    Eagles     1
1866  2009-01-18   Steelers    Ravens     1
1867  2009-02-01  Cardinals  Steelers     1

Can anyone point out where I went wrong?


2 answers

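Consider avoiding the loop entirely and taking the week of the year from the datetime field with pandas.Series.dt.week (exposed in newer pandas as Series.dt.isocalendar().week). Then subtract the week number of the first game. A wrinkle appears at the new year, where the week number starts over, so the dates have to be handled conditionally: add the number of weeks elapsed up to the end of the year, then the week number within the new year. Fortunately, these weeks start on Monday, so Thursday through Sunday of one NFL week share the same week number.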

# ISO week number of the season opener
first_week = pd.Timestamp('2008-09-04').isocalendar()[1]

# FIND LAST SUNDAY OF YEAR (NOT NECESSARILY DEC 31)
end_year_week = pd.Timestamp('2008-12-28').isocalendar()[1]

new_year_week = pd.Timestamp('2009-01-01').isocalendar()[1]

# Series.dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it
iso_week = df2008['date'].dt.isocalendar().week.astype(int)

# CONDITIONALLY ASSIGN (split on the Monday after the last Sunday of the year)
df2008['week'] = np.where(df2008['date'] < '2008-12-29',
                          (iso_week - first_week) + 1,
                          ((end_year_week - first_week) + 1) + ((iso_week - new_year_week) + 1)
                          )
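The new-year wrinkle can be checked with the standard library alone. The following is a small sketch, not part of the original answer, that prints the ISO week number for a few dates around the end of 2008; the helper name `iso_week` is made up for illustration.

```python
from datetime import date

def iso_week(d):
    """Return the ISO 8601 week number of a date (weeks start on Monday)."""
    return d.isocalendar()[1]

season_start = date(2008, 9, 4)    # Thursday of NFL week 1
last_sunday = date(2008, 12, 28)   # last Sunday of 2008
next_monday = date(2008, 12, 29)   # Monday right after it
new_year = date(2009, 1, 1)

for d in (season_start, last_sunday, next_monday, new_year):
    print(d, iso_week(d))
# 2008-09-04 36
# 2008-12-28 52
# 2008-12-29 1
# 2009-01-01 1
```

The week number falls back to 1 on Monday 2008-12-29 rather than on January 1, which is why the last Sunday of the year, not December 31, is the natural place to split the two cases.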

Demonstrated below with seeded random data that includes dates in the new year. Replace it with the OP's reproducible sample.

Data

import numpy as np
import pandas as pd

### DATA BUILD
np.random.seed(120619)
df2008 = pd.DataFrame({'group': np.random.choice(['sas', 'stata', 'spss', 'python', 'r', 'julia'], 500),
                       'int': np.random.randint(1, 10, 500),
                       'num': np.random.randn(500),
                       'char': [''.join(np.random.choice(list('ABC123'), 3)) for _ in range(500)],
                       'bool': np.random.choice([True, False], 500),
                       'date': np.random.choice(pd.date_range('2008-09-04', '2009-01-06'), 500)
                      })

Calculation

# ISO week number of the season opener
first_week = pd.Timestamp('2008-09-04').isocalendar()[1]

# LAST SUNDAY OF YEAR (NOT NECESSARILY DEC 31)
end_year_week = pd.Timestamp('2008-12-28').isocalendar()[1]

new_year_week = pd.Timestamp('2009-01-01').isocalendar()[1]

# Series.dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it
iso_week = df2008['date'].dt.isocalendar().week.astype(int)

# Split on the Monday after the last Sunday of the year
df2008['week'] = np.where(df2008['date'] < '2008-12-29',
                          (iso_week - first_week) + 1,
                          ((end_year_week - first_week) + 1) + ((iso_week - new_year_week) + 1)
                          )

df2008 = df2008.sort_values('date').reset_index(drop=True)

print(df2008.head(10))
#     group  int       num char   bool       date  week
# 0     sas    2  0.099927  A2C  False 2008-09-04     1
# 1  python    3  0.241393  2CB  False 2008-09-04     1
# 2  python    8  0.516716  ABC  False 2008-09-04     1
# 3    spss    2  0.974715  3CB  False 2008-09-04     1
# 4   stata    9 -1.582096  CAA   True 2008-09-04     1
# 5     sas    3  0.070347  1BB  False 2008-09-04     1
# 6       r    5 -0.419936  1CA   True 2008-09-05     1
# 7  python    6  0.628749  1AB   True 2008-09-05     1
# 8  python    3  0.713695  CA1  False 2008-09-05     1
# 9  python    1 -0.686137  3AA  False 2008-09-05     1

print(df2008.tail(10))
#       group  int       num char   bool       date  week
# 490    spss    5 -0.548257  3CC   True 2009-01-04    18
# 491   julia    8 -0.176858  AA2  False 2009-01-05    19
# 492   julia    5 -1.422237  A1B   True 2009-01-05    19
# 493   stata    2 -1.710138  BB2   True 2009-01-05    19
# 494  python    4 -0.285249  1B1   True 2009-01-05    19
# 495    spss    3  0.918428  C23   True 2009-01-06    19
# 496       r    5 -1.347936  1AC  False 2009-01-06    19
# 497   stata    3  0.883093  1C3  False 2009-01-06    19
# 498  python    9  0.448237  12A   True 2009-01-06    19
# 499    spss    3  1.459097  2A1  False 2009-01-06    19
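As an alternative sketch that sidesteps the new-year condition altogether, the week can be derived from the number of days elapsed since the season opener. This is not taken from the answer above; it assumes week 1 starts on 2008-09-04 and that every week is exactly seven days long, as the question describes. The names `season_start` and `games` are illustrative.

```python
import pandas as pd

# Assumed first day of week 1 (taken from the question's 2008 example)
season_start = pd.Timestamp('2008-09-04')

games = pd.DataFrame({'date': pd.to_datetime([
    '2008-09-04', '2008-09-07', '2008-09-10', '2008-09-11',
    '2008-12-28', '2009-01-04', '2009-01-11', '2009-02-01',
])})

# Whole days since the opener, floor-divided into 7-day blocks, then 1-based
games['week'] = (games['date'] - season_start).dt.days // 7 + 1

print(games)
```

Because the seven-day blocks start on a Thursday, a Monday game lands in the same week as the preceding Sunday, which a Monday-based week number does not guarantee.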

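The OP's original question was "Can anyone point out where I went wrong?", so, although using pandas.Series.dt.week is a good solution, as Parfait pointed out, I followed the OP's original code logic and made a few corrections to help find the answer: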

import pandas as pd

df2008 = pd.DataFrame({
    "date": pd.to_datetime(["2008-09-04", "2008-09-07", "2008-09-07", "2008-09-07", "2008-09-07",
                            "2009-01-11", "2009-01-11", "2009-01-18", "2009-01-18", "2009-02-01"]),
    "home": ["Giants", "Falcon", "Bills", "Titans", "Dolphins", "Giants", "Steelers", "Cardinals", "Steelers", "Cardinals"],
    "away": ["Falcon", "Bills", "Titans", "Dolphins", "Giants", "Steelers", "Cardinals", "Steelers", "Cardinals", "Ravens"]
})

week = 1
start_date = df2008['date'].iloc[0]
# end_date = df2008['date'].iloc[-1]   # the last game can fall outside the final 7-day interval
end_date = pd.Timestamp("2009-03-01")  # extend past the last game so its week is covered

week_range = pd.interval_range(start=start_date, end=end_date, freq='7D', closed='left')

df2008['week'] = None
# The dates must be sorted ascending, because week only ever moves forward
for i in range(len(df2008)):
    # Keep the Timestamp: the interval bounds are Timestamps as well
    rd = df2008.loc[i, 'date']

    # Advance week until the interval containing this date is found
    while week <= len(week_range):
        if rd in week_range[week - 1]:
            # Write to this row only, not to the whole column
            df2008.loc[i, 'week'] = week
            break
        week += 1

print(df2008)

Output:

        date       home       away  week
0 2008-09-04     Giants     Falcon     1
1 2008-09-07     Falcon      Bills     1
2 2008-09-07      Bills     Titans     1
3 2008-09-07     Titans   Dolphins     1
4 2008-09-07   Dolphins     Giants     1
5 2009-01-11     Giants   Steelers    19
6 2009-01-11   Steelers  Cardinals    19
7 2009-01-18  Cardinals   Steelers    20
8 2009-01-18   Steelers  Cardinals    20
9 2009-02-01  Cardinals     Ravens    22
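The symptom in the question, every row ending up as week 1, comes down to the difference between assigning to a whole column and assigning to a single row. The following minimal sketch isolates that difference; the frame `demo` is made up for illustration and is not the OP's data.

```python
import pandas as pd

demo = pd.DataFrame({'date': pd.to_datetime(['2008-09-04', '2008-09-11', '2008-09-18'])})

# Assigning a scalar to a column broadcasts it to every row
demo['week'] = 1
print(demo['week'].tolist())   # [1, 1, 1]

# Writing through .loc with a row label changes only that row
demo.loc[1, 'week'] = 2
demo.loc[2, 'week'] = 3
print(demo['week'].tolist())   # [1, 2, 3]
```

In the original loop, `df2008['week'] = week` ran only while the date was inside the first interval, and `i` never advanced to the next interval, so the column was filled with 1 and never written again.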
