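<p>Consider avoiding any loops and using the week of the year of the datetime field, which older pandas exposed as <code>pandas.Series.dt.week</code> (deprecated in pandas 1.1 and removed in 2.0 in favor of <code>Series.dt.isocalendar().week</code>). Then subtract the first week's number. A wrinkle arises at the new year, where the ISO week number resets: 2008-12-29 already belongs to week 1 of 2009. Dates after the last Sunday of the year must therefore be handled conditionally, by adding the number of weeks counted up to year end to the week count within the new year. Fortunately, weeks start on Monday, so Thursday 2008-09-04 through Sunday 2008-09-07 share the same week number.</p>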
<pre><code># ISO week numbers of the anchor dates (plain integers)
first_week = pd.Timestamp('2008-09-04').week
# FIND LAST SUNDAY OF YEAR (NOT NECESSARILY DEC 31)
end_year_week = pd.Timestamp('2008-12-28').week
new_year_week = pd.Timestamp('2009-01-01').week
# Series.dt.week was removed in pandas 2.0; isocalendar().week is its replacement
iso_week = df2008['date'].dt.isocalendar().week.astype(int)
# CONDITIONALLY ASSIGN: dates through the last Sunday count from first_week;
# later dates add the weeks already counted (end_year_week - first_week + 1)
df2008['week'] = np.where(df2008['date'] < '2008-12-29',
                          (iso_week - first_week) + 1,
                          (end_year_week - first_week + 1) + (iso_week - new_year_week + 1)
                         )
</code></pre>
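<p>To see why the year boundary needs special handling, the following minimal sketch prints the ISO year and week for a few dates around it. The sample dates are picked for illustration only, and the snippet assumes pandas 1.1 or later, where <code>Series.dt.isocalendar()</code> is available.</p>
<pre><code>import pandas as pd

# Illustrative dates: the start date, the last Sunday of 2008, and days after it
dates = pd.Series(pd.to_datetime(['2008-09-04', '2008-12-28', '2008-12-29',
                                  '2009-01-01', '2009-01-05']))
iso = dates.dt.isocalendar()  # DataFrame with columns year, week, day

summary = pd.DataFrame({'date': dates,
                        'weekday': dates.dt.day_name(),
                        'iso_year': iso['year'],
                        'iso_week': iso['week']})
print(summary)
</code></pre>
<p>The week number drops from 52 on Sunday 2008-12-28 to 1 on Monday 2008-12-29, so subtracting the first week's number alone would turn negative from that date on.</p>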
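<p>Demonstration of the conditional week assignment with seeded random data (including dates in the new year). To be replaced with the OP's <a href="https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples">reproducible sample</a>.</p>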
<p>Data</p>
<pre><code>import numpy as np
import pandas as pd
### DATA BUILD
np.random.seed(120619)
df2008 = pd.DataFrame({'group': np.random.choice(['sas', 'stata', 'spss', 'python', 'r', 'julia'], 500),
                       'int': np.random.randint(1, 10, 500),
                       'num': np.random.randn(500),
                       'char': [''.join(np.random.choice(list('ABC123'), 3)) for _ in range(500)],
                       'bool': np.random.choice([True, False], 500),
                       'date': np.random.choice(pd.date_range('2008-09-04', '2009-01-06'), 500)
                      })
</code></pre>
<p>Calculation</p>
<pre><code># ISO week numbers of the anchor dates (plain integers)
first_week = pd.Timestamp('2008-09-04').week
end_year_week = pd.Timestamp('2008-12-28').week   # last Sunday of 2008
new_year_week = pd.Timestamp('2009-01-01').week
# Series.dt.week was removed in pandas 2.0; isocalendar().week is its replacement
iso_week = df2008['date'].dt.isocalendar().week.astype(int)
# Dates through the last Sunday count from first_week; later dates add
# the weeks already counted (end_year_week - first_week + 1)
df2008['week'] = np.where(df2008['date'] < '2008-12-29',
                          (iso_week - first_week) + 1,
                          (end_year_week - first_week + 1) + (iso_week - new_year_week + 1)
                         )
df2008 = df2008.sort_values('date').reset_index(drop=True)
print(df2008.head(10))
#     group  int       num char   bool       date  week
# 0     sas    2  0.099927  A2C  False 2008-09-04     1
# 1  python    3  0.241393  2CB  False 2008-09-04     1
# 2  python    8  0.516716  ABC  False 2008-09-04     1
# 3    spss    2  0.974715  3CB  False 2008-09-04     1
# 4   stata    9 -1.582096  CAA   True 2008-09-04     1
# 5     sas    3  0.070347  1BB  False 2008-09-04     1
# 6       r    5 -0.419936  1CA   True 2008-09-05     1
# 7  python    6  0.628749  1AB   True 2008-09-05     1
# 8  python    3  0.713695  CA1  False 2008-09-05     1
# 9  python    1 -0.686137  3AA  False 2008-09-05     1
print(df2008.tail(10))
#       group  int       num char   bool       date  week
# 490    spss    5 -0.548257  3CC   True 2009-01-04    18
# 491   julia    8 -0.176858  AA2  False 2009-01-05    19
# 492   julia    5 -1.422237  A1B   True 2009-01-05    19
# 493   stata    2 -1.710138  BB2   True 2009-01-05    19
# 494  python    4 -0.285249  1B1   True 2009-01-05    19
# 495    spss    3  0.918428  C23   True 2009-01-06    19
# 496       r    5 -1.347936  1AC  False 2009-01-06    19
# 497   stata    3  0.883093  1C3  False 2009-01-06    19
# 498  python    9  0.448237  12A   True 2009-01-06    19
# 499    spss    3  1.459097  2A1  False 2009-01-06    19
</code></pre>
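<p>As a possible cross-check on the conditional logic, here is a minimal sketch of a different technique (not the approach used in this answer): count elapsed days from the Monday of the first week and integer-divide by 7, which sidesteps the year boundary entirely. The helper name <code>weeks_since</code> and the sample dates are hypothetical, introduced only for illustration.</p>
<pre><code>import pandas as pd

def weeks_since(dates, start):
    """Number Monday-based weeks 1, 2, 3, ... with the week containing `start` as week 1.

    Hypothetical helper for illustration; `dates` is a datetime64 Series.
    """
    start = pd.Timestamp(start).normalize()
    # Monday on or before the start date (weekday(): Monday == 0)
    first_monday = start - pd.Timedelta(days=start.weekday())
    # Whole weeks elapsed since that Monday, shifted so counting starts at 1
    return (dates - first_monday).dt.days // 7 + 1

sample = pd.Series(pd.to_datetime(['2008-09-04', '2008-09-07', '2008-09-08',
                                   '2008-12-28', '2008-12-29', '2009-01-01',
                                   '2009-01-06']))
weeks = weeks_since(sample, '2008-09-04')
print(weeks.tolist())
# [1, 1, 2, 17, 18, 18, 19]
</code></pre>
<p>On a frame like <code>df2008</code>, this would be called as <code>weeks_since(df2008['date'], '2008-09-04')</code>. Because it relies only on day differences, it needs no anchor weeks for the end of the year or the new year.</p>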