<p>I am following this blog post to identify seasonal customers in my time series data:
<a href="https://www.kristenkehrer.com/seasonality-code" rel="nofollow noreferrer">https://www.kristenkehrer.com/seasonality-code</a></p>
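<p>My code is shamelessly almost identical to the blog's, with a few small tweaks; it is shown below. I was able to run it end to end for 2000 customers. A few hours later, 0 customers were flagged as seasonal in my results.</p>
<p>From manually looking at customers' data over time, I believe I have many examples of seasonal customers that should have been picked up. Below is a sample of the data I am using.</p>
<p>Am I missing something obvious? As someone new to Python, am I in over my head even attempting this?</p>
<p>Note that I already add the "0 months" in my data source, but I figured it would not hurt for the function to check again. I have also left out the data source credentials step.</p>
<p>Thanks</p>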
<p><a href="https://i.stack.imgur.com/3IVGj.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/3IVGj.png" alt="Sample Data"/></a></p>
<pre><code>import pandas as pa
import numpy as np
import pyodbc as py

# driver, server, database, username and password come from the credentials step (not shown).
cnxn = py.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password)

# Monthly usage for 2000 customers, excluding 2018; negative usage is clamped to 0.
original = pa.read_sql_query('''
    SELECT s.customer_id, s.yr, s.mnth,
           CASE WHEN s.usage < 0 THEN 0 ELSE s.usage END AS usage
    FROM dbo.Seasonal s
    JOIN (
        SELECT TOP 2000 customer_id, SUM(usage) AS usage
        FROM dbo.Seasonal
        WHERE yr != 2018
        GROUP BY customer_id
    ) t ON s.customer_id = t.customer_id
    WHERE yr != 2018
    ORDER BY customer_id, yr, mnth
''', cnxn)

grouped = original.groupby(by='customer_id')

def yearmonth_to_justmonth(year, month):
    # Collapse (year, month) into a single running month number.
    return year * 12 + month - 1

def fillInForOwner(group):
    # First and last rows of the group (rows are ordered by yr, mnth in the query).
    first = group.head(1).iloc[0]
    last = group.tail(1).iloc[0]
    minMonths = yearmonth_to_justmonth(first.yr, first.mnth)
    maxMonths = yearmonth_to_justmonth(last.yr, last.mnth)
    # Note: np.arange excludes maxMonths, so the final month is dropped here.
    filled_index = pa.Index(np.arange(minMonths, maxMonths, 1), name="filled_months")
    group['months'] = group.yr * 12 + group.mnth - 1
    group = group.set_index('months')
    group = group.reindex(filled_index)
    # Rebuild the columns for the rows that reindex() introduced.
    group.customer_id = first.customer_id
    group.yr = group.index // 12
    group.mnth = group.index % 12 + 1
    group.usage = np.where(group.usage.isnull(), 0, group.usage).astype(int)
    return group

filledIn = grouped.apply(fillInForOwner)
newIndex = pa.Index(np.arange(filledIn.customer_id.count()))

from rpy2.robjects.packages import importr
from rpy2.robjects import r, pandas2ri, globalenv
pandas2ri.activate()

base = importr('base')
colorspace = importr('colorspace')
forecast = importr('forecast')
times = importr('timeSeries')
stats = importr('stats')

outfile = 'results.csv'
df_list = []

for customerid, dataForCustomer in filledIn.groupby(by=['customer_id']):
    startYear = dataForCustomer.head(1).iloc[0].yr
    startMonth = dataForCustomer.head(1).iloc[0].mnth
    endYear = dataForCustomer.tail(1).iloc[0].yr
    endMonth = dataForCustomer.tail(1).iloc[0].mnth

    # Build a monthly R time series for this customer and hand it to R.
    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                          start=base.c(startYear, startMonth),
                          end=base.c(endYear, endMonth),
                          frequency=12)
    r.assign('customerTS', customerTS)

    try:
        seasonal = r('''
            fit <- tbats(customerTS, seasonal.periods = 12,
                         use.parallel = TRUE)
            fit$seasonal
        ''')
    except Exception:
        # Any failure in the R call is recorded as 1.
        seasonal = 1

    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

seasonal_output = pa.DataFrame(df_list)
print(seasonal_output)
seasonal_output.to_csv(outfile)
</code></pre>
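<p>To rule out the month-filling step as a cause, it can be checked on its own without the database or R. The snippet below is only a minimal sketch under stated assumptions: the function name <code>fill_missing_months</code> and the toy data are hypothetical, and the column names (<code>customer_id</code>, <code>yr</code>, <code>mnth</code>, <code>usage</code>) are taken from the query above. Unlike the posted code, it makes the month range inclusive of the last observed month.</p>

```python
import numpy as np
import pandas as pd


def fill_missing_months(group):
    """Reindex one customer's rows onto a gap-free monthly index, zero-filling usage."""
    # Running month number: year * 12 + (month - 1), the same encoding as the posted code.
    months = (group["yr"] * 12 + group["mnth"] - 1).rename("months")
    # The + 1 makes the range include the last observed month;
    # np.arange(start, stop) on its own stops one short of `stop`.
    full_index = pd.Index(np.arange(months.min(), months.max() + 1), name="months")
    filled = group.set_index(months).reindex(full_index)
    # Rebuild the columns for the rows that reindex() introduced as NaN.
    filled["customer_id"] = group["customer_id"].iloc[0]
    filled["yr"] = filled.index // 12
    filled["mnth"] = filled.index % 12 + 1
    filled["usage"] = filled["usage"].fillna(0).astype(int)
    return filled.reset_index(drop=True)


# Hypothetical toy data: customer 1 has no rows for February and March 2017.
toy = pd.DataFrame({
    "customer_id": [1, 1, 1],
    "yr": [2016, 2017, 2017],
    "mnth": [12, 1, 4],
    "usage": [5, 7, 3],
})
filled = fill_missing_months(toy)
print(filled)
```

<p>If the filled frame looks right for a handful of real customers, the filling step is probably not the problem, and the next place to look would be what <code>fit$seasonal</code> actually returns on the Python side for a customer you believe is seasonal.</p>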