R tbats model: seasonal customer flag returns no result

import pandas as pa
import numpy as np
import pyodbc as py

cnxn = py.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password)

original = pa.read_sql_query('SELECT s.customer_id, s.yr, s.mnth, Case when s.usage<0 then 0 else s.usage end as usage FROM dbo.Seasonal s Join ( Select Top 2000 customer_id, SUM(usage) as usage From dbo.Seasonal where Yr!=2018 Group by customer_id ) t ON s.customer_id = t.customer_id Where yr!= 2018 Order by customer_id, yr, mnth', cnxn)

grouped = original.groupby(by='customer_id')

def yearmonth_to_justmonth(year, month):
    return year * 12 + month - 1

def fillInForOwner(group):
    min = group.head(1).iloc[0]
    max = group.tail(1).iloc[0]
    minMonths = yearmonth_to_justmonth(min.yr, min.mnth)
    maxMonths = yearmonth_to_justmonth(max.yr, max.mnth)
    filled_index = pa.Index(np.arange(minMonths, maxMonths, 1), name="filled_months")
    group['months'] = group.yr * 12 + group.mnth - 1
    group = group.set_index('months')
    group = group.reindex(filled_index)
    group.customer_id = min.customer_id
    group.yr = group.index // 12
    group.mnth = group.index % 12 + 1
    group.usage = np.where(group.usage.isnull(), 0, group.usage).astype(int)
    return group

filledIn = grouped.apply(fillInForOwner)
newIndex = pa.Index(np.arange(filledIn.customer_id.count()))

import rpy2 as r
from rpy2.robjects.packages import importr
from rpy2.robjects import r, pandas2ri, globalenv
pandas2ri.activate()

base = importr('base')
colorspace = importr('colorspace')
forecast = importr('forecast')
times = importr('timeSeries')
stats = importr('stats')

outfile = 'results.csv'
df_list = []

for customerid, dataForCustomer in filledIn.groupby(by=['customer_id']):
    startYear = dataForCustomer.head(1).iloc[0].yr
    startMonth = dataForCustomer.head(1).iloc[0].mnth
    endYear = dataForCustomer.tail(1).iloc[0].yr
    endMonth = dataForCustomer.tail(1).iloc[0].mnth

    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                          start=base.c(startYear, startMonth),
                          end=base.c(endYear, endMonth),
                          frequency=12)
    r.assign('customerTS', customerTS)

    try:
        seasonal = r('''
            fit<-tbats(customerTS, seasonal.periods = 12, use.parallel = TRUE)
            fit$seasonal
        ''')
    except:
        seasonal = 1

    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

seasonal_output = pa.DataFrame(df_list)
print(seasonal_output)
seasonal_output.to_csv(outfile)

2 Answers

User

#1 · Edited 2024-09-30 20:30:59

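Christine here (this is my code). A 1 actually means the customer is not seasonal (or the model could not pick the pattern up), and NULL also means not seasonal. If the customer has a seasonal usage pattern (a 12-month period, which is what the code looks for), it will output [12].

You can verify this by looking at plots of individual customers' behavior and then checking them against the algorithm's result. I also like to cross-check with a seasonal decomposition algorithm in Python or R.

Here is some R code for viewing the decomposition of a time series. If there is no seasonal panel in the plot, the result is not seasonal: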
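The three outcomes described above (1, NULL, or [12]) can be collapsed into a single boolean flag before writing the CSV. The helper below is only a sketch: `is_seasonal` is a hypothetical name, and it assumes the R result has already reached Python as `None`, a plain number, or a sequence of numbers. How rpy2 actually converts `fit$seasonal` may differ by version, so check the type you receive first.

```python
def is_seasonal(result, period=12):
    """Return True only when `result` contains the expected seasonal period.

    Assumption: `result` is None (R NULL), a single number (the fallback 1),
    or a sequence of periods such as [12].
    """
    if result is None:
        return False
    try:
        values = list(result)      # sequence of detected periods
    except TypeError:
        values = [result]          # a bare scalar such as 1
    return any(float(v) == period for v in values)


# Example: the three cases described in the answer.
flags = {name: is_seasonal(value)
         for name, value in {"null": None, "fallback": 1, "yearly": [12]}.items()}
```

With a flag like this, the `seasonal` column in the output holds True or False rather than a mix of 1, NULL, and vectors.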

library(forecast)
myts<-ts(mydata$SENDS, start=c(2013,1),end=c(2018,2),frequency = 12)
plot(decompose(myts))
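A similar cross-check can be sketched in Python without plotting. The function below is a dependency-free illustration of a classical additive decomposition (centered moving-average trend, then per-month averages), and it reports a seasonal strength between 0 and 1. It is not what `tbats` or R's `decompose` computes internally. The name `seasonal_strength` and any cut-off you apply to its value are assumptions made for illustration.

```python
from statistics import mean, pvariance


def seasonal_strength(values, period=12):
    """Classical additive decomposition; returns a seasonal strength in [0, 1]."""
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Step 1: estimate the trend with a centered moving average
    # (a 2 x period average when the period is even) and remove it.
    detrended = {}
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        if period % 2 == 0:
            trend = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        else:
            trend = sum(window) / period
        detrended[t] = values[t] - trend

    # Step 2: average the detrended values by position in the cycle.
    by_position = [[] for _ in range(period)]
    for t, d in detrended.items():
        by_position[t % period].append(d)
    raw = [mean(p) for p in by_position]
    offset = mean(raw)
    seasonal = [s - offset for s in raw]

    # Step 3: compare the remainder's variance with the detrended variance.
    remainder = [d - seasonal[t % period] for t, d in detrended.items()]
    var_detrended = pvariance(list(detrended.values()))
    if var_detrended == 0:
        return 0.0
    return max(0.0, 1.0 - pvariance(remainder) / var_detrended)


# A customer who only uses the product in three months of each year.
summer_only = [10 if m % 12 in (5, 6, 7) else 0 for m in range(48)]
strength = seasonal_strength(summer_only)
```

Values near 1 suggest a strong 12-month pattern and values near 0 suggest none, which gives a second opinion on customers that `tbats` flags as 1 or NULL.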

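Also, you mentioned (in your Twitter conversation) a problem with some zeros not being filled in. I haven't had that problem, but my customers have tenures ranging from 2 to 13 years. I'm not sure what the problem is here.

Let me know if I can help further :)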
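For reference, the zero-filling step can be sketched without pandas as below. This is a hypothetical helper (`fill_missing_months`), not the code from the question. One thing worth checking in the question's version is that `np.arange(minMonths, maxMonths, 1)` excludes its stop value, so the last observed month is dropped from the filled index. Whether that is related to the issue mentioned above is not clear from the thread.

```python
def fill_missing_months(rows):
    """Fill gaps in one customer's history with zero usage.

    rows: iterable of (year, month, usage) tuples for a single customer.
    Returns a list of (year, month, usage) covering every month from the
    first to the last observation, inclusive.
    """
    usage_by_index = {year * 12 + month - 1: usage for year, month, usage in rows}
    if not usage_by_index:
        return []
    first, last = min(usage_by_index), max(usage_by_index)
    filled = []
    for idx in range(first, last + 1):  # +1 keeps the last observed month
        filled.append((idx // 12, idx % 12 + 1, usage_by_index.get(idx, 0)))
    return filled


# Example: December 2016 and January 2017 are missing and get zero usage.
history = fill_missing_months([(2016, 11, 5), (2017, 2, 7)])
```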

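User

#2 · Edited 2024-09-30 20:30:59

Coming back to answer how I got this working: I simply passed the "original" dataframe into the for loop. My data already contained the empty $0 months, so I didn't need to run the fill-in part of the code. Thanks, everyone, for your help.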
