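I have subscription lifetime data that shows the share of subscriptions billed in a given week, relative to the week in which the subscriptions were created. For example, if 1,000 subs were created and the value for "week 5 of billing" is 0.3166, then 316.6 of those subs were billed in that week.
I want to clean out "bad data". Sometimes, because of technical issues, we do not bill for a few weeks and then catch up on the billing later. This distorts my retention curve when I use it for other forecasts.
How can I remove the retention rate percentages that are caused by technical errors, so that they are not used in my model?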
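I was thinking of a rule along these lines: take the current retention rate (column %X, i.e. the value in column X divided by the value in column X-1) and, if it differs from the average of the previous 3 columns by an absolute delta of more than 0.20, delete the value. For example, in the table below I want to get rid of the values in the first row (index 0), columns 4 and 5. I think my rule would also kill the Week 6 % billed value, even though the actual rate of 0.1990 in the corresponding table looks fine.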
I tried a lambda like the one below, but I get the error "TypeError: 'float' object is not subscriptable". If I can get it working, I can loop over the columns.
df['%BilledWeek5']= df['%BilledWeek5'].apply(lambda x: x if (((x['%BilledWeek4'] + x['%BilledWeek3'] + x['%BilledWeek2'])/3)/x-1).abs() <0.2 else '')
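A note on the error: `Series.apply` passes each cell value to the lambda, so `x` is a single float and `x['%BilledWeek4']` cannot work. A minimal sketch of a row-wise version follows, using `DataFrame.apply(..., axis=1)` so the function receives a whole row. The small `ratios` frame and the helper name `clean_week5` are made up for illustration, and `np.nan` replaces `''` so the column stays numeric (an assumption about what the model expects).

```python
import numpy as np
import pandas as pd

# Week-over-week ratios for three rows of the example data (illustrative subset)
ratios = pd.DataFrame({
    "%BilledWeek2": [0.921811, 0.998341, 0.955283],
    "%BilledWeek3": [0.985268, 0.928126, 1.000000],
    "%BilledWeek4": [0.423199, 0.994628, 0.929958],
    "%BilledWeek5": [3.389722, 0.412691, 0.990679],
})

def clean_week5(row):
    """Keep %BilledWeek5 only if it is within 20% of the mean of the previous three ratios."""
    baseline = (row["%BilledWeek4"] + row["%BilledWeek3"] + row["%BilledWeek2"]) / 3
    # Built-in abs(): the value is a scalar, and scalars have no .abs() method
    keep = abs(baseline / row["%BilledWeek5"] - 1) < 0.2
    return row["%BilledWeek5"] if keep else np.nan

# axis=1 hands each row (a Series indexed by column name) to the function
ratios["%BilledWeek5"] = ratios.apply(clean_week5, axis=1)
print(ratios)
```

This keeps the same test as the lambda (relative deviation of the three-week mean from the current value), only applied per row instead of per cell.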
Maybe I am going about this completely the wrong way. There may also be some statistical functions I could use.
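As one possible statistical alternative, each ratio could be compared with the median of its own row, since a median is less affected by a few extreme weeks than a mean. The sketch below is only an illustration under assumptions: the `ratios` frame holds two rows of the week-over-week values from the example, and the 0.20 cutoff (`THRESHOLD`) is borrowed from the rule above rather than derived from the data.

```python
import pandas as pd

# Week-over-week ratios for two rows of the example data (illustrative subset)
ratios = pd.DataFrame(
    [[0.921811, 0.985268, 0.423199, 3.389722, 0.628553, 0.949246],
     [0.955283, 1.000000, 0.929958, 0.990679, 0.404968, 3.583643]],
    columns=[f"%BilledWeek{w}" for w in range(2, 8)],
)

THRESHOLD = 0.20  # assumed cutoff, taken from the rule described in the question

# Median ratio per row, used as a robust baseline
row_median = ratios.median(axis=1)

# Absolute distance of every ratio from its row's median
deviation = ratios.sub(row_median, axis=0).abs()

# Replace values that are too far from the baseline with NaN
cleaned = ratios.mask(deviation > THRESHOLD)
print(cleaned)
```

Whether a per-row median is a sensible baseline depends on how much retention is expected to drift over the weeks, so this is a starting point rather than a recommendation.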
import pandas as pd
subscriptionlifetime = [
    {'Country': 'DE', 'Product': 'Cable', 'Created Week': 'cWeek1',
     'Billed Week1': 0.2430, 'Billed Week2': 0.2240, 'Billed Week3': 0.2207, 'Billed Week4': 0.0934,
     'Billed Week5': 0.3166, 'Billed Week6': 0.1990, 'Billed Week7': 0.1889, 'Billed Week8': 0.1816},
    {'Country': 'DE', 'Product': 'Cable', 'Created Week': 'cWeek2',
     'Billed Week1': 0.2411, 'Billed Week2': 0.2407, 'Billed Week3': 0.2234, 'Billed Week4': 0.2222,
     'Billed Week5': 0.0917, 'Billed Week6': 0.3206, 'Billed Week7': 0.2006, 'Billed Week8': 0.1909},
    {'Country': 'AU', 'Product': 'Satelite', 'Created Week': 'cWeek1',
     'Billed Week1': 0.3019, 'Billed Week2': 0.2884, 'Billed Week3': 0.2884, 'Billed Week4': 0.2682,
     'Billed Week5': 0.2657, 'Billed Week6': 0.1076, 'Billed Week7': 0.3856, 'Billed Week8': 0.2403},
    {'Country': 'AU', 'Product': 'Satelite', 'Created Week': 'cWeek2',
     'Billed Week1': 0.2864, 'Billed Week2': 0.2748, 'Billed Week3': 0.2623, 'Billed Week4': 0.2453,
     'Billed Week5': 0.2420, 'Billed Week6': 0.0963, 'Billed Week7': 0.3539, 'Billed Week8': 0.2216},
]
df = pd.DataFrame(subscriptionlifetime)
df = df[['Country','Product','Created Week', 'Billed Week1', 'Billed Week2', 'Billed Week3', 'Billed Week4', 'Billed Week5','Billed Week6' , 'Billed Week7', 'Billed Week8']]
print(df)
   Country   Product  Created Week  Billed Week1  Billed Week2  Billed Week3  \
0       DE     Cable        cWeek1        0.2430        0.2240        0.2207
1       DE     Cable        cWeek2        0.2411        0.2407        0.2234
2       AU  Satelite        cWeek1        0.3019        0.2884        0.2884
3       AU  Satelite        cWeek2        0.2864        0.2748        0.2623

   Billed Week4  Billed Week5  Billed Week6  Billed Week7  Billed Week8
0        0.0934        0.3166        0.1990        0.1889        0.1816
1        0.2222        0.0917        0.3206        0.2006        0.1909
2        0.2682        0.2657        0.1076        0.3856        0.2403
3        0.2453        0.2420        0.0963        0.3539        0.2216
for x in range(2, 8):
    # Week-over-week ratio: share billed in week x divided by share billed in week x-1
    df['%BilledWeek' + str(x)] = df['Billed Week' + str(x)] / df['Billed Week' + str(x - 1)]
    print(x)
print(df)
   Country   Product  Created Week  Billed Week1  Billed Week2  Billed Week3  \
0       DE     Cable        cWeek1        0.2430        0.2240        0.2207
1       DE     Cable        cWeek2        0.2411        0.2407        0.2234
2       AU  Satelite        cWeek1        0.3019        0.2884        0.2884
3       AU  Satelite        cWeek2        0.2864        0.2748        0.2623

   Billed Week4  Billed Week5  Billed Week6  Billed Week7  Billed Week8  \
0        0.0934        0.3166        0.1990        0.1889        0.1816
1        0.2222        0.0917        0.3206        0.2006        0.1909
2        0.2682        0.2657        0.1076        0.3856        0.2403
3        0.2453        0.2420        0.0963        0.3539        0.2216

   %BilledWeek2  %BilledWeek3  %BilledWeek4  %BilledWeek5  %BilledWeek6  \
0      0.921811      0.985268      0.423199      3.389722      0.628553
1      0.998341      0.928126      0.994628      0.412691      3.496183
2      0.955283      1.000000      0.929958      0.990679      0.404968
3      0.959497      0.954512      0.935189      0.986547      0.397934

   %BilledWeek7
0      0.949246
1      0.625702
2      3.583643
3      3.674974
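The rule described above (drop a ratio when it is more than 0.20 away from the mean of the previous three ratios) could also be applied to all % columns at once, without a per-row lambda. The following is a sketch under assumptions, not a definitive implementation: the names `billed`, `ratios`, `baseline`, `cleaned` and `THRESHOLD` are illustrative, the deviation is measured as an absolute difference, and `min_periods=1` is an assumed choice so that weeks with fewer than three predecessors are still checked.

```python
import pandas as pd

# Billed shares per week, copied from the example data
billed = pd.DataFrame(
    [[0.2430, 0.2240, 0.2207, 0.0934, 0.3166, 0.1990, 0.1889, 0.1816],
     [0.2411, 0.2407, 0.2234, 0.2222, 0.0917, 0.3206, 0.2006, 0.1909],
     [0.3019, 0.2884, 0.2884, 0.2682, 0.2657, 0.1076, 0.3856, 0.2403],
     [0.2864, 0.2748, 0.2623, 0.2453, 0.2420, 0.0963, 0.3539, 0.2216]],
    columns=[f"Billed Week{w}" for w in range(1, 9)],
)

# Week-over-week ratios for weeks 2 to 7, same definition as the loop above
ratios = pd.DataFrame({
    f"%BilledWeek{w}": billed[f"Billed Week{w}"] / billed[f"Billed Week{w - 1}"]
    for w in range(2, 8)
})

THRESHOLD = 0.20  # absolute delta from the rule in the question

# Mean of up to three preceding ratios in the same row:
# transpose so weeks run down the index, roll, then shift by one week
# so that the current week is excluded from its own baseline.
baseline = ratios.T.rolling(window=3, min_periods=1).mean().shift(1).T

# Flag ratios that are too far from their baseline; the first week has no
# baseline (NaN), the comparison is False there, so it is always kept.
flagged = (ratios - baseline).abs() > THRESHOLD
cleaned = ratios.mask(flagged)
print(cleaned)
```

Because the baseline is built from the raw ratios, a spike also enters the baseline of the following weeks, so those weeks can be flagged as well. On this data the first row loses weeks 4 to 7, which is in line with the concern that the rule would also remove Week 6.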