How to remove outliers from a curve

Posted 2024-09-28 22:15:34


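I have subscription lifetime data showing the share of subscriptions billed in each week relative to the week the subscriptions were created. For example, if 1,000 subs were created and "Billed Week5" is 0.3166, then 316.6 of those subs were billed in week 5.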

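I want to clean out the "bad data". Sometimes, because of technical problems, we don't bill for a few weeks and then catch up on billing afterwards. That distorts my retention curve when I use it for other forecasts.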

How can I remove the retention percentages that stem from these technical errors, so that they are not used in my model?

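I was thinking of some logic like this: if the current retention value (the % column, i.e. the week-over-week ratio of column X to column X-1) deviates from the mean of the previous 3 columns by more than 0.20 in absolute terms, remove that value. For example, in the table below I want to drop the week 4 and week 5 values in the first row (index 0). I think my rule would also kill the week 6 % billed, even though the actual rate of 0.1990 in the corresponding table looks fine.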

I tried a lambda like the one below, but got the error "TypeError: 'float' object is not subscriptable". If I can get it working, I could loop over the columns.

df['%BilledWeek5']= df['%BilledWeek5'].apply(lambda x: x if (((x['%BilledWeek4'] + x['%BilledWeek3'] + x['%BilledWeek2'])/3)/x-1).abs() <0.2 else '')
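A likely cause of the TypeError: `Series.apply` passes each cell of the column to the lambda, so `x` is a single float and `x['%BilledWeek4']` tries to index it. The sketch below applies the same check column-wise instead. It assumes flagged values should become NaN rather than `''` (which would turn the column into object dtype), and it uses a small hypothetical `demo` frame holding two rows of the ratios printed further down.

```python
import pandas as pd

# Two rows of week-over-week ratios taken from the question's output.
demo = pd.DataFrame({
    '%BilledWeek2': [0.921811, 0.955283],
    '%BilledWeek3': [0.985268, 1.000000],
    '%BilledWeek4': [0.423199, 0.929958],
    '%BilledWeek5': [3.389722, 0.990679],
})

# Mean of the three preceding ratios, computed per row.
prev_mean = demo[['%BilledWeek2', '%BilledWeek3', '%BilledWeek4']].mean(axis=1)

# Same expression as in the lambda: |mean / current - 1|.
rel_dev = (prev_mean / demo['%BilledWeek5'] - 1).abs()

# Keep values within the 0.2 tolerance, replace the rest with NaN.
cleaned = demo['%BilledWeek5'].where(rel_dev < 0.2)
print(cleaned)
```

Here the first row is replaced by NaN and the second is kept. Working on whole columns also avoids a row-wise `apply(..., axis=1)`, which would be the other way to get access to several columns inside the lambda.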

Maybe I'm on the wrong track entirely. There may also be a statistical function I could use.
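One statistical option, offered as a sketch rather than a definitive answer, is to test the billed fractions themselves against a centered rolling median along each row instead of testing the week-over-week ratios. The window of 3 weeks and the 0.2 relative tolerance are assumptions chosen to fit this sample; the names `billed`, `local_median`, `outliers` and `billed_clean` are illustrative.

```python
import pandas as pd

weeks = [f'Billed Week{k}' for k in range(1, 9)]

# Billed fractions from the question, one row per cohort.
billed = pd.DataFrame(
    [[0.2430, 0.2240, 0.2207, 0.0934, 0.3166, 0.1990, 0.1889, 0.1816],
     [0.2411, 0.2407, 0.2234, 0.2222, 0.0917, 0.3206, 0.2006, 0.1909],
     [0.3019, 0.2884, 0.2884, 0.2682, 0.2657, 0.1076, 0.3856, 0.2403],
     [0.2864, 0.2748, 0.2623, 0.2453, 0.2420, 0.0963, 0.3539, 0.2216]],
    columns=weeks,
)

# Centered 3-week rolling median per cohort. rolling() works down the index,
# so transpose to put weeks on the index and transpose back afterwards.
local_median = billed.T.rolling(window=3, center=True).median().T

# Relative deviation of each value from its local median.
rel_dev = (billed / local_median - 1).abs()

# Week 1 and week 8 have no full window, so their deviation is NaN
# and they are never flagged.
outliers = rel_dev > 0.2

# Replace flagged values with NaN.
billed_clean = billed.mask(outliers)
print(billed_clean)
```

On this sample it flags only the missed week and the catch-up week in each cohort (weeks 4 and 5 in the first row) and keeps the 0.1990 in week 6. An outage that spans more weeks would need a wider window, since a median over 3 values can absorb at most one bad neighbour on each side.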

import pandas as pd

subscriptionlifetime = [{'Country':'DE','Product':'Cable','Created Week':'cWeek1','Billed Week1':0.2430,'Billed Week2':0.2240,'Billed Week3':0.2207,'Billed Week4':0.0934,
'Billed Week5':0.3166,'Billed Week6':0.1990,'Billed Week7':0.1889,'Billed Week8':0.1816},
         {'Country':'DE','Product':'Cable','Created Week':'cWeek2','Billed Week1':0.2411,'Billed Week2':0.2407,
         'Billed Week3':0.2234,'Billed Week4':0.2222,'Billed Week5':0.0917,'Billed Week6':0.3206,'Billed Week7':0.2006,'Billed Week8':0.1909},
         {'Country':'AU','Product':'Satelite','Created Week':'cWeek1','Billed Week1':0.3019,'Billed Week2':0.2884,
         'Billed Week3':0.2884,'Billed Week4':0.2682,'Billed Week5':0.2657,'Billed Week6':0.1076,'Billed Week7':0.3856,'Billed Week8':0.2403},
         {'Country':'AU','Product':'Satelite','Created Week':'cWeek2','Billed Week1':0.2864,'Billed Week2':0.2748,
         'Billed Week3':0.2623,'Billed Week4':0.2453,'Billed Week5':0.2420,'Billed Week6':0.0963,'Billed Week7':0.3539,'Billed Week8':0.2216}]

df = pd.DataFrame(subscriptionlifetime)

df = df[['Country','Product','Created Week', 'Billed Week1', 'Billed Week2', 'Billed Week3', 'Billed Week4', 'Billed Week5','Billed Week6' , 'Billed Week7', 'Billed Week8']]         

print(df)

  Country   Product Created Week  Billed Week1  Billed Week2  Billed Week3  \
0      DE     Cable       cWeek1        0.2430        0.2240        0.2207   
1      DE     Cable       cWeek2        0.2411        0.2407        0.2234   
2      AU  Satelite       cWeek1        0.3019        0.2884        0.2884   
3      AU  Satelite       cWeek2        0.2864        0.2748        0.2623   

   Billed Week4  Billed Week5  Billed Week6  Billed Week7  Billed Week8  
0        0.0934        0.3166        0.1990        0.1889        0.1816  
1        0.2222        0.0917        0.3206        0.2006        0.1909  
2        0.2682        0.2657        0.1076        0.3856        0.2403  
3        0.2453        0.2420        0.0963        0.3539        0.2216  


for x in range(2,8):

    df['%BilledWeek'+str(x)] = df['Billed Week'+str(x)]/df['Billed Week'+str(x-1)]
    print (x)

print(df)


  Country   Product Created Week  Billed Week1  Billed Week2  Billed Week3  \
0      DE     Cable       cWeek1        0.2430        0.2240        0.2207   
1      DE     Cable       cWeek2        0.2411        0.2407        0.2234   
2      AU  Satelite       cWeek1        0.3019        0.2884        0.2884   
3      AU  Satelite       cWeek2        0.2864        0.2748        0.2623   

   Billed Week4  Billed Week5  Billed Week6  Billed Week7  Billed Week8  \
0        0.0934        0.3166        0.1990        0.1889        0.1816   
1        0.2222        0.0917        0.3206        0.2006        0.1909   
2        0.2682        0.2657        0.1076        0.3856        0.2403   
3        0.2453        0.2420        0.0963        0.3539        0.2216   

   %BilledWeek2  %BilledWeek3  %BilledWeek4  %BilledWeek5  %BilledWeek6  \
0      0.921811      0.985268      0.423199      3.389722      0.628553   
1      0.998341      0.928126      0.994628      0.412691      3.496183   
2      0.955283      1.000000      0.929958      0.990679      0.404968   
3      0.959497      0.954512      0.935189      0.986547      0.397934   

   %BilledWeek7  
0      0.949246  
1      0.625702  
2      3.583643  
3      3.674974  
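For reference, the "mean of the previous 3 columns" rule can be looped over the ratio columns as sketched below. This is one reading of the rule, using the relative form from the lambda (`|mean / current - 1| < 0.2`) and NaN for removed values; the names `ratios`, `cleaned` and `flagged` are illustrative.

```python
import pandas as pd

# Billed fractions from the question, one row per cohort, weeks 1..8.
billed = pd.DataFrame(
    [[0.2430, 0.2240, 0.2207, 0.0934, 0.3166, 0.1990, 0.1889, 0.1816],
     [0.2411, 0.2407, 0.2234, 0.2222, 0.0917, 0.3206, 0.2006, 0.1909],
     [0.3019, 0.2884, 0.2884, 0.2682, 0.2657, 0.1076, 0.3856, 0.2403],
     [0.2864, 0.2748, 0.2623, 0.2453, 0.2420, 0.0963, 0.3539, 0.2216]],
    columns=[f'Billed Week{k}' for k in range(1, 9)],
)

# Week-over-week ratios for weeks 2..7, matching the loop in the question.
ratios = pd.DataFrame({
    f'%BilledWeek{k}': billed[f'Billed Week{k}'] / billed[f'Billed Week{k - 1}']
    for k in range(2, 8)
})

cleaned = ratios.copy()
for k in range(5, 8):  # week 5 is the first ratio with three earlier ratios
    prev = [f'%BilledWeek{j}' for j in range(k - 3, k)]
    col = f'%BilledWeek{k}'
    # Compare against the raw (uncleaned) preceding ratios.
    rel_dev = (ratios[prev].mean(axis=1) / ratios[col] - 1).abs()
    cleaned[col] = ratios[col].where(rel_dev < 0.2)

flagged = cleaned.isna()
print(flagged.sum())  # number of removed values per ratio column
```

On this data the rule removes 10 of the 12 ratios it evaluates, including healthy ones such as week 7 in the first row, because a single bad week contaminates the mean of every window that contains it. It also never evaluates `%BilledWeek4`, which has only two preceding ratios.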
