<p>A pandas solution is several orders of magnitude faster:</p>
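<pre><code>import numpy as np

def hampel(vals_orig, k=7, t0=3):
    '''
    vals_orig: pandas Series of values from which to remove outliers
    k: size of window (including the sample; 7 is equal to 3 on either side of value)
    t0: number of scaled deviations beyond which a value counts as an outlier
    '''
    # Make a copy so the original is not edited
    vals = vals_orig.copy()
    # Hampel filter
    L = 1.4826  # scale factor relating the MAD to the standard deviation for Gaussian data
    # center=True makes the window symmetric, as the docstring describes
    rolling_median = vals.rolling(k, center=True).median()
    difference = np.abs(rolling_median - vals)
    median_abs_deviation = difference.rolling(k, center=True).median()
    threshold = t0 * L * median_abs_deviation
    outlier_idx = difference > threshold
    vals[outlier_idx] = np.nan
    return vals
</code></pre>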
<p>Timing this gives 11 ms versus 15 seconds; a huge improvement.</p>
<p>I found a solution for a similar filter in <a href="https://ocefpaf.github.io/python4oceanographers/blog/2015/03/16/outlier_detection/" rel="noreferrer">this post</a>.</p>
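<p>As a minimal usage sketch, the filter can be applied to a small series with one injected spike. The function is restated in compact form so the snippet runs on its own; the sample data and the <code>center=True</code> argument (a symmetric window, matching the docstring) are assumptions for illustration, not part of the timing test mentioned above.</p>

```python
import numpy as np
import pandas as pd

def hampel(vals_orig, k=7, t0=3):
    """Return a copy of vals_orig with Hampel-flagged outliers set to NaN."""
    vals = vals_orig.copy()
    L = 1.4826
    # Assumption: center=True gives a window with k // 2 samples on either side
    rolling_median = vals.rolling(k, center=True).median()
    difference = np.abs(rolling_median - vals)
    median_abs_deviation = difference.rolling(k, center=True).median()
    threshold = t0 * L * median_abs_deviation
    vals[difference > threshold] = np.nan
    return vals

# Hypothetical sample data: a linear ramp with a single spike at position 10
raw = pd.Series(np.arange(20, dtype=float))
raw.iloc[10] = 100.0

cleaned = hampel(raw)
print(cleaned[cleaned.isna()].index.tolist())  # [10]
```

<p>Values near either end of the series are never flagged, because the rolling windows there are incomplete and yield NaN thresholds.</p>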