Data Standardization vs Normalization vs Robust Scaler

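My questions are:

Am I right to say that Standardization is also negatively affected by extreme values? If not, why not, given the results provided?

I really can't see how the Robust Scaler improved the data, because I still have extreme values in the resulting data set. Is there a simple, complete interpretation?

P.S.

I am imagining a scenario where I want to prepare a data set for a neural network and I am concerned about the vanishing gradient problem. Nevertheless, my questions are still general.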

2 Answers

User
#1 · edited 2024-05-13 15:19:31

Am I right to say that also Standardization gets affected negatively by the extreme values as well?
Indeed you are; the scikit-learn docs themselves clearly warn about this case:
However, when data contains outliers, StandardScaler can often be mislead. In such cases, it is better to use a scaler that is robust against outliers.
More or less the same holds true for MinMaxScaler as well.
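To make the warning concrete, here is a minimal sketch using only the Python standard library. It assumes the textbook formulas (z-score with the population standard deviation, and min-max scaling to [0, 1]), which is what StandardScaler and MinMaxScaler compute with their default settings. The data and the helper names `standardize`, `min_max` and `spread` are made up for illustration.

```python
import statistics


def standardize(values):
    """Z-score scaling: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


def min_max(values):
    """Min-max scaling to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def spread(values):
    """Width of the range covered by a list of scaled values."""
    return max(values) - min(values)


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = clean + [100.0]  # hypothetical extreme value

# Spread of the five ordinary points, without and with the outlier present.
z_spread_clean = spread(standardize(clean))
z_spread_outlier = spread(standardize(with_outlier)[:5])
mm_spread_clean = spread(min_max(clean))
mm_spread_outlier = spread(min_max(with_outlier)[:5])

print(f"z-score spread of inliers: {z_spread_clean:.3f} -> {z_spread_outlier:.3f}")
print(f"min-max spread of inliers: {mm_spread_clean:.3f} -> {mm_spread_outlier:.3f}")
```

In both cases a single extreme value inflates the scale statistic (the standard deviation or the range), so the ordinary points end up squeezed into a much narrower band than they occupy without it.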
I really can't see how the Robust Scaler improved the data because I still have extreme values in the resulted data set? Any simple -complete interpretation?
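Robust does not mean immune or invulnerable, and the purpose of scaling is not to "remove" outliers and extreme values; that is a separate task with its own methodologies. This is again clearly mentioned in the relevant scikit-learn docs: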
RobustScaler
[...] Note that the outliers themselves are still present in the transformed data. If a separate outlier clipping is desirable, a non-linear transformation is required (see below).
where "see below" refers to ^{} and ^{}.
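A small sketch of what "robust" buys you, again with the standard library only. It assumes the usual robust-scaling formula, (x - median) / IQR with the 25th and 75th percentiles, which mirrors RobustScaler's default settings. The data and the helper names are hypothetical.

```python
import statistics


def robust_scale(values):
    """Robust scaling: (x - median) / IQR, quartiles by linear interpolation."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    return [(x - median) / (q3 - q1) for x in values]


def standardize(values):
    """Z-score scaling, shown for comparison."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]  # last value is a made-up outlier

robust = robust_scale(data)
zscore = standardize(data)

# How much room the five ordinary points get under each scaler.
inlier_spread_robust = max(robust[:5]) - min(robust[:5])
inlier_spread_z = max(zscore[:5]) - min(zscore[:5])
# The outlier is not removed by robust scaling; it is still far from the rest.
outlier_robust = robust[5]

print(f"inlier spread, robust:  {inlier_spread_robust:.3f}")
print(f"inlier spread, z-score: {inlier_spread_z:.3f}")
print(f"outlier after robust scaling: {outlier_robust:.1f}")
```

The median and IQR barely move when the outlier is added, so the ordinary points keep a sensible scale. The outlier itself is still present in the output, which is exactly what the quoted note says.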

User
#2 · edited 2024-05-13 15:19:31

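None of them is robust in the sense that the scaling will take care of outliers and put them on a confined scale, i.e. such that no extreme values appear.
You can consider options such as:
Clipping the series/array (say, between the 5th and 95th percentiles) before scaling
Applying a transformation such as square root or logarithm, if clipping is not ideal
Obviously, adding another column such as "is clipped" / "log of clipped amount" will reduce the information loss.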
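The options above could look roughly like this. It is a sketch on made-up data using only the standard library; the variable names (`is_clipped`, `log_clipped_amount`) and the choice of `log1p` are illustrative assumptions, not something the answer prescribes.

```python
import math
import statistics

# Hypothetical non-negative data with one extreme value.
values = list(range(1, 100)) + [10_000]

# With n=20 there are 19 cut points; the first and last are the
# 5th and 95th percentiles (linear interpolation).
cuts = statistics.quantiles(values, n=20, method="inclusive")
low, high = cuts[0], cuts[-1]

# Option 1: clip to [5th, 95th] percentile before scaling.
clipped = [min(max(x, low), high) for x in values]

# Extra columns that keep some of the information clipping throws away.
is_clipped = [x < low or x > high for x in values]
log_clipped_amount = [math.log1p(abs(x - c)) for x, c in zip(values, clipped)]

# Option 2: compress the tail with a log transform instead of clipping.
# log1p requires x > -1, so this assumes non-negative inputs.
logged = [math.log1p(x) for x in values]

print(f"clip bounds: [{low}, {high}]")
print(f"clipped points: {sum(is_clipped)} of {len(values)}")
print(f"max after clipping: {max(clipped)}, max after log1p: {max(logged):.2f}")
```

Either result can then be passed to whichever scaler you prefer; the flag and amount columns go alongside it as additional features.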
