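I am using RetailRocket as my dataset. I assigned a value to each event: view=1, addtocart=2, transaction=3. Now I want to normalize these values with a z-transform (z-score), but I made a mistake somewhere. Where is my error?
This is my z-transform code: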
df = df.sample(frac=1, random_state=42)
x = df[["visitorid", "itemid"]].values
#y = df["code"].values
y = df["code"].apply(lambda x: (x - x.mean()) / x.std()).values
# Assuming training on 90% of the data and validating on 10%.
train_indices = int(0.9 * df.shape[0])
x_train, x_val, y_train, y_val = (
x[:train_indices],
x[train_indices:],
y[:train_indices],
y[train_indices:],
)
print(y)
I found this formula for the z-transform, using numpy:
X = (X - X.mean()) / X.std()
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-2712d78bf2a4> in <module>()
2 x = df[["visitorid", "itemid"]].values
3 #y = df["code"].values
----> 4 y = df["code"].apply(lambda x: (x - x.mean()) / x.std()).values
5 # Assuming training on 90% of the data and validating on 10%.
6 train_indices = int(0.9 * df.shape[0])
1 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-7-2712d78bf2a4> in <lambda>(x)
2 x = df[["visitorid", "itemid"]].values
3 #y = df["code"].values
----> 4 y = df["code"].apply(lambda x: (x - x.mean()) / x.std()).values
5 # Assuming training on 90% of the data and validating on 10%.
6 train_indices = int(0.9 * df.shape[0])
AttributeError: 'int' object has no attribute 'mean'
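Because you use apply(lambda x: ...), x is just a single value from the column, here a plain int. Calling x.mean() on a single value raises the AttributeError. What you want instead is to use mean and std on the whole column. With apply, that means computing the column's mean and std once, outside the lambda, and using those two numbers inside it. Without apply it is faster: subtract the column mean from the column and divide by the column std directly.
Maybe you need this: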
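The two options described above (apply with precomputed column statistics, and the vectorized form without apply) might look like the sketch below. The toy DataFrame and the names mean, std, y_apply and y_vec are assumptions for illustration; only the "code" column name comes from the question.

```python
import pandas as pd

# Toy stand-in for the RetailRocket events (values made up for illustration).
df = pd.DataFrame({"code": [1, 1, 2, 3, 1, 2]})

# Option 1: keep apply, but compute the column statistics once, outside the lambda.
mean, std = df["code"].mean(), df["code"].std()
y_apply = df["code"].apply(lambda v: (v - mean) / std).values

# Option 2: vectorized, no apply; pandas broadcasts the two scalars over the column.
y_vec = ((df["code"] - df["code"].mean()) / df["code"].std()).values
```

Both variants give the same array. The second avoids calling a Python function once per row, which is why it tends to be faster on large columns.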
I like this approach (high performance if your dataset has more than 15,000 rows):
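The snippet that belonged to this answer did not survive on this page, so the following is only a guess at the kind of approach meant: applying the formula from the question directly to the underlying NumPy array. The toy data and the names codes and y are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real "code" column.
df = pd.DataFrame({"code": [1, 1, 2, 3, 1, 2]})

# Work on the raw NumPy array instead of the pandas Series.
codes = df["code"].to_numpy(dtype=float)

# Note: np.ndarray.std defaults to the population std (ddof=0), while
# pandas' Series.std defaults to the sample std (ddof=1). Pass ddof=1
# here if the result should match the pandas version.
y = (codes - codes.mean()) / codes.std()
```

Because of the ddof difference, the values from this version differ slightly from the pandas ones by a constant factor; either convention gives a zero-mean target.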