sklearn MinMaxScaler() with pandas groupby

Posted 2024-10-01 07:26:22


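I have two features, rank and ratings, for different product IDs under different categories, scraped from an e-commerce website on different dates.

A sample dataframe is available here: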

import pandas as pd
import numpy as np
import warnings; warnings.simplefilter('ignore')
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import RobustScaler

df=pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/testdf.csv')
df.head()

      category                bid         date  rank    ratings
0   Aftershave  ASCDBNYZ4JMSH42B    2021-10-01  61.0    462.0
1   Aftershave  ASCDBNYZ4JMSH42B    2021-10-02  69.0    462.0
2   Aftershave  ASCDBNYZ4JMSH42B    2021-10-05  89.0    463.0
3   Aftershave  ASCE3DZK2TD7G4DN    2021-10-01  309.0   3.0
4   Aftershave  ASCE3DZK2TD7G4DN    2021-10-02  319.0   3.0

I want to normalize rank and ratings using MinMaxScaler() from sklearn.

I tried:

cols=['rank','ratings']
features=df[cols]
scaler1=MinMaxScaler()
df_norm=df.copy()  # df_norm has to exist before new columns are assigned to it
df_norm[['rank_norm_mm', 'ratings_norm_mm']] = scaler1.fit_transform(features)

This normalizes over the entire dataset. I want to do it with groupby instead, separately for each category on each specific date.


2 answers

Use groupby with apply:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

file = 'https://raw.githubusercontent.com/amanaroratc/hello-world/master/testdf.csv'
df=pd.read_csv(file)

cols=['rank','ratings']

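def f(x):
    # Fit a fresh scaler on each (category, date) group
    x = x.copy()  # work on a copy rather than mutating the group pandas passes in
    scaler1=MinMaxScaler()
    x[['rank_norm_mm', 'ratings_norm_mm']] = scaler1.fit_transform(x[cols])
    return x

# group_keys=False keeps the original row index instead of prepending the group keys
df = df.groupby(['category', 'date'], group_keys=False).apply(f)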

Another solution:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

file = 'https://raw.githubusercontent.com/amanaroratc/hello-world/master/testdf.csv'
df=pd.read_csv(file)

scaler1=MinMaxScaler()
cols=['rank','ratings']

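# Scale each group, keep the original index and column names, then join back.
# add_suffix yields rank_norm_mm / ratings_norm_mm; group_keys=False keeps the index aligned with df.
df = df.join(df.groupby(['category', 'date'], group_keys=False)[cols]
               .apply(lambda x: pd.DataFrame(scaler1.fit_transform(x), index=x.index, columns=x.columns))
               .add_suffix('_norm_mm'))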

Use groupby and apply:

>>> df.groupby(['category', 'date'])[['rank', 'ratings']] \
      .apply(lambda x: pd.DataFrame(scaler1.fit_transform(x), columns=x.columns)) \
      .droplevel(2).reset_index()

     category        date  rank  ratings
0  Aftershave  2021-10-01   0.0      1.0
1  Aftershave  2021-10-01   1.0      0.0
2  Aftershave  2021-10-02   0.0      1.0
3  Aftershave  2021-10-02   1.0      0.0
4  Aftershave  2021-10-05   0.0      0.0
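
For comparison, the same per-group min-max scaling can be sketched with groupby and transform alone, without sklearn. This is not taken from either answer. The toy dataframe below is made up to stand in for the CSV, and it assumes the default MinMaxScaler range of 0 to 1. Groups with a single row or constant values are mapped to 0, which matches what MinMaxScaler returns for a constant feature.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV used in this thread.
df = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B"],
    "date": ["2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02", "2021-10-01"],
    "rank": [61.0, 309.0, 69.0, 319.0, 5.0],
    "ratings": [462.0, 3.0, 462.0, 3.0, 10.0],
})

cols = ["rank", "ratings"]
grouped = df.groupby(["category", "date"])[cols]

# transform returns results aligned to the original index, so no join is needed.
group_min = grouped.transform("min")
group_max = grouped.transform("max")

# A constant group has zero range; divide by 1 there so the result is 0
# instead of NaN.
value_range = (group_max - group_min).replace(0, 1)

scaled = (df[cols] - group_min) / value_range
df[["rank_norm_mm", "ratings_norm_mm"]] = scaled.to_numpy()
print(df)
```

Because transform keeps the original row order and index, the scaled columns can be assigned straight back onto df, and the grouping columns stay untouched.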
