feature-stuff Python package - PyPI

Algorithms and functions for feature extraction, processing, and interpretation in machine learning and data science.

Detailed description of the feature-stuff Python project

feature_stuff: a Python machine learning library for advanced feature extraction, processing, and interpretation.

Latest Release	see on pypi.org
Package Status	see on pypi.org
License	see on github
Build Status	see on travis

What is it

feature_stuff is a Python package that provides fast, flexible algorithms and functions for extracting, processing, and interpreting features:

Numeric feature extraction

feature_stuff.add_interactions	generic function for adding interaction features to a data frame either by passing them as a list or by passing a boosted trees model to extract the interactions from.
feature_stuff.target_encoding	target encoding of a feature column using exponential prior smoothing or mean prior smoothing
feature_stuff.cv_target_encoding	target encoding of a feature column taking cross-validation folds as input
feature_stuff.add_knn_values	creates a new feature with the K-nearest-neighbours of the values of a given feature
feature_stuff.model_features_insights_extractions.add_group_values	generic and memory efficient enrichment of features dataframe with group values

Model feature insights extraction

get_xgboost_interactions

takes a trained xgboost model and returns a list of interactions between features, to the order of maximum depth of all trees.

Installation

Binary installers for the latest released version are available at the Python package index.

# or PyPI
pip install feature_stuff

The source code is currently hosted on GitHub at: https://github.com/hiflyin/Feature-Stuff

Installing from source

In the Feature-Stuff directory (the one created by cloning the git repo), execute:

python setup.py install

Or, to install in development mode:

python setup.py develop

Alternatively, if you want all dependencies pulled in automatically, you can use pip (the -e option installs the package in development mode):

pip install -e .

How to use it

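Below are examples for some of the functions. For complete documentation, see the API documentation attached to each function/algorithm.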

feature_stuff.add_interactions

Inputs:
    df: a pandas dataframe
    model: boosted trees model (currently xgboost supported only). Can be None in which case the interactions have
    to be provided
    interactions: list in which each element is a list of features/columns in df, default: None

Output: df containing the interaction features added to it

Example of extracting interactions from tree-based models and adding them as new features to your dataset.

import feature_stuff as fs
import pandas as pd
import xgboost as xgb

data = pd.DataFrame({"x0":[0,1,0,1], "x1":range(4), "x2":[1,0,1,0]})
print(data)
   x0  x1  x2
0   0   0   1
1   1   1   0
2   0   2   1
3   1   3   0

target = data.x0 * data.x1 + data.x2 * data.x1
print(target.tolist())
[0, 1, 2, 3]

model = xgb.train({'max_depth': 4, "seed": 123}, xgb.DMatrix(data, label=target), num_boost_round=2)
fs.addInteractions(data, model)

# at least one of the interactions in target must have been discovered by xgboost
print(data)
   x0  x1  x2  inter_0
0   0   0   1        0
1   1   1   0        1
2   0   2   1        0
3   1   3   0        3

# if we want to inspect the interactions extracted
from feature_stuff import model_features_insights_extractions as insights
print(insights.get_xgboost_interactions(model))
[['x0', 'x1']]
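When the interactions are already known, the model can be None and the interactions are passed as a list. The following is a minimal sketch in plain pandas of what such an enrichment might look like; it is not the library's implementation. The helper name `add_product_interactions` is hypothetical, and it assumes that an interaction feature is the element-wise product of the listed columns, which is consistent with the inter_0 column shown in the example output.

```python
import pandas as pd


def add_product_interactions(df, interactions):
    """Return a copy of df with one product column per interaction.

    Hypothetical helper: assumes an interaction is the element-wise
    product of its columns (an assumption, not the library's code).
    """
    out = df.copy()
    for i, cols in enumerate(interactions):
        # Multiply the listed columns row by row.
        out["inter_%d" % i] = out[list(cols)].prod(axis=1)
    return out


data = pd.DataFrame({"x0": [0, 1, 0, 1], "x1": range(4), "x2": [1, 0, 1, 0]})
result = add_product_interactions(data, [["x0", "x1"]])
print(result["inter_0"].tolist())  # [0, 1, 0, 3]
```

The sketch returns a new data frame rather than modifying the input, which keeps the original columns untouched while experimenting.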

feature_stuff.target_encoding

Inputs:
    df: a pandas dataframe containing the column for which to calculate target encoding (categ_col)
    ref_df: a pandas dataframe containing the column for which to calculate target encoding and the target (y_col)
        for example we might want to use train data as ref_df to encode test data
    categ_col: the name of the categorical column for which to calculate target encoding
    y_col: the name of the target column, or target variable to predict
    smoothing_func: the name of the function to be used for calculating the weights of the corresponding target
        value inside ref_df. Default: exponentialPriorSmoothing.
    aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col
    smoothing_prior_weight: a prior weight to put on each category. Default 1.

Output: df containing a new column called <categ_col + "_bayes_" + aggr_func> containing the encodings of categ_col

Example of extracting target encodings from categorical features and adding them as new features to your dataset.

import feature_stuff as fs
import pandas as pd

train_data = pd.DataFrame({"x0":[0,1,0,1]})
test_data = pd.DataFrame({"x0":[1, 0, 0, 1]})
target = range(4)

train_data = fs.target_encoding(train_data, train_data, "x0", target, smoothing_func=fs.exponentialPriorSmoothing,
                                        aggr_func="mean", smoothing_prior_weight=1)
test_data = fs.target_encoding(test_data, train_data, "x0", target, smoothing_func=fs.exponentialPriorSmoothing,
                                        aggr_func="mean", smoothing_prior_weight=1)

#train data with target encoding of "x0"
print(train_data)
   x0  y_xx  g_xx  x0_bayes_mean
0   0     0     0       1.134471
1   1     1     0       1.865529
2   0     2     0       1.134471
3   1     3     0       1.865529

#test data with target encoding of "x0"
print(test_data)
   x0  x0_bayes_mean
0   1       1.865529
1   0       1.134471
2   0       1.134471
3   1       1.865529
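The smoothing behind these numbers can be approximated in plain pandas and NumPy. This is a minimal sketch, not the library's implementation: the names `exponential_prior_weight` and `target_encode_sketch` are hypothetical, and the weight formula (a sigmoid of the category count minus the prior weight) is an assumption chosen because it is consistent with the encodings printed in the example.

```python
import numpy as np
import pandas as pd


def exponential_prior_weight(counts, prior_weight=1.0):
    # Assumed formula: sigmoid of (category count - prior weight).
    return 1.0 / (1.0 + np.exp(-(counts - prior_weight)))


def target_encode_sketch(df, ref_df, categ_col, y, prior_weight=1.0):
    """Encode df[categ_col] with smoothed per-category means of y from ref_df."""
    ref = pd.DataFrame({
        categ_col: ref_df[categ_col].to_numpy(),
        "_y": np.asarray(y, dtype=float),
    })
    stats = ref.groupby(categ_col)["_y"].agg(["mean", "count"])
    prior = ref["_y"].mean()  # global target mean used as the prior
    w = exponential_prior_weight(stats["count"], prior_weight)
    # Blend each category mean with the prior according to its weight.
    encoding = w * stats["mean"] + (1.0 - w) * prior
    out = df.copy()
    # Categories unseen in ref_df fall back to the prior.
    out[categ_col + "_bayes_mean"] = out[categ_col].map(encoding).fillna(prior)
    return out


train = pd.DataFrame({"x0": [0, 1, 0, 1]})
test = pd.DataFrame({"x0": [1, 0, 0, 1]})
target = range(4)

train_enc = target_encode_sketch(train, train, "x0", target)
test_enc = target_encode_sketch(test, train, "x0", target)
print(train_enc["x0_bayes_mean"].round(6).tolist())
# [1.134471, 1.865529, 1.134471, 1.865529]
```

Each category has two rows here, so the weight on the category mean is sigmoid(1), about 0.731, and the remaining weight goes to the global mean of 1.5.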

feature_stuff.cv_target_encoding

Inputs:
    df: a pandas dataframe containing the column for which to calculate target encoding (categ_col) and the target
    categ_cols: a list or array with the names of the categorical columns for which to calculate target encoding
    y_col: a numpy array of the target variable to predict
    cv_folds: a list with fold pairs as tuples of numpy arrays for cross-val target encoding
    smoothing_func: the name of the function to be used for calculating the weights of the corresponding target
        value inside ref_df. Default: exponentialPriorSmoothing.
    aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col
    smoothing_prior_weight: a prior weight to put on each category. Default 1.
    verbosity: 0-none, 1-high_level, 2-detailed

Output: df containing a new column called <categ_col + "_bayes_" + aggr_func> containing the encodings of categ_col

See the feature_stuff.target_encoding example above.
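The idea of encoding through cross-validation folds can be illustrated with a short sketch: each row is encoded using target statistics computed only on the other rows of its fold pair, so a row never sees its own target. This is not the library's implementation. The helper name `cv_mean_encode_sketch` and the output column suffix `_cv_mean` are hypothetical, smoothing is left out for brevity, and each fold pair is assumed to be a tuple (fit indices, apply indices) of numpy arrays.

```python
import numpy as np
import pandas as pd


def cv_mean_encode_sketch(df, categ_col, y, cv_folds):
    """Out-of-fold mean target encoding (hypothetical helper, no smoothing)."""
    y = np.asarray(y, dtype=float)
    categories = df[categ_col].to_numpy()
    encoded = np.full(len(df), np.nan)
    for fit_idx, apply_idx in cv_folds:
        # Compute category means on the fitting part of the fold only.
        fit = pd.DataFrame({"cat": categories[fit_idx], "y": y[fit_idx]})
        means = fit.groupby("cat")["y"].mean()
        prior = fit["y"].mean()
        # Encode the held-out rows; unseen categories get the fold prior.
        held_out = df[categ_col].iloc[apply_idx]
        encoded[apply_idx] = held_out.map(means).fillna(prior).to_numpy()
    out = df.copy()
    out[categ_col + "_cv_mean"] = encoded
    return out


df = pd.DataFrame({"x0": [0, 1, 0, 1]})
y = [0, 1, 2, 3]
folds = [
    (np.array([2, 3]), np.array([0, 1])),
    (np.array([0, 1]), np.array([2, 3])),
]
cv_enc = cv_mean_encode_sketch(df, "x0", y, folds)
print(cv_enc["x0_cv_mean"].tolist())  # [2.0, 3.0, 0.0, 1.0]
```

Rows 0 and 1 receive the category means from rows 2 and 3, and the other way round, which is why the encodings differ from each row's own target value.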

Contributing to feature_stuff

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

feature-stuff 0.0.dev6
