Calculating the same features with multiple training windows in Featuretools

In [20]: temporal_cutoffs = ft.make_temporal_cutoffs(cutoffs['customer_id'],
   ....:                                             cutoffs['cutoff_time'],
   ....:                                             window_size='3d',
   ....:                                             num_windows=2)
   ....:

In [21]: temporal_cutoffs
Out[21]:
        time  instance_id
0 2011-12-12        13458
1 2011-12-15        13458
2 2012-10-02        13602
3 2012-10-05        13602
4 2012-01-22        15222
5 2012-01-25        15222

In [22]: entityset = ft.demo.load_retail()

In [23]: feature_tensor, feature_defs = ft.dfs(entityset=entityset,
   ....:                                       target_entity='customers',
   ....:                                       cutoff_time=temporal_cutoffs,
   ....:                                       cutoff_time_in_index=True,
   ....:                                       max_features=4)
   ....:

In [24]: feature_tensor
Out[24]:
                        MAX(order_products.total)  MIN(order_products.unit_price)  STD(order_products.quantity)  COUNT(order_products)
customer_id time
13458.0     2011-12-12                    201.960                          0.3135                     10.053804                    394
            2011-12-15                    201.960                          0.3135                     10.053804                    394
15222.0     2012-01-22                    272.250                          1.1880                     26.832816                      5
            2012-01-25                    272.250                          1.1880                     26.832816                      5
13602.0     2012-10-02                     49.896                          1.0395                      8.732068                     23
            2012-10-05                     49.896                          1.0395                      8.732068                     23
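Judging from Out[21], make_temporal_cutoffs expands each (instance id, cutoff time) pair into num_windows cutoff times spaced window_size apart and ending at the original cutoff. The sketch below reproduces that expansion with the standard library only; it is not Featuretools' implementation, and it assumes the original cutoffs were the later date in each pair (the cutoffs DataFrame itself is not shown). Without a training_window, each of these rows still aggregates all history up to its own cutoff, which is consistent with the identical pairs of rows in Out[24].

```python
from datetime import date, timedelta


def temporal_cutoffs(instance_ids, cutoffs, window_size, num_windows):
    """Hypothetical helper: return (time, instance_id) rows with num_windows
    cutoff times per instance, spaced window_size apart and ending at that
    instance's original cutoff (mimicking the shape of Out[21])."""
    rows = []
    for instance_id, cutoff in zip(instance_ids, cutoffs):
        # Oldest cutoff first, original cutoff last
        for i in range(num_windows - 1, -1, -1):
            rows.append((cutoff - i * window_size, instance_id))
    return rows


# Assumed inputs: the later date of each pair in Out[21]
rows = temporal_cutoffs(
    [13458, 13602, 15222],
    [date(2011, 12, 15), date(2012, 10, 5), date(2012, 1, 25)],
    window_size=timedelta(days=3),
    num_windows=2,
)

for time, instance_id in rows:
    print(time.isoformat(), instance_id)
```

The printed rows match the time/instance_id pairs shown in Out[21].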

1 Answer

User

#1 · Posted on 2024-09-28 23:53:51

You can call ft.calculate_feature_matrix twice, once per training_window value, and then merge the two resulting feature matrices. For example:

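import featuretools as ft
import pandas as pd

# Demo retail EntitySet (in v0.3.1, customers are identified by name)
entityset = ft.demo.load_retail()

# One cutoff time per customer
cutoffs = pd.DataFrame({
    'customer_name': ["Micheal Nicholson", "Krista Maddox"],
    'cutoff_time': [pd.Timestamp('2011-10-14'), pd.Timestamp('2011-08-18')]
})

# Build the feature definitions only; nothing is calculated yet
feature_defs = ft.dfs(entityset=entityset,
                      target_entity='customers',
                      agg_primitives=["sum"],
                      trans_primitives=[],
                      max_features=1,
                      features_only=True)

# Same features, using only the 60 days of data before each cutoff
fm_60_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="60 days")

# Same features, using only the 30 days of data before each cutoff
fm_30_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="30 days")

# Join on the customer index; the suffixes keep the two windows' columns apart
fm_60_days.merge(fm_30_days, left_index=True, right_index=True,
                 suffixes=("__60_days", "__30_days"))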

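The code above returns a DataFrame in which the same feature is calculated twice: once from the last 60 days of data and once from the last 30 days.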
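To make the training_window idea concrete, here is a standard-library sketch of what the two calls compute for a SUM feature: each customer's total restricted to a window ending at that customer's cutoff, then joined on customer with suffixed column names. The order data is hypothetical (only the two customer names and cutoff dates come from the example above), and treating the window as (cutoff - window, cutoff] is an assumption about boundary handling, not a statement of Featuretools' exact behavior.

```python
from datetime import datetime, timedelta

# Hypothetical order lines: (customer, timestamp, total)
events = [
    ("Micheal Nicholson", datetime(2011, 8, 20), 10.0),
    ("Micheal Nicholson", datetime(2011, 9, 30), 5.0),
    ("Micheal Nicholson", datetime(2011, 10, 20), 99.0),  # after the cutoff
    ("Krista Maddox", datetime(2011, 7, 1), 7.0),
    ("Krista Maddox", datetime(2011, 8, 10), 3.0),
]
# Cutoff dates taken from the example above
cutoffs = {
    "Micheal Nicholson": datetime(2011, 10, 14),
    "Krista Maddox": datetime(2011, 8, 18),
}


def windowed_sum(events, cutoffs, window):
    """SUM(total) per customer over (cutoff - window, cutoff].
    The boundary inclusivity is an assumption of this sketch."""
    result = {}
    for customer, cutoff in cutoffs.items():
        start = cutoff - window
        result[customer] = sum(
            total for c, t, total in events
            if c == customer and start < t <= cutoff
        )
    return result


sum_60 = windowed_sum(events, cutoffs, timedelta(days=60))
sum_30 = windowed_sum(events, cutoffs, timedelta(days=30))

# Join on customer, suffixing the shared feature name like merge(suffixes=...)
merged = {
    c: {"SUM(total)__60_days": sum_60[c], "SUM(total)__30_days": sum_30[c]}
    for c in cutoffs
}
print(merged)
```

With this toy data, the 60-day column includes earlier orders that the 30-day column excludes, and the order placed after the cutoff is ignored by both.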


Note: this example runs on the latest release of Featuretools (v0.3.1), in which we updated the demo retail dataset to use interpretable names as customer ids.

Edit: desired output format
