Group by multiple columns to make per-group predictions using training and test data

Posted 2024-09-28 05:21:32


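I'm new to machine learning and am still learning various ML concepts, so please excuse my ignorance.

I'm working on a project where I need to predict sales rep calls for a future quarter based on the rep calls recorded in previous quarters. I'm providing a sample DataFrame below for reference; any advice is appreciated.

The rep-call prediction for QTR4 should be based on the rep calls per CUSTOMER_NUMBER and PRODUCT that are available for the past three quarters.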

import pandas as pd

df = pd.DataFrame({"CUSTOMER_NUMBER": ["CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST1", "CUST2", "CUST2", "CUST2", "CUST2", "CUST2", "CUST2", "CUST2", "CUST3", "CUST3", "CUST3", "CUST4", "CUST4", "CUST4"],
"PRODUCT": ["PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT1", "PRODUCT2", "PRODUCT3", "PRODUCT3", "PRODUCT3", "PRODUCT3", "PRODUCT3", "PRODUCT1", "PRODUCT1", "PRODUCT2"],
"REP_VISITS": ["3", "3", "3", "3", "3", "3", "4", "4", "4", "3", "2", "2", "4", "6", "8", "5", "3", "1", "3", "2", "0", "3"],
"QTR": ["QTR1", "QTR1", "QTR1", "QTR2", "QTR2", "QTR2", "QTR3", "QTR3", "QTR3", "QTR1", "QTR1", "QTR1", "QTR2", "QTR2", "QTR2", "QTR3", "QTR1", "QTR2", "QTR3", "QTR1", "QTR2", "QTR3"],
"START_DATE": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-04-01", "2020-04-01", "2020-04-01", "2020-07-01", "2020-07-01", "2020-07-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-04-01",  "2020-04-01", "2020-04-01","2020-07-01", "2020-01-01", "2020-04-01", "2020-07-01", "2020-01-01", "2020-04-01", "2020-07-01"],
"END_DATE": ["2020-03-31", "2020-03-31", "2020-03-31", "2020-06-30", "2020-06-30", "2020-06-30", "2020-09-30", "2020-09-30", "2020-09-30", "2020-03-31", "2020-03-31", "2020-03-31", "2020-06-30", "2020-06-30", "2020-06-30", "2020-09-30", "2020-03-31", "2020-06-30", "2020-09-30", "2020-03-31", "2020-06-30", "2020-09-30"]})


I need to find the predicted rep calls for QTR4 for each of these customer/product pairs:

CUST1|PRODUCT1||QTR4|
CUST1|PRODUCT2||QTR4|
CUST1|PRODUCT3||QTR4|
CUST2|PRODUCT1||QTR4|
CUST2|PRODUCT2||QTR4|
CUST2|PRODUCT3||QTR4|
CUST3|PRODUCT3||QTR4|
CUST4|PRODUCT1||QTR4|
CUST4|PRODUCT2||QTR4|

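Please guide me on how to build a training dataset with suitable targets for each customer/product pair, so that I can then predict and evaluate on test data.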


1 Answer
User
#1 · Posted 2024-09-28 05:21:32

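I think you could try using the customer number and product ID as features and train a simple classifier with logistic regression or a decision tree. You can one-hot encode the distinct customer numbers and product IDs. With this approach, REP_VISITS is the label and the features are CUST1, CUST2, CUST3, PRODUCT1, PRODUCT2, and so on. scikit-learn has implementations of these algorithms and they are easy to use. Hope this helps: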

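import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# One-hot encode customer and product so that
# ['CUST1', 'CUST2', 'CUST3', 'CUST4', 'PRODUCT1', 'PRODUCT2', 'PRODUCT3']
# become the feature columns (prefix suppressed to keep those exact names)
all_features_df = pd.get_dummies(df[['CUSTOMER_NUMBER', 'PRODUCT']], prefix='', prefix_sep='')
feature_cols = list(all_features_df.columns)
# REP_VISITS is the label; it is stored as strings in df, so cast it to int
label = 'REP_VISITS'
all_features_df[label] = df[label].astype(int)
X = all_features_df[feature_cols]  # Features
y = all_features_df[label]  # Target variable
# Create the decision tree classifier object
clf = DecisionTreeClassifier()
# Train the decision tree classifier
clf = clf.fit(X, y)
# Build the rows to predict for QTR4: one per customer/product pair seen in df,
# encoded with the same feature columns as the training data
pairs = df[['CUSTOMER_NUMBER', 'PRODUCT']].drop_duplicates()
X_test = pd.get_dummies(pairs, prefix='', prefix_sep='').reindex(columns=feature_cols, fill_value=0)
# Predict the response for the test dataset
y_pred = clf.predict(X_test)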
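
Since the question also asks about evaluating on test data, here is a minimal sketch of one way to do that with the one-hot approach above: hold out QTR3 as the test set and train on QTR1 and QTR2. The choice of QTR3 as the hold-out, the use of mean absolute error as the metric, and the names `rows`, `is_test` and `report` are my own assumptions, not something from the question. The sample data is retyped from the question with the date columns left out for brevity.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeClassifier

# Sample data from the question (START_DATE / END_DATE omitted, visits as ints).
rows = [
    ("CUST1", "PRODUCT1", 3, "QTR1"), ("CUST1", "PRODUCT2", 3, "QTR1"),
    ("CUST1", "PRODUCT3", 3, "QTR1"), ("CUST1", "PRODUCT1", 3, "QTR2"),
    ("CUST1", "PRODUCT2", 3, "QTR2"), ("CUST1", "PRODUCT3", 3, "QTR2"),
    ("CUST1", "PRODUCT1", 4, "QTR3"), ("CUST1", "PRODUCT2", 4, "QTR3"),
    ("CUST1", "PRODUCT3", 4, "QTR3"), ("CUST2", "PRODUCT1", 3, "QTR1"),
    ("CUST2", "PRODUCT2", 2, "QTR1"), ("CUST2", "PRODUCT3", 2, "QTR1"),
    ("CUST2", "PRODUCT1", 4, "QTR2"), ("CUST2", "PRODUCT2", 6, "QTR2"),
    ("CUST2", "PRODUCT3", 8, "QTR2"), ("CUST2", "PRODUCT3", 5, "QTR3"),
    ("CUST3", "PRODUCT3", 3, "QTR1"), ("CUST3", "PRODUCT3", 1, "QTR2"),
    ("CUST3", "PRODUCT3", 3, "QTR3"), ("CUST4", "PRODUCT1", 2, "QTR1"),
    ("CUST4", "PRODUCT1", 0, "QTR2"), ("CUST4", "PRODUCT2", 3, "QTR3"),
]
df = pd.DataFrame(rows, columns=["CUSTOMER_NUMBER", "PRODUCT", "REP_VISITS", "QTR"])

# One-hot encode on the full frame so train and test share the same columns.
X_all = pd.get_dummies(df[["CUSTOMER_NUMBER", "PRODUCT"]], prefix="", prefix_sep="")
y_all = df["REP_VISITS"]

# Time-based split (assumption): earlier quarters train, the latest quarter tests.
is_test = df["QTR"] == "QTR3"
X_train, y_train = X_all[~is_test], y_all[~is_test]
X_test, y_test = X_all[is_test], y_all[is_test]

# random_state only makes tie-breaking inside the tree reproducible.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Compare held-out actuals with predictions, per customer/product pair.
mae = mean_absolute_error(y_test, y_pred)
report = df.loc[is_test, ["CUSTOMER_NUMBER", "PRODUCT", "QTR", "REP_VISITS"]].assign(PREDICTED=y_pred)
print(report)
print("MAE on QTR3:", mae)
```

If the hold-out error looks acceptable, the same model could then be refit on all three quarters and used to predict one row per customer/product pair for QTR4. With only 22 rows, any score from this split should be read as a sanity check rather than a reliable estimate.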
