How do I control the number of features? [machine learning] - Q&A - Python中文网


Posted 2024-09-28 05:24:55


Male | Just a code monkey who likes writing Python.

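I wrote this machine learning (classification) code to tell two classes apart. I started out with a single feature, computed for each of my images.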

For example (note: 1 and 0 are the class labels):

A = [(4295046.0, 1), (4998220.0, 1), (4565017.0, 1), (4078291.0, 1), (4350411.0, 1), (4434050.0, 1), (4201831.0, 1), (4203570.0, 1), (4197025.0, 1), (4110781.0, 1), (4080568.0, 1), (4276499.0, 1), (4363551.0, 1), (4241573.0, 1), (4455070.0, 1), (5682823.0, 1), (5572122.0, 1), (5382890.0, 1), (5217487.0, 1), (4714908.0, 1), (4697137.0, 1), (4057898.0, 1), (4143981.0, 1), (3899129.0, 1), (3830584.0, 1), (3557377.0, 1), (3125518.0, 1), (3197039.0, 1), (3109404.0, 1), (3024219.0, 1), (3066759.0, 1), (272633.0, 1), (3507626.0, 1), ...]

B = [(7179088.0, 0), (7144249.0, 0), (6806806.0, 0), (5080876.0, 0), (5170390.0, 0), (5694876.0, 0), (6210510.0, 0), (5376014.0, 0), (6472171.0, 0), (7112956.0, 0), (7356507.0, 0), (9180030.0, 0), (9183460.0, 0), (9212517.0, 0), (9055663.0, 0), (9053709.0, 0), (9103067.0, 0), (88899903.0, 0), (8328604.0, 0), (8475442.0, 0), (8499221.0, 0), (8752169.0, 0), (8779133.0, 0), (8756789.0, 0), (8990732.0, 0), (9027381.0, 0), (9090035.0, 0), (9343846.0, 0), (9518609.0, 0), (9435149.0, 0), (9365842.0, 0), (9395256.0, 0), (4381880.0, 0), (4749338.0, 0), (5296143.0, 0), (5478942.0, 0), (5610865.0, 0), (5514997.0, 0), (5381010.0, 0), (5090416.0, 0), (4663958.0, 0), (4804526.0, 0), (4743107.0, 0), (4898914.0, 0), (5018503.0, 0), (5778240.0, 0), (5741893.0, 0), (4632926.0, 0), (5208486.0, 0), (5633403.0, 0), (5699410.0, 0), (5748260.0, 0), (5869260.0, 0), ...]

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# data is A and B combined
data = A + B

# One feature per sample: each row of x is a one-element list
x = [[each[0]] for each in data]
# Labels as a flat list; scikit-learn expects a 1-D y
y = [each[1] for each in data]
print(len(x), len(y))

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
                                                    random_state=42)
print(len(x_train), len(x_test))
print(len(y_train), len(y_test))

clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(x_train, y_train)

Question:

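What do I need to change to add more features? What should A and B look like once features are added? And do I need to change this line

clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)

when using two features?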

My guess:

A = [(4295046.0, second_feature, 1), (4998220.0, second_feature, 1), (4565017.0, second_feature, 1), (4078291.0, second_feature, 1), (4350411.0, second_feature, 1), (4434050.0, 1), ...]

Is that right? Is there a better way?
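A minimal sketch of that layout, assuming each image gets a second numeric feature stored in its own list. The first-feature values are taken from the data above; the second-feature values are made-up placeholders. Putting the label last in every tuple and building the tuples with zip() avoids editing each tuple by hand, and the same slicing then works for any number of features.

```python
# Each sample is (feature_1, feature_2, label), with the label always last.
# feature_1 values come from the question; feature_2 values are placeholders
# standing in for whatever second feature you actually compute.
a_feature_1 = [4295046.0, 4998220.0, 4565017.0]
a_feature_2 = [0.12, 0.15, 0.11]
b_feature_1 = [7179088.0, 7144249.0, 6806806.0]
b_feature_2 = [0.43, 0.47, 0.40]

# zip() pairs the i-th value of each feature list into one tuple per sample.
A = [(f1, f2, 1) for f1, f2 in zip(a_feature_1, a_feature_2)]
B = [(f1, f2, 0) for f1, f2 in zip(b_feature_1, b_feature_2)]
data = A + B

# x: one row per sample, one column per feature; y: flat list of labels.
x = [list(sample[:-1]) for sample in data]
y = [sample[-1] for sample in data]

print(len(x), len(x[0]), y)
```

Adding a third feature means adding one more list to the zip() call and one more name in the tuple; the x/y lines stay the same.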


2 answers

Anonymous user

Answer 1 · edited 2024-09-28 05:24:55

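The idea behind a random forest is that you have many simple models (trees) and you average their predictions. That means your trees should not be too deep, no matter how many features you have. If you have a lot of features and use a lot of trees, you can try increasing the depth (the max_depth parameter), but in general the trees in a random forest should be kept shallow. Experiment and see what works!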
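To make "many simple models that you average" concrete: scikit-learn documents RandomForestClassifier.predict_proba as the mean of the predicted class probabilities of the trees in the forest. The sketch below, on synthetic data from make_classification rather than the question's data, recomputes that average by hand from the fitted trees in clf.estimators_ and compares it with the forest's own output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data standing in for real image features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Many shallow trees, as the answer suggests.
clf = RandomForestClassifier(n_estimators=50, max_depth=2, random_state=0)
clf.fit(X, y)

# Class probabilities from each individual tree: shape (n_trees, n_samples, 2).
tree_probas = np.array([tree.predict_proba(X) for tree in clf.estimators_])
manual_average = tree_probas.mean(axis=0)

# The forest's prediction is that same average over its trees.
forest_proba = clf.predict_proba(X)
print(np.allclose(manual_average, forest_proba))
```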

For example:

https://medium.com/all-things-ai/in-depth-parameter-tuning-for-random-forest-d67bb7e920d

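That experiment used 900+ data points and 9 features. They tested max_depth values from 1 to 32, and judging from the results I would say that around 5 worked best. But this can vary depending on the dataset and the features involved.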
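A sketch of running a similar experiment on your own data: score several max_depth values with 5-fold cross-validation using GridSearchCV and keep the best one. The data here is synthetic (make_classification with 900 samples and 9 features, chosen only to mirror the sizes mentioned above, not the article's dataset), and the depth grid is a coarse subset of 1 to 32 to keep the run short. With real data, pass your own x and y, and expect a different best depth.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 900 samples, 9 features, two classes.
X, y = make_classification(n_samples=900, n_features=9, random_state=0)

# Candidate depths between 1 and 32.
depths = [1, 2, 3, 5, 8, 16, 32]

# Fit a forest for every depth on each of 5 folds and compare mean accuracy.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_depth": depths},
    cv=5,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```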

Anonymous user

Answer 2 · edited 2024-09-28 05:24:55

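RandomForestClassifier does not need to be told the number of features; it takes it from the shape of x when you call fit.
If the class label is always the last element of each tuple in your data, you can do the following: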

x = [list(each[:-1]) for each in data]  # every element except the last is a feature
y = [each[-1] for each in data]         # the last element is the class label

From there, carry on exactly as before.
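A minimal sketch showing that the RandomForestClassifier(...) line from the question stays unchanged when features are added: in recent scikit-learn versions the fitted classifier records the feature count it saw during fit in n_features_in_. The tuples below use three features plus a label; all values after the first column are made-up placeholders.

```python
from sklearn.ensemble import RandomForestClassifier

# Placeholder samples: (feature_1, feature_2, feature_3, label), label last.
data = [
    (4295046.0, 0.12, 7.0, 1), (4998220.0, 0.15, 6.5, 1), (4565017.0, 0.11, 7.2, 1),
    (7179088.0, 0.43, 3.1, 0), (7144249.0, 0.47, 2.9, 0), (6806806.0, 0.40, 3.4, 0),
]

x = [list(each[:-1]) for each in data]  # every element except the last
y = [each[-1] for each in data]         # the last element is the label

# Same constructor as in the question; no feature count is passed in.
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(x, y)

print(clf.n_features_in_)  # inferred from x at fit time
# New samples must have the same number of features as the training rows.
prediction = clf.predict([[4300000.0, 0.13, 7.1]])
print(prediction)
```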
