
Published 2024-10-16 17:27:39












Tags: data, model, name, content, target, site, forest, classification





The Azure Machine Learning team has an article on how to choose algorithms, which may help even if you don't use AzureML. From that article:

How large is your training data? If your training set is small, and you're going to train a supervised classifier, then machine learning theory says you should stick to a classifier with high bias/low variance, such as Naive Bayes. These have an advantage over low bias/high variance classifiers such as kNN since the latter tends to overfit. But low bias/high variance classifiers are more appropriate if you have a larger training set because they have a smaller asymptotic error - in these cases a high bias classifier isn't powerful enough to provide an accurate model. There are theoretical and empirical results that indicate that Naive Bayes does well in such circumstances. But note that having better data and good features usually can give you a greater advantage than having a better algorithm. Also, if you have a very large dataset classification performance may not be affected as much by the algorithm you use, so in that case it's better to choose your algorithm based on such things as its scalability, speed, or ease of use.
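The point that low bias/high variance classifiers such as kNN "tend to overfit" can be seen in a few lines. Below is a minimal pure-Python sketch. The 1-D toy data and the helpers `knn_predict`, `accuracy` and `make_data` are hypothetical, made up for illustration. With distinct points, a 1-nearest-neighbour classifier reproduces every training label, including the deliberately flipped noisy ones, so its training accuracy is always 1.0. A larger k averages over neighbours instead.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features, labels 0/1)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in nearest]
    return int(sum(votes) * 2 > len(votes))  # odd k avoids ties

def accuracy(train, data, k):
    """Fraction of (x, y) pairs in data that k-NN (fit on train) labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

rng = random.Random(0)  # fixed seed so runs are repeatable

def make_data(n, noise):
    """Hypothetical toy data: true label is 1 iff x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

train = make_data(40, noise=0.2)    # small, noisy training set
test = make_data(1000, noise=0.0)   # clean labels measure generalization

for k in (1, 15):
    print(f"k={k:2d}  train acc={accuracy(train, train, k):.2f}  test acc={accuracy(train, test, k):.2f}")
```

With k=1 the training accuracy is 1.0 however noisy the labels are. That number says nothing about generalization, so compare the test column instead.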

Do you need to train incrementally or in a batched mode? If you have a lot of data, or your data is updated frequently, you probably want to use Bayesian algorithms that update well. Both neural nets and SVMs need to work on the training data in batch mode.
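Count-based Bayesian models "update well" because their entire learned state is a set of counts. Each new example can be folded in without revisiting old data. Below is a minimal sketch of a categorical Naive Bayes built on that idea. The class name, the add-alpha (Laplace) smoothing parameter `alpha`, and the toy weather data are all hypothetical, chosen for illustration.

```python
import math
from collections import defaultdict

class IncrementalCategoricalNB:
    """Categorical Naive Bayes whose whole state is counts, so it learns one example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # add-alpha (Laplace) smoothing
        self.total = 0                        # examples seen so far
        self.class_counts = defaultdict(int)  # label -> count
        self.value_counts = defaultdict(int)  # (label, feature index, value) -> count
        self.seen_values = defaultdict(set)   # feature index -> distinct values seen

    def update(self, features, label):
        """Fold one labelled example into the counts; no past data is needed."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.value_counts[(label, i, value)] += 1
            self.seen_values[i].add(value)

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior."""
        n_classes = len(self.class_counts)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Smoothed class prior
            score = math.log((count + self.alpha) / (self.total + self.alpha * n_classes))
            for i, value in enumerate(features):
                n_values = len(self.seen_values.get(i, ()))
                matches = self.value_counts.get((label, i, value), 0)
                score += math.log((matches + self.alpha) / (count + self.alpha * n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy stream: (outlook, temperature) -> play?
nb = IncrementalCategoricalNB()
stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]
for features, label in stream:  # examples could just as well arrive one at a time
    nb.update(features, label)
print(nb.predict(("sunny", "hot")))   # no
print(nb.predict(("rainy", "mild")))  # yes
```

Later examples can be added with further `update` calls, and no retraining pass is needed.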

Is your data exclusively categorical or exclusively numeric or a mixture of both kinds? Bayesian works best with categorical/binomial data. Decision trees can't predict numerical values.

Do you or your audience need to understand how the classifier works? Bayesian or decision trees are more easily explained. It's much harder to see or explain how neural networks and SVMs classify data.
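Decision trees are easy to explain partly because a fitted tree is just a set of if/else rules you can print. Below is a minimal sketch of the simplest case, a depth-1 tree (a "decision stump") on one numeric feature. The `fit_stump` and `majority` helpers and the age/purchase data are hypothetical.

```python
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(xs, ys):
    """Fit a depth-1 decision tree on one numeric feature.

    Every midpoint between neighbouring distinct x values is tried as a threshold;
    the one with the fewest training errors wins. Needs at least two distinct xs.
    """
    points = sorted(zip(xs, ys))
    best = None
    for (a, _), (b, _) in zip(points, points[1:]):
        if a == b:
            continue
        t = (a + b) / 2
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, left_label, right_label = best
    return t, left_label, right_label

# Hypothetical toy data: customer age vs. whether they bought a product.
ages = [22, 25, 31, 38, 45, 52, 60]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes"]
t, below, above = fit_stump(ages, bought)
print(f"if age <= {t}: predict {below!r}, else predict {above!r}")
# if age <= 34.5: predict 'no', else predict 'yes'
```

The fitted model is the printed rule itself. By contrast, the learned weights of a neural net or a kernel SVM do not read as rules.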

How fast does your classification need to be generated? Decision trees can be slow when the tree is complex. SVMs, on the other hand, classify more quickly since they only need to determine which side of the "line" your data is on.
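For a linear SVM, "which side of the line" is literally the sign of w·x + b. Prediction therefore costs one dot product, however large the training set was. (A kernel SVM instead sums a kernel term over its support vectors, which costs more.) Below is a minimal sketch. The weights `w` and bias `b` are made up for illustration rather than trained, and `linear_decision` is a hypothetical helper.

```python
def linear_decision(w, b, x):
    """Classify x by which side of the hyperplane w·x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical weights, standing in for what a trained linear SVM would learn.
w, b = [2.0, -1.0], -0.5
print(linear_decision(w, b, [1.0, 0.5]))  # 2*1.0 - 1*0.5 - 0.5 = 1.0  -> 1
print(linear_decision(w, b, [0.0, 1.0]))  # 0 - 1*1.0 - 0.5 = -1.5     -> -1
```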

How much complexity does the problem present or require? Neural nets and SVMs can handle complex non-linear classification.
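A standard way to handle a non-linear problem with a linear method is to map the inputs into a richer feature space, which is also the idea behind kernel SVMs. Below is a minimal sketch that uses a plain perceptron as a swapped-in stand-in for an SVM, because it fits in a few lines. On XOR no straight line separates the classes. After appending a hand-picked product feature x1*x2, the classes become linearly separable. The helpers and data are hypothetical.

```python
def score(w, b, x):
    """Linear score w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(data, epochs=20):
    """Classic perceptron: on every mistake (y * score <= 0), nudge w by y*x and b by y."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * score(w, b, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def accuracy(w, b, data):
    """Fraction of points whose predicted sign matches the label."""
    return sum((1 if score(w, b, x) > 0 else -1) == y for x, y in data) / len(data)

# XOR on inputs in {-1, +1}: label +1 exactly when the two inputs differ.
xor = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]
# Same points with a hand-picked extra feature x1*x2 appended.
xor_mapped = [((x1, x2, x1 * x2), y) for (x1, x2), y in xor]

w, b = train_perceptron(xor)
print(accuracy(w, b, xor))            # below 1.0: no line separates XOR
w2, b2 = train_perceptron(xor_mapped)
print(accuracy(w2, b2, xor_mapped))   # 1.0
```

Here the product feature was chosen by hand. Kernel SVMs and neural nets are ways of obtaining such non-linear features without specifying them manually.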

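Now, regarding your comment "fyi: the baseline of always predicting 0 is very high, at 92.8%": there are anomaly detection algorithms for exactly this situation. That is, the classes are highly imbalanced and one class is a rarely occurring "anomaly", as in credit card fraud detection (genuine fraud makes up only a tiny fraction of the whole dataset). In Azure Machine Learning, we use the One-Class Support Vector Machine (SVM) and PCA-Based Anomaly Detection algorithms for this. Hope that helps!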
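The 92.8% figure shows what class imbalance does to accuracy. If 92.8% of the labels are 0, predicting 0 everywhere scores 92.8% while catching none of the rare class. Below is a minimal sketch that reproduces this arithmetic. The 72-in-1000 label split is hypothetical, chosen only to match the 92.8% baseline, and `evaluate` is a hypothetical helper. Recall on the rare class exposes the problem that accuracy hides.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, recall on the positive class 1)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    return correct / len(y_true), (true_pos / positives if positives else 0.0)

# Hypothetical labels: 72 positives ("anomalies") out of 1000, i.e. 7.2%.
y_true = [1] * 72 + [0] * 928
always_zero = [0] * len(y_true)
acc, recall = evaluate(y_true, always_zero)
print(f"accuracy={acc:.3f} recall={recall:.3f}")  # accuracy=0.928 recall=0.000
```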
