<p>The Azure Machine Learning team has <a href="https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-choice/" rel="nofollow">an article on how to choose algorithms</a>, which can help even if you don't use AzureML. From that article:</p>
<blockquote>
<p><b>How large is your training data?</b> If your training set is small, and
you're going to train a supervised classifier, then machine learning
theory says you should stick to a classifier with high bias/low
variance, such as Naive Bayes. These have an advantage over low
bias/high variance classifiers such as kNN since the latter tends to
overfit. But low bias/high variance classifiers are more appropriate
if you have a larger training set because they have a smaller
asymptotic error - in these cases a high bias classifier isn't
powerful enough to provide an accurate model. There are theoretical
and empirical results that indicate that Naive Bayes does well in such
circumstances. But note that having better data and good features
usually can give you a greater advantage than having a better
algorithm. Also, if you have a very large dataset classification
performance may not be affected as much by the algorithm you use, so
in that case it's better to choose your algorithm based on such things
as its scalability, speed, or ease of use.</p>
<p><b>Do you need to train incrementally or in a batched mode?</b> If you have a
lot of data, or your data is updated frequently, you probably want to
use Bayesian algorithms that update well. Both neural nets and SVMs
need to work on the training data in batch mode.</p>
<p><b>Is your data exclusively categorical or exclusively numeric or a
mixture of both kinds?</b> Bayesian works best with categorical/binomial
data. Decision trees can't predict numerical values.</p>
<p><b>Do you or your audience need to understand how the classifier works?</b>
Bayesian or decision trees are more easily explained. It's much harder
to see or explain how neural networks and SVMs classify data.</p>
<p><b>How fast does your classification need to be generated?</b> Decision trees
can be slow when the tree is complex. SVMs, on the other hand,
classify more quickly since they only need to determine which side of
the "line" your data is on. </p>
<p><b>How much complexity does the problem present or require?</b> Neural nets
and SVMs can handle complex non-linear classification.</p>
</blockquote>
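<p>Now, regarding your comment that "fyi: the baseline prediction of always 0 is very high at 92.8%": there are anomaly detection algorithms for exactly this situation, where the classes are highly imbalanced and one class is an "anomaly" that occurs only rarely, as in credit card fraud detection (true fraud makes up only a small fraction of the whole dataset). In Azure Machine Learning, we use one-class support vector machines (SVM) and PCA-based anomaly detection algorithms. Hope that helps!</p>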
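<p>To see why a 92.8% always-0 baseline points toward anomaly detection, here is a minimal sketch in plain Python. The data is made up (1000 rows, 72 of them positive, chosen only to reproduce the 92.8% figure) and <code>always_zero_baseline</code> is a hypothetical helper, not part of any library. It shows that the baseline scores high on accuracy while finding none of the rare class:</p>

```python
def always_zero_baseline(labels):
    """Score a classifier that predicts 0 (the majority class) for every row."""
    predictions = [0] * len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    actual_positives = sum(y == 1 for y in labels)
    accuracy = correct / len(labels)
    # Recall: the fraction of real positives (the "anomalies") that were found.
    recall = true_positives / actual_positives if actual_positives else 0.0
    return accuracy, recall


# Hypothetical labels: 7.2% positives, so always predicting 0 is right 92.8% of the time.
labels = [1] * 72 + [0] * 928
accuracy, recall = always_zero_baseline(labels)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

<p>Because accuracy is dominated by the majority class, recall (or precision) on the rare class is usually a more informative number to track when comparing models on data like this.</p>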