wolvr: detailed description of the Python project
A library module of advanced machine learning utilities.
Example 1: naive Bayes
Selected output:
This demo uses a public dataset of SMS spam, which has a total of 5574 messages = 747 spam and 4827 ham (legitimate).
The goal is to use 'term frequency in message' to predict whether the message is ham (class=0) or spam (class=1).
Using a grid search and a multinomial naive bayes classifier, the best hyperparameters were found as following:
Step1: Tokenizing text: CountVectorizer(analyzer = 'word', ngram_range = (1, 1));
Step2: Transforming from occurrences to frequency: TfidfTransformer(use_idf = True).
The top 2 terms with highest probability of a message being a spam (the classification is either spam or ham):
"claim": 80.73%
"prize": 80.06%
Application example:
- Message: "URGENT! We are trying to contact U. Todays draw shows that you have won a 2000 prize GUARANTEED. Call 090 5809 4507 from a landline. Claim 3030. Valid 12hrs only."
- Probability of class=1 (spam): 98.32%
- Classification: spam
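The output above names a two-step text pipeline (word counts, then tf-idf weighting) that feeds a multinomial naive Bayes classifier. Below is a minimal pure-Python sketch of that technique. It is not wolvr's code. The tokenizer regex and the smoothed idf formula are meant to approximate scikit-learn's defaults, alpha=1.0 Laplace smoothing is an assumption, the helper names (`tokenize`, `fit_idf`, `tfidf`, `fit_nb`, `predict_proba`) are hypothetical, and the six training messages are a made-up toy corpus rather than the SMS spam dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-cased word unigrams of 2+ characters (similar to a default word analyzer).
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    # Raw term counts times idf, then L2-normalised; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}


def fit_nb(vectors, labels, vocab, alpha=1.0):
    # Multinomial naive Bayes on fractional tf-idf weights with additive smoothing.
    classes = sorted(set(labels))
    log_prior, log_theta = {}, {}
    for c in classes:
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(rows) / len(labels))
        totals = Counter()
        for v in rows:
            totals.update(v)  # sums the per-term weights of this class
        denom = sum(totals.values()) + alpha * len(vocab)
        log_theta[c] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return log_prior, log_theta


def predict_proba(vector, log_prior, log_theta):
    # Posterior P(class | message) via a numerically stable softmax of joint log-likelihoods.
    joint = {c: log_prior[c] + sum(w * log_theta[c][t] for t, w in vector.items())
             for c in log_prior}
    top = max(joint.values())
    z = sum(math.exp(j - top) for j in joint.values())
    return {c: math.exp(j - top) / z for c, j in joint.items()}


# Hypothetical toy corpus: class 1 = spam, class 0 = ham.
train = [
    ("Claim your prize now", 1),
    ("URGENT you have won a prize, call now", 1),
    ("Claim free cash prize today", 1),
    ("Are we still meeting for lunch", 0),
    ("Call me when you get home", 0),
    ("See you at lunch tomorrow", 0),
]
docs = [d for d, _ in train]
labels = [y for _, y in train]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]
log_prior, log_theta = fit_nb(vectors, labels, vocab=list(idf))

p = predict_proba(tfidf("Claim your cash prize", idf), log_prior, log_theta)
print(f"P(spam) = {p[1]:.2%}")
```

Words never seen in training are ignored at prediction time, so a message made only of unknown words falls back to the class priors.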
Example 2: k-nearest neighbors
from wolvr import kNN
kNN.demo("Social_Network_Ads")
Selected output:
This demo uses a public dataset of Social Network Ads, which is used to determine what audience a car company should target in its ads in order to sell a SUV on a social network website.
Using a grid search and a kNN classifier, the best hyperparameters were found as following:
Step1: scaler: StandardScaler(with_mean=True, with_std=True);
Step2: classifier: kNN_classifier(n_neighbors=8, weights='uniform', p=1.189207115002721, metric='minkowski').
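The output above reports a standard scaler followed by a Minkowski-distance kNN classifier with uniform weights, with the hyperparameters picked by grid search. Below is a pure-Python sketch of that idea under stated assumptions: leave-one-out accuracy is used as the grid-search score (the demo's actual scoring scheme is not stated), the helper names are hypothetical, and the ten two-feature points are made-up toy data, not the Social Network Ads dataset. A fractional exponent such as the reported p=1.189... goes through the same Minkowski formula.

```python
import math
from collections import Counter
from itertools import product


def fit_scaler(X):
    # Per-feature mean and population standard deviation (centre and scale both on).
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def minkowski(a, b, p):
    # p=1 is Manhattan, p=2 is Euclidean; fractional p >= 1 is also valid.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(X_train, y_train, x, k, p):
    # Uniform weights: majority vote among the k nearest points; ties go to the
    # label seen first, i.e. the one owning the nearest of the tied neighbours.
    order = sorted(range(len(X_train)), key=lambda i: minkowski(X_train[i], x, p))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, p):
    # Leave-one-out accuracy; the scaler is refit in every fold to avoid leakage.
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        means, stds = fit_scaler(X_tr)
        x_test = transform([X[i]], means, stds)[0]
        hits += knn_predict(transform(X_tr, means, stds), y_tr, x_test, k, p) == y[i]
    return hits / len(X)


def grid_search(X, y, ks, ps):
    # Exhaustive search over (n_neighbors, p); on ties the first combination wins.
    best = max(product(ks, ps), key=lambda kp: loo_accuracy(X, y, *kp))
    return best, loo_accuracy(X, y, *best)


# Hypothetical toy data: two numeric features -> label 0 or 1.
X = [[22, 20000], [25, 30000], [28, 25000], [30, 40000], [35, 35000],
     [45, 90000], [48, 120000], [52, 80000], [55, 100000], [60, 110000]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
(best_k, best_p), acc = grid_search(X, y, ks=[1, 3, 5], ps=[1, 1.5, 2])
print(f"best n_neighbors={best_k}, p={best_p}, LOO accuracy={acc:.0%}")
```

Refitting the scaler inside each fold keeps the held-out point out of the mean and standard deviation, which is what placing the scaler inside a pipeline achieves during a grid search.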
Example 3: decision boundary comparison
from wolvr import kNN
kNN.demo("Social_Network_Ads")
from wolvr import naive_bayes as nb
nb.demo("Social_Network_Ads")
from wolvr import SVM
SVM.demo("Social_Network_Ads")
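Example 3 runs three classifiers on the same dataset to compare their decision boundaries. A common way to visualise a boundary is to evaluate a fitted classifier on a regular grid over two features and mark where the predicted label changes. The sketch below does this in plain text. The 1-nearest-neighbour and nearest-centroid classifiers, the four training points and the helper names are hypothetical stand-ins, not wolvr's kNN, naive_bayes or SVM modules.

```python
import math


def decision_grid(predict, x_range, y_range, steps=21):
    # Evaluate predict(x, y) on a steps x steps grid (steps >= 2); top row = largest y.
    (x0, x1), (y0, y1) = x_range, y_range
    xs = [x0 + (x1 - x0) * i / (steps - 1) for i in range(steps)]
    ys = [y0 + (y1 - y0) * j / (steps - 1) for j in range(steps)]
    return [[predict(x, y) for x in xs] for y in reversed(ys)]


def render(grid, symbols=".#"):
    # One character per cell: '.' for class 0, '#' for class 1.
    return "\n".join("".join(symbols[label] for label in row) for row in grid)


def nearest_neighbour(points, labels):
    # 1-NN: label of the closest training point (piecewise-linear boundary).
    def predict(x, y):
        i = min(range(len(points)), key=lambda k: math.dist(points[k], (x, y)))
        return labels[i]
    return predict


def nearest_centroid(points, labels):
    # Label of the closest class mean (one straight-line boundary for two classes).
    centroids = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return lambda x, y: min(centroids, key=lambda c: math.dist(centroids[c], (x, y)))


# Hypothetical 2-D training points (e.g. two standardised features) and labels.
points = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.5, 1.5)]
labels = [0, 0, 1, 1]
for name, clf in [("1-NN", nearest_neighbour(points, labels)),
                  ("nearest centroid", nearest_centroid(points, labels))]:
    print(name)
    print(render(decision_grid(clf, (-2, 2), (-2, 2), steps=9)))
```

The nearest-centroid map is split by a single straight line, while the 1-NN map bends around individual training points; the same grid evaluation works for any classifier that can predict a label for a single point.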