Common Algorithms Included in Sklearn

Posted by 机器学习算法与Python实战


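Reference material comes from the official sklearn website: http://scikit-learn.org/stable/

Broadly speaking, the functions and features that Sklearn provides fall into the following areas:

Classification algorithms

Regression algorithms

Clustering algorithms

Dimensionality reduction algorithms

Text mining algorithms

Model optimization

Data preprocessing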


Classification algorithms

Linear Discriminant Analysis (LDA)

>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

>>> lda = LinearDiscriminantAnalysis(solver="svd", store_covariance=True)

Quadratic Discriminant Analysis (QDA)

>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

>>> qda = QuadraticDiscriminantAnalysis(store_covariance=True)

Support Vector Machines (SVM)

>>> from sklearn import svm

>>> clf = svm.SVC()
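The classifiers in this section are all used the same way: construct the estimator, call fit on training data, then call predict or score. Here is a minimal sketch with SVC, assuming the bundled iris dataset and a 70/30 train/test split (both are illustrative choices, not part of the original post):

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris dataset stands in for real data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = svm.SVC()                       # default RBF kernel
clf.fit(X_train, y_train)             # learn from the training split
y_pred = clf.predict(X_test)          # one predicted label per test row
accuracy = clf.score(X_test, y_test)  # mean accuracy on the held-out split
print(accuracy)
```

Swapping svm.SVC() for any other classifier listed here should leave the rest of the script unchanged.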

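K-nearest neighbors (KNN)

>>> from sklearn import neighbors

>>> n_neighbors, weights = 5, "uniform"  # illustrative values; these match the scikit-learn defaults

>>> clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)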

Neural networks (NN)

>>> from sklearn.neural_network import MLPClassifier

>>> clf = MLPClassifier(solver='lbfgs', alpha=1e-5,

...                     hidden_layer_sizes=(5, 2), random_state=1)

Naive Bayes

>>> from sklearn.naive_bayes import GaussianNB

>>> gnb = GaussianNB()

Decision trees

>>> from sklearn import tree

>>> clf = tree.DecisionTreeClassifier()

Ensemble methods

1. Bagging

>>> from sklearn.ensemble import BaggingClassifier

>>> from sklearn.neighbors import KNeighborsClassifier

>>> bagging = BaggingClassifier(KNeighborsClassifier(),

...                             max_samples=0.5, max_features=0.5)

2. Random Forest

>>> from sklearn.ensemble import RandomForestClassifier

>>> clf = RandomForestClassifier(n_estimators=10)

3. AdaBoost

>>> from sklearn.ensemble import AdaBoostClassifier

>>> clf = AdaBoostClassifier(n_estimators=100)

4. GBDT (Gradient Tree Boosting)

>>> from sklearn.ensemble import GradientBoostingClassifier

>>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,

...     max_depth=1, random_state=0).fit(X_train, y_train)
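The ensemble estimators above can be compared side by side with cross-validation. This is a minimal sketch, assuming a synthetic dataset from make_classification and n_estimators=50 for every model (illustrative choices, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Assumption: synthetic binary classification data stands in for real data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "gbdt": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Which model wins depends on the data, so the scores printed here say nothing about the methods in general.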

Regression algorithms

Ordinary least squares regression (OLS)

>>> from sklearn import linear_model

>>> reg = linear_model.LinearRegression()

Ridge regression

>>> from sklearn import linear_model

>>> reg = linear_model.Ridge(alpha=.5)

Kernel ridge regression

>>> from sklearn.kernel_ridge import KernelRidge

>>> KernelRidge(kernel='rbf', alpha=0.1, gamma=10)  # alpha and gamma are illustrative values

Support vector regression (SVR)

>>> from sklearn import svm

>>> clf = svm.SVR()

Lasso regression

>>> from sklearn import linear_model

>>> reg = linear_model.Lasso(alpha=0.1)
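To see how the regularized linear models differ from plain least squares, one can fit them on the same data and compare coefficients. This is a minimal sketch, assuming synthetic data from make_regression in which only 3 of 10 features are informative; the alpha values are illustrative, not from the original post:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Assumption: synthetic data with 3 informative features out of 10.
X, y = make_regression(
    n_samples=100, n_features=10, n_informative=3, noise=5.0, random_state=0
)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)   # L2 penalty shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty can set coefficients to zero

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
n_zero_lasso = int((lasso.coef_ == 0).sum())

print("OLS coefficient norm:  ", ols_norm)
print("Ridge coefficient norm:", ridge_norm)
print("Lasso zero coefficients:", n_zero_lasso)
```

The ridge coefficient norm never exceeds the OLS norm. How many Lasso coefficients end up exactly zero depends on alpha and on the data.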

Elastic Net regression

>>> from sklearn.linear_model import ElasticNet

>>> regr = ElasticNet(random_state=0)

Bayesian regression

>>> from sklearn import linear_model

>>> reg = linear_model.BayesianRidge()

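Logistic regression (a classifier, despite its name)

>>> from sklearn.linear_model import LogisticRegression

>>> C = 1.0  # inverse regularization strength; illustrative value

>>> clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01, solver='liblinear')  # L1 needs a solver that supports it

>>> clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01)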

Robust regression

>>> from sklearn import linear_model

>>> ransac = linear_model.RANSACRegressor()

Polynomial regression (regression on polynomial basis functions)

>>> from sklearn.preprocessing import PolynomialFeatures

>>> poly = PolynomialFeatures(degree=2)

>>> poly.fit_transform(X)
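PolynomialFeatures only expands the inputs; the regression itself still needs a linear model on top. This is a minimal sketch that chains the two in a pipeline, assuming noise-free synthetic data generated from y = 1 + 2x + 3x^2 (an illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumption: noise-free synthetic data from y = 1 + 2x + 3x^2.
X = np.arange(-3, 4, dtype=float).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2

# Expand x into [1, x, x^2], then fit ordinary least squares on the expansion.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

pred = model.predict(np.array([[5.0]]))
print(pred)  # close to 1 + 2*5 + 3*25 = 86
```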

Gaussian process regression
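No snippet is given for this entry. As a minimal sketch, Gaussian process regression is available through GaussianProcessRegressor; the noisy sine data and the RBF plus WhiteKernel kernel below are illustrative assumptions, not from the original post:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Assumption: noisy samples of sin(x) stand in for real data.
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(30)

# RBF models the smooth signal, WhiteKernel models the observation noise.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0)
gpr.fit(X, y)  # kernel hyperparameters are tuned during fit

X_new = np.array([[2.5], [7.5]])
mean, std = gpr.predict(X_new, return_std=True)  # prediction and its uncertainty
print(mean, std)
```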

Partial least squares regression (PLS)

>>> from sklearn.cross_decomposition import PLSCanonical

>>> PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)

Canonical correlation analysis (CCA)

>>> from sklearn.cross_decomposition import CCA

>>> cca = CCA(n_components=1)

Clustering algorithms

Nearest neighbors (KNN)

>>> from sklearn.neighbors import NearestNeighbors

>>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)

K-means

>>> from sklearn.cluster import KMeans

>>> kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10)

Hierarchical clustering (supports multiple distance metrics)

>>> from sklearn.cluster import AgglomerativeClustering

>>> model = AgglomerativeClustering(linkage=linkage,

...                                 connectivity=connectivity, n_clusters=n_clusters)
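Clustering estimators are fit without labels and then assign a cluster index to each sample. This is a minimal K-means sketch, assuming three well-separated synthetic blobs from make_blobs (an illustrative dataset, not from the original post):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: three well-separated synthetic blobs in 2D.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

kmeans = KMeans(init="k-means++", n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # cluster index for every sample

print(kmeans.cluster_centers_)  # one row per cluster centre
print(labels[:10])
```

AgglomerativeClustering also offers fit_predict, so it can be swapped in with only the constructor changed.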

Dimensionality reduction algorithms

Principal component analysis (PCA)

>>> from sklearn.decomposition import PCA

>>> pca = PCA(n_components=2)

Kernel PCA

>>> from sklearn.decomposition import KernelPCA

>>> kpca = KernelPCA(kernel="rbf", fit_inverse_transform=True, gamma=10)

Factor analysis

>>> from sklearn.decomposition import FactorAnalysis

>>> fa = FactorAnalysis()
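The decomposition estimators above are used through fit_transform, which returns the data in the reduced space. This is a minimal PCA sketch, assuming the bundled iris dataset (an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: the bundled iris dataset (150 samples, 4 features).
X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # project 4 features onto 2 principal components

print(X_2d.shape)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```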

Text mining algorithms

Topic model (Latent Dirichlet Allocation)

>>> from sklearn.decomposition import NMF, LatentDirichletAllocation
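LatentDirichletAllocation works on word counts, so it is usually paired with CountVectorizer. This is a minimal sketch, assuming a made-up four-document corpus and two topics (both hypothetical, chosen only for illustration):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: a tiny made-up corpus; real topic models need far more text.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stock prices fell as the market closed",
    "investors watch the stock market every day",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)  # sparse document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # per-document topic proportions

print(doc_topics)
```

Each row of doc_topics sums to 1 and gives the estimated topic mixture for one document.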

Latent semantic analysis (LSA)
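No snippet is given for this entry. A common way to do latent semantic analysis in scikit-learn is to apply TruncatedSVD to a TF-IDF matrix. This is a minimal sketch, assuming a made-up four-document corpus and two components (both hypothetical):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumption: a tiny made-up corpus, used only to show the call sequence.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stock prices fell as the market closed",
    "investors watch the stock market every day",
]

tfidf = TfidfVectorizer().fit_transform(docs)  # sparse TF-IDF matrix

svd = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = svd.fit_transform(tfidf)         # documents in the latent space

print(doc_vectors)
```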

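Model optimization

Specific functions are not listed here; only the available capabilities are described.

Feature selection

Stochastic gradient methods

Cross-validation

Hyperparameter tuning

Model evaluation: computes accuracy, recall, AUC, and other metrics, and supports plotting ROC curves, loss functions, and the like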
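The capabilities listed above can be combined in a single workflow. This is a minimal sketch, assuming the bundled breast cancer dataset, a pipeline of StandardScaler, SelectKBest and SGDClassifier, and a small hypothetical parameter grid (none of these choices come from the original post):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled breast cancer dataset stands in for real data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),     # feature selection
    ("sgd", SGDClassifier(random_state=0)),  # stochastic gradient descent
])

# Hyperparameter tuning with 5-fold cross-validation (grid is illustrative).
param_grid = {"select__k": [5, 10, 20], "sgd__alpha": [1e-4, 1e-3, 1e-2]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Model evaluation on the held-out split.
y_pred = search.predict(X_test)
y_score = search.decision_function(X_test)
report = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}
print(search.best_params_)
print(report)
```

Putting the scaler and the selector inside the pipeline means they are refit on each training fold, so the cross-validation scores are not contaminated by the validation data.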

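Data preprocessing


Standardization

Outlier handling

Nonlinear transformation

Binarization

One-hot encoding

Missing-value imputation: supports mean, median, mode, and constant-value imputation, as well as multiple imputation

Derived feature generation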
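Several of the preprocessing steps above can be sketched on a toy array. The data below is made up for illustration, and the transformers shown (SimpleImputer, StandardScaler, Binarizer, OneHotEncoder) are one possible choice for each step:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import Binarizer, OneHotEncoder, StandardScaler

# Assumption: a tiny made-up array with one missing value.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])

# Missing-value imputation: replace NaN with the column mean.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Standardization: zero mean and unit variance per column.
X_scaled = StandardScaler().fit_transform(X_filled)

# Binarization: 1.0 where a value exceeds the threshold, else 0.0.
X_binary = Binarizer(threshold=2.0).fit_transform(X_filled)

# One-hot encoding: one indicator column per category (sorted alphabetically).
colors = np.array([["red"], ["green"], ["red"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()

print(X_filled)
print(X_scaled)
print(X_binary)
print(onehot)
```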

