The sklearn package

Posted by ironan-liu

6.3. Preprocessing data
https://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling
The difference between normalization, regularization, and standardization
https://blog.csdn.net/tianguiyuyu/article/details/80694669
6.3.1. Standardization, or mean removal and variance scaling (zero mean, unit variance)
preprocessing.scale
preprocessing.StandardScaler: once fitted on the training samples, the same transformation can also be applied to the test samples
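A minimal sketch of the fit-on-train, apply-to-test pattern noted above. The arrays are made-up toy data, and the variable names are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

# Toy data (assumed for illustration): 3 training rows, 1 test row.
X_train = np.array([[1.0, -1.0, 2.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 1.0, -1.0]])
X_test = np.array([[-1.0, 1.0, 0.0]])

# Learn the per-feature mean and standard deviation from the training set only.
scaler = StandardScaler().fit(X_train)

# Reuse the same statistics for both training and test data.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# preprocessing.scale is the one-shot function form; it does not remember
# the statistics, so it cannot be reapplied to new data.
X_train_scaled_fn = scale(X_train)

print(X_train_scaled.mean(axis=0))  # approximately 0 for every feature
print(X_train_scaled.std(axis=0))   # approximately 1 for every feature
```

The test row is scaled with the training mean and variance, so its values are not expected to have zero mean or unit variance themselves.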
6.3.1.1. Scaling features to a range
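preprocessing.MinMaxScaler scales each feature to a given range between a minimum and a maximum value ([0, 1] by default)
preprocessing.MaxAbsScaler divides each feature by its maximum absolute value, so the training data lies in [-1, 1]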
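The two range scalers can be compared on the same toy matrix. This is a sketch with made-up data; `feature_range=(0, 1)` is the default and is spelled out only for clarity.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

# Toy data (assumed for illustration).
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# Each column is mapped linearly so its minimum becomes 0 and its maximum 1.
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Each column is divided by its largest absolute value; signs and zeros are kept.
X_maxabs = MaxAbsScaler().fit_transform(X)

print(X_minmax)
print(X_maxabs)
```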
6.3.1.2. Scaling sparse data
preprocessing.MaxAbsScaler (uses the Transformer API: fit, then transform)
preprocessing.maxabs_scale
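A small sketch of scaling a sparse matrix, assuming SciPy's CSR format as input. Because MaxAbsScaler only divides and never shifts the data, zero entries stay zero and the matrix stays sparse. The matrix values are made up.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler, maxabs_scale

# Toy sparse matrix (assumed for illustration); most entries are zero.
X_sparse = sparse.csr_matrix(np.array([[0.0, 4.0, 0.0],
                                       [2.0, 0.0, 0.0],
                                       [0.0, -8.0, 1.0]]))

# Transformer API: fit learns the per-column max absolute value.
scaler = MaxAbsScaler().fit(X_sparse)
X_scaled = scaler.transform(X_sparse)

# Function form: same result in one call, without a reusable fitted object.
X_scaled_fn = maxabs_scale(X_sparse)

print(sparse.issparse(X_scaled))  # the output is still a sparse matrix
print(X_scaled.toarray())
```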
6.3.1.3. Scaling data with outliers
robust_scale
RobustScaler (uses the Transformer API: fit, then transform)
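A sketch of robust scaling on a made-up column with one obvious outlier. With default settings, RobustScaler centers on the median and scales by the interquartile range, so the outlier barely affects the statistics.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, robust_scale

# Toy column (assumed for illustration); 100.0 is the outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Transformer API: the median and IQR are stored on the fitted object.
scaler = RobustScaler().fit(X)
X_robust = scaler.transform(X)

# Function form of the same operation.
X_robust_fn = robust_scale(X)

print(scaler.center_)  # median of the column
print(scaler.scale_)   # interquartile range of the column
print(X_robust.ravel())
```

The inliers end up in a small range around zero, while the outlier remains clearly separated instead of compressing everything else.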
6.3.1.4. Centering kernel matrices
KernelCenterer
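A sketch of what centering a kernel matrix means, using a linear kernel so the result can be checked directly: centering the Gram matrix should match computing the kernel on mean-centered features. The data is made up, and the linear kernel is chosen only because it makes the equivalence easy to verify.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel
from sklearn.preprocessing import KernelCenterer

# Toy data (assumed for illustration).
X = np.array([[1.0, -2.0, 2.0],
              [-2.0, 1.0, 3.0],
              [4.0, 1.0, -2.0]])

# Gram matrix of pairwise inner products.
K = linear_kernel(X)

# Center the kernel matrix without touching the features themselves.
K_centered = KernelCenterer().fit_transform(K)

# Reference: center the features explicitly, then compute the kernel.
X_centered = X - X.mean(axis=0)
K_reference = linear_kernel(X_centered)

print(np.allclose(K_centered, K_reference))
```

For non-linear kernels the feature map is implicit, which is the case where centering the kernel matrix directly is useful.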
6.3.2. Non-linear transformation
6.3.2.1. Mapping to a Uniform distribution
QuantileTransformer
quantile_transform
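A sketch of mapping a skewed feature to a uniform distribution. The exponential sample, the seed, and `n_quantiles=100` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, quantile_transform

# Skewed toy feature (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.exponential(size=(1000, 1))

# Transformer API: the quantiles are learned in fit and reused in transform.
qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform",
                         random_state=0)
X_uniform = qt.fit_transform(X)

# Function form of the same mapping.
X_uniform_fn = quantile_transform(X, n_quantiles=100,
                                  output_distribution="uniform",
                                  random_state=0, copy=True)

print(X_uniform.min(), X_uniform.max())  # values lie in [0, 1]
print(X_uniform.mean())                  # close to 0.5 for a uniform result
```

The mapping is based on ranks, so it preserves the ordering of the values while discarding the original distances between them.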
6.3.2.2. Mapping to a Gaussian distribution
PowerTransformer
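A sketch of PowerTransformer on a made-up log-normal sample. The Box-Cox method is used here and requires strictly positive input; the default method, Yeo-Johnson, also accepts zero and negative values.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Strictly positive, right-skewed toy feature (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 1))

# Fit a power transform per feature; standardize=True then rescales the
# transformed output to zero mean and unit variance.
pt = PowerTransformer(method="box-cox", standardize=True)
X_gauss = pt.fit_transform(X)

print(pt.lambdas_)                     # fitted power parameter per feature
print(X_gauss.mean(), X_gauss.std())   # approximately 0 and 1
```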
6.3.3. Normalization
Normalization is the process of scaling individual samples to have unit norm.
normalize
Normalizer (uses the Transformer API: fit, then transform)
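A sketch showing that normalization works per sample (row), not per feature (column). The matrix is made up, and the L2 norm is the default.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, normalize

# Toy data (assumed for illustration); each row is one sample.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, -2.0]])

# Function form: divide every row by its own L2 norm.
X_l2 = normalize(X, norm="l2")

# Transformer form: fit does nothing here because rows are handled
# independently, but the object can be placed inside a Pipeline.
X_l2_tr = Normalizer(norm="l2").fit_transform(X)

print(X_l2)                          # first row becomes [0.6, 0.8]
print(np.linalg.norm(X_l2, axis=1))  # every row now has norm 1
```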
6.3.4. Encoding categorical features
OrdinalEncoder (ordinal encoding)
OneHotEncoder
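A sketch contrasting the two encoders on a tiny made-up table of string categories. Categories are inferred from the data and sorted, so the integer codes below follow alphabetical order within each column.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy categorical data (assumed for illustration): two columns.
X = [["male", "US"],
     ["female", "Europe"],
     ["female", "US"]]

# One integer code per category, one output column per input column.
ordinal = OrdinalEncoder().fit(X)
X_ord = ordinal.transform(X)

# One binary column per category; the default output is a sparse matrix,
# converted to a dense array here only for printing.
onehot = OneHotEncoder(handle_unknown="ignore").fit(X)
X_hot = onehot.transform(X).toarray()

print(ordinal.categories_)
print(X_ord)
print(X_hot)
```

Ordinal codes imply an ordering that many estimators will treat as meaningful, which is the usual reason to prefer one-hot encoding for unordered categories.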
6.3.5. Discretization
For instance, pre-processing with a discretizer can introduce nonlinearity to linear models.
6.3.5.1. K-bins discretization
KBinsDiscretizer supports three binning strategies. The 'uniform' strategy uses constant-width bins. The 'quantile' strategy uses quantile values so that the bins in each feature are equally populated. The 'kmeans' strategy defines bins based on a k-means clustering procedure performed on each feature independently.
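The three strategies can be compared on one made-up column with an outlier. This is a sketch; `n_bins=3` and `encode="ordinal"` are arbitrary choices that keep the output easy to read, and recent scikit-learn versions may emit a FutureWarning about changing defaults.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Toy column (assumed for illustration); 100.0 stretches the value range.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0],
              [5.0], [6.0], [7.0], [8.0], [100.0]])

results = {}
for strategy in ("uniform", "quantile", "kmeans"):
    # encode="ordinal" returns the bin index of each value as a number.
    est = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy=strategy)
    results[strategy] = est.fit_transform(X).ravel()
    print(strategy, results[strategy])
```

With constant-width bins the outlier forces nearly all values into the first bin, while the quantile strategy spreads the samples roughly evenly across bins.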
6.3.5.2. Feature binarization
preprocessing.Binarizer(threshold=1.1)
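A sketch of the Binarizer call above on made-up data: values strictly greater than the threshold become 1, everything else becomes 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Toy data (assumed for illustration).
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# Only the two entries equal to 2.0 exceed the threshold of 1.1.
X_bin = Binarizer(threshold=1.1).fit_transform(X)

print(X_bin)
```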
6.3.6. Imputation of missing values
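The notes give no detail for this heading. As a hedged illustration only, one common option is `SimpleImputer` from the separate `sklearn.impute` module, shown here with mean imputation on made-up data; other strategies and imputers exist.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data with missing entries marked as NaN (assumed for illustration).
X_train = np.array([[1.0, 2.0],
                    [np.nan, 3.0],
                    [7.0, 6.0]])
X_new = np.array([[np.nan, 2.0],
                  [6.0, np.nan],
                  [7.0, 6.0]])

# Learn each column's mean from the training data, ignoring NaN.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean").fit(X_train)

# Replace missing entries in new data with the learned column means.
X_filled = imputer.transform(X_new)

print(imputer.statistics_)
print(X_filled)
```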
6.3.7. Generating polynomial features
from sklearn.preprocessing import PolynomialFeatures
PolynomialFeatures(degree=3, interaction_only=True)
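A sketch of what the call above produces on a made-up 3x3 matrix. With `interaction_only=True`, only products of distinct features are generated, so powers such as x0^2 are left out.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data (assumed for illustration): rows [0,1,2], [3,4,5], [6,7,8].
X = np.arange(9).reshape(3, 3)

poly = PolynomialFeatures(degree=3, interaction_only=True)
X_poly = poly.fit_transform(X)

# Output columns: 1, x0, x1, x2, x0*x1, x0*x2, x1*x2, x0*x1*x2
print(X_poly.shape)
print(X_poly)
```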
6.3.8. Custom transformers
FunctionTransformer converts an existing Python function into a transformer to assist in data cleaning or processing
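A sketch of wrapping an existing function as a transformer with `FunctionTransformer`. The choice of `np.log1p` and the toy data are assumptions for illustration; any function that maps an array to an array can be wrapped the same way.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Toy data (assumed for illustration).
X = np.array([[0.0, 1.0],
              [2.0, 3.0]])

# Wrap np.log1p so it exposes fit/transform and can be used in a Pipeline.
# validate=True asks the wrapper to check the input array first.
log_transformer = FunctionTransformer(np.log1p, validate=True)
X_log = log_transformer.fit_transform(X)

print(X_log)
```

The wrapped function is stateless: fit learns nothing from the data, so this approach suits fixed transformations rather than ones that need training statistics.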
