Feature Engineering

Posted 2020-10-08 付小同


1. remove skew

Why:

Many models are built on the assumption that the input data follow a normal (Gaussian) distribution. So the closer the input data are to a normal distribution, the better such models tend to perform.

Methods:

remove skewness: apply a log transform to the skewed feature.
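
A minimal sketch of the log transform, using only the Python standard library. The sample values and the helper names (skewness, log_transform) are made up for illustration. It uses log(1 + x) rather than a plain log so that zeros do not break it; that choice is an assumption, and it still requires non-negative inputs.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = mean(values)
    sd = pstdev(values)
    return mean(((v - mu) / sd) ** 3 for v in values)

def log_transform(values):
    """Apply log(1 + x) to every value; assumes all values are >= 0."""
    return [math.log1p(v) for v in values]

# Hypothetical right-skewed sample: most values are small, one is huge.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
transformed = log_transform(raw)

print("skewness before:", round(skewness(raw), 2))
print("skewness after: ", round(skewness(transformed), 2))
```

The skewness of this sample shrinks after the transform but does not reach zero, so it is worth checking the result instead of assuming the data are now normal.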

2. standardization

Why:

Different features have different scales. Standardizing them avoids giving too much weight to the features with large scales.

Methods:

min-max = (data - min) / (max - min)
z-score = (data - mean) / sd, where sd is the standard deviation
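
The two formulas above can be sketched in plain Python as follows. The function names and the sample column are hypothetical. The sketch uses the population standard deviation (pstdev); the notes do not say whether the population or sample version is meant, so that is an assumption. Both functions divide by zero if every value in the column is the same.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center values on the mean and scale by the standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation (assumption)
    return [(v - mu) / sd for v in values]

# Hypothetical feature column.
heights_cm = [150, 160, 170, 180, 190]

scaled = min_max(heights_cm)
standardized = z_score(heights_cm)
print(scaled)
print([round(z, 3) for z in standardized])
```

After min-max scaling the smallest value maps to 0 and the largest to 1. After z-score scaling the column has mean 0 and standard deviation 1.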

3. manual removal

Why:

sometimes we know that some columns are meaningless for the model, so we just remove them manually.

Method:

columns like "ID", "timestamp"
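
A small sketch of dropping such columns, assuming the data is held as a list of dicts (one dict per row). The drop_columns helper and the sample rows are hypothetical.

```python
def drop_columns(rows, columns):
    """Return new rows without the named columns; unknown names are ignored."""
    unwanted = set(columns)
    return [
        {key: value for key, value in row.items() if key not in unwanted}
        for row in rows
    ]

# Hypothetical rows with two columns that carry no signal for the model.
rows = [
    {"ID": 1, "timestamp": "2020-10-08 09:00", "age": 31, "income": 4200},
    {"ID": 2, "timestamp": "2020-10-08 09:05", "age": 45, "income": 5100},
]

features = drop_columns(rows, ["ID", "timestamp"])
print(features)
```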

4. remove columns with too many nulls

Why:

if a feature has too many nulls, it's not reliable.

Method:

count the percentage of nulls in each column, and remove the columns where it is too high.
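
A sketch of the null count, again assuming rows stored as a list of dicts with None marking a null. The helper names, the sample data, and the 50% threshold are all assumptions. The notes do not give a cutoff, so pick one that fits the data.

```python
def null_fraction(rows, column):
    """Fraction of rows where the column is missing or None."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)

def drop_sparse_columns(rows, threshold=0.5):
    """Keep only columns whose null fraction is at most the threshold."""
    columns = {key for row in rows for key in row}
    keep = {c for c in columns if null_fraction(rows, c) <= threshold}
    return [{k: v for k, v in row.items() if k in keep} for row in rows]

# Hypothetical data: "fax" is null in 3 of 4 rows, "age" in 1 of 4.
rows = [
    {"age": 31, "fax": None},
    {"age": None, "fax": None},
    {"age": 45, "fax": "555-0100"},
    {"age": 52, "fax": None},
]

print("fax nulls:", null_fraction(rows, "fax"))
print("age nulls:", null_fraction(rows, "age"))
cleaned = drop_sparse_columns(rows, threshold=0.5)
print(cleaned)
```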

5. drop outlier

Why:

outliers are the special cases in a set of data. They don't represent the common experience, so they usually don't contribute to a model. On the contrary, they can be harmful to it.

Methods:

remove data that are >= an upper extreme value, or <= a lower extreme value.
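
A minimal sketch of this rule. The drop_outliers helper, the sample values, and the two bounds are hypothetical. The notes do not say how to choose the extreme values, so here they are passed in by hand.

```python
def drop_outliers(values, lower, upper):
    """Keep values strictly between the bounds; drop <= lower and >= upper."""
    return [v for v in values if lower < v < upper]

# Hypothetical "age" column with two implausible entries.
ages = [23, 31, 27, -5, 45, 38, 250, 29]

cleaned_ages = drop_outliers(ages, lower=0, upper=120)
print(cleaned_ages)
```

With these bounds, -5 and 250 are removed and the other values keep their original order.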

6. to be continued
