Feature Engineering
Posted 付小同
1. remove skew
Why:
Many models are built on the hypothesis that the input data follow a normal (Gaussian) distribution. So the closer the input data are to a normal distribution, the better the results tend to be.
Methods:
- remove skewness: apply a log function.
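A minimal sketch of the log method, using only the Python standard library. The `prices` data and the helper names `skewness` and `log_transform` are made up for illustration. It assumes `log(1 + x)` rather than a plain `log(x)` so that zeros are allowed, and it measures skewness as the mean of the cubed z-scores.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_transform(values):
    """Apply log(1 + x) to each value; every x must be greater than -1."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature (a few very large values).
prices = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
transformed = log_transform(prices)

print("skewness before:", round(skewness(prices), 2))
print("skewness after: ", round(skewness(transformed), 2))
```

The log compresses large values much more than small ones, so the long right tail is pulled in and the skewness drops. It does not necessarily reach zero.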
2. standardization
Why:
Different features have different scales. Standardization avoids giving too much weight to the features with a large scale.
Methods:
- min-max = (data - min) / (max - min)
- z-score = (data - mean) / sd, where sd is the standard deviation
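The two formulas above can be sketched in plain Python as follows. The `ages` data is made up, and two choices here are assumptions: the population standard deviation is used for `sd`, and a constant column is mapped to 0 in `min_max_scale`.

```python
import statistics


def min_max_scale(values):
    """(data - min) / (max - min): rescales the values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """(data - mean) / sd: the result has mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("z-score is undefined for a constant column")
    return [(v - mean) / sd for v in values]


# Hypothetical feature column.
ages = [20, 30, 40, 50, 60]
scaled = min_max_scale(ages)
standardized = z_score(ages)

print(scaled)
print(standardized)
```

Min-max keeps the result inside a fixed range but is sensitive to a single extreme value. The z-score has no fixed range but is less dominated by the minimum and maximum.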
3. manual removal
Why:
Sometimes we know that some columns are meaningless, so we just remove them manually.
Method:
- columns like "ID", "timestamp"
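As a small illustration, assume the data set is a list of dictionaries, one per row. The rows and the helper name `drop_columns` are hypothetical. Dropping hand-picked columns could then look like this:

```python
def drop_columns(rows, columns_to_drop):
    """Return new rows without the given columns; the input is not modified."""
    drop = set(columns_to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


# Hypothetical rows; "ID" and "timestamp" carry no predictive signal here.
rows = [
    {"ID": 1, "timestamp": "2020-01-01", "age": 31, "income": 4200},
    {"ID": 2, "timestamp": "2020-01-02", "age": 45, "income": 5100},
]
features = drop_columns(rows, ["ID", "timestamp"])

print(features)
```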
4. remove columns with too many nulls
Why:
If a feature has too many nulls, it's not reliable.
Method:
- count the percentage of nulls in each column, and remove the columns where it is too high.
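A minimal sketch of this check, again treating the data as a list of dictionaries with `None` as the null marker. The rows, the helper names and the 50% threshold are all assumptions. What counts as "too many" depends on the data set.

```python
def null_fraction(rows, column):
    """Fraction of rows in which the column is missing or None."""
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


def drop_sparse_columns(rows, max_null_fraction=0.5):
    """Keep only the columns whose null fraction is within the threshold."""
    # Collect column names in order of first appearance.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    keep = [c for c in columns if null_fraction(rows, c) <= max_null_fraction]
    return [{c: row.get(c) for c in keep} for row in rows]


# Hypothetical rows: "fax" is null in 3 of 4 rows, "city" in 1 of 4.
rows = [
    {"age": 31, "fax": None, "city": "Beijing"},
    {"age": 45, "fax": None, "city": None},
    {"age": 28, "fax": "555-0100", "city": "Shanghai"},
    {"age": 52, "fax": None, "city": "Shenzhen"},
]
cleaned = drop_sparse_columns(rows, max_null_fraction=0.5)

for column in ("age", "fax", "city"):
    print(column, null_fraction(rows, column))
```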
5. drop outliers
Why:
Outliers are special cases in a set of data. They don't represent the common case, so they contribute little to a model. On the contrary, they can be harmful to it.
Methods:
- remove data points that are >= an upper extreme value, or <= a lower extreme value.
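The notes do not say how to pick the extreme values. As one possible sketch, the bounds below come from the common 1.5 × IQR rule, which is an assumption and not the only option. The `incomes` data and the helper names are hypothetical, and `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper bounds at k * IQR outside the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, lower, upper):
    """Remove values that are <= lower or >= upper."""
    return [v for v in values if lower < v < upper]


# Hypothetical feature with one obviously extreme value.
incomes = [10, 12, 11, 13, 12, 11, 10, 14, 13, 200]
lower, upper = iqr_bounds(incomes)
cleaned = drop_outliers(incomes, lower, upper)

print("bounds:", lower, upper)
print("kept:  ", cleaned)
```

`drop_outliers` also accepts bounds chosen by hand, for example from domain knowledge, which matches the method as stated in the notes.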
6. to be continued