Machine Learning -- Decision Trees
Posted by mr0wang
Decision Tree
Preliminaries
As shown in the figure below, a decision tree consists of decision blocks and terminating blocks, where a terminating block indicates that a conclusion has been reached.
Compared with kNN, a key advantage of decision trees is that the form of the data is easy for people to understand.
Background
- Occam's razor: never expend more to do what can be done equally well with less.
- Heuristics: techniques for finding a workable solution in a short time from limited knowledge (incomplete information).
- ID3 algorithm (Iterative Dichotomiser 3): this algorithm is built on Occam's razor, preferring smaller decision trees over larger ones (the simplest theory that fits).
Tree construction
General approach to decision trees
- Collect: Any method.
- Prepare: This tree-building algorithm works only on nominal values, so any continuous values will need to be quantized (discretized).
- Analyze: Any method; you should visually inspect the tree after it is built.
- Train: Construct a tree data structure.
- Test: Calculate the error rate with the learned tree.
- Use: This can be used in any supervised learning task; often, trees are used to better understand the data.
——《Machine Learning in Action》
Information Gain
Information gain is the change in information before and after splitting the dataset.
The general principle for splitting a dataset: we choose to split our dataset in a way that makes our unorganized data more organized.
1. Computing information gain
Claude Shannon
Claude Shannon is considered one of the smartest people of the twentieth century. In William Poundstone's 2005 book Fortune's Formula, he wrote this of Claude Shannon: "There were many at Bell Labs and MIT who compared Shannon's insight to Einstein's. Others found that comparison unfair—unfair to Shannon."
1. Information: if an item may belong to class x_i with probability p(x_i), the information of x_i is defined as l(x_i) = -log2 p(x_i).
- Entropy: the expected value of the information over all n classes, H = -Σ_{i=1..n} p(x_i) log2 p(x_i).
- Information gain: the reduction in entropy after the dataset is split on a feature; the best feature to split on is the one with the highest information gain.
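These definitions can be sketched in Python. The dataset format below (each row is a list of nominal feature values followed by a class label) and the function names are assumptions modeled on *Machine Learning in Action*, not code from this post:

```python
from collections import Counter
from math import log2

def shannon_entropy(labels):
    """Entropy H = -sum(p_i * log2(p_i)) over the class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def split_dataset(rows, feature_index, value):
    """Rows whose feature equals `value`, with that feature column removed."""
    return [row[:feature_index] + row[feature_index + 1:]
            for row in rows if row[feature_index] == value]

def information_gain(rows, feature_index):
    """Base entropy minus the weighted entropy of the subsets after the split."""
    base = shannon_entropy([row[-1] for row in rows])
    new_entropy = 0.0
    for value in {row[feature_index] for row in rows}:
        subset = split_dataset(rows, feature_index, value)
        weight = len(subset) / len(rows)
        new_entropy += weight * shannon_entropy([row[-1] for row in subset])
    return base - new_entropy
```

On the book's toy fish dataset `[[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]`, the label entropy is about 0.971 bits, and feature 0 gives a larger information gain than feature 1, so it would be chosen first.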
Measuring consistency in a dataset
Using recursion to construct a decision tree
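The recursive construction named above can be sketched as a minimal, self-contained ID3 in Python. The row format (nominal feature values followed by a class label), the nested-dict tree shape `{feature: {value: subtree}}`, and names such as `build_tree` are assumptions for illustration, not code shipped with this post:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, feature_names):
    """ID3: recursively split on the feature with the highest information gain."""
    labels = [row[-1] for row in rows]
    if labels.count(labels[0]) == len(labels):   # all rows share one class
        return labels[0]
    if len(rows[0]) == 1:                        # no features left: majority vote
        return Counter(labels).most_common(1)[0][0]

    def split(i, v):
        # rows matching feature i == v, with column i removed
        return [r[:i] + r[i + 1:] for r in rows if r[i] == v]

    def weighted_entropy(i):
        # minimizing this is equivalent to maximizing information gain
        return sum(len(s) / len(rows) * entropy([r[-1] for r in s])
                   for v in {r[i] for r in rows} for s in [split(i, v)])

    best = min(range(len(rows[0]) - 1), key=weighted_entropy)
    rest = feature_names[:best] + feature_names[best + 1:]
    return {feature_names[best]: {v: build_tree(split(best, v), rest)
                                  for v in {r[best] for r in rows}}}
```

On the toy fish dataset this yields `{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}`: the recursion bottoms out when a subset is pure or no features remain.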
Plotting trees in Matplotlib
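Before a tree can be laid out with Matplotlib annotations, the plotter needs the tree's width (number of leaves) and depth to scale the axes. A sketch of those two helpers, assuming trees are nested dicts of the form `{feature: {value: subtree}}` with class-label strings as leaves (this representation is an assumption, not defined in the post):

```python
def num_leafs(tree):
    """Count leaf nodes; a leaf is anything that is not a dict."""
    if not isinstance(tree, dict):
        return 1
    (_, branches), = tree.items()   # each decision node has exactly one feature key
    return sum(num_leafs(sub) for sub in branches.values())

def tree_depth(tree):
    """Depth measured in decision nodes; a bare leaf has depth 0."""
    if not isinstance(tree, dict):
        return 0
    (_, branches), = tree.items()
    return 1 + max(tree_depth(sub) for sub in branches.values())
```

A plotting routine would then place the i-th leaf at x = i / num_leafs(tree) and each decision node at y proportional to its level over tree_depth(tree).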