Machine Learning

Posted by lijianming180


Over the break I went through a few books on machine learning and have collected some of the more important core formulas here.

Model Description

Under the feature-space hypothesis, we look for linear coefficients $\theta$ so that a linear function of the features approximates the target vector.

How well the approximation fits is measured by a cost function; the MSE given below is one such measure.

Linear Regression
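
In the usual notation ($m$ training instances with feature vectors $x^{(i)}$ and targets $y^{(i)}$), the linear hypothesis and its MSE cost are:

$$\hat{y} = h_\theta(x) = \theta^T x, \qquad \mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2$$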

Gradient Descent

$$\theta^{(\text{next step})} = \theta - \eta \nabla_\theta \mathrm{MSE}(\theta)$$

where $\eta$ is the learning rate and $\nabla_\theta \mathrm{MSE}(\theta)$ is the gradient vector of the cost function.
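
A plain-NumPy sketch of batch gradient descent on the MSE cost; the synthetic data, learning rate, and iteration count here are illustrative, not from the original post:

import numpy as np

eta = 0.1            # learning rate
n_iterations = 1000
m = 100              # number of training instances

# synthetic data: y = 4 + 3x + noise
X = np.random.randn(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]   # prepend bias feature x0 = 1

theta = np.random.randn(2, 1)     # random initialization
for _ in range(n_iterations):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)  # gradient of MSE w.r.t. theta
    theta = theta - eta * gradients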

With a regularization term:

  • Ridge Regression: $J(\theta) = \mathrm{MSE}(\theta) + \alpha \frac{1}{2} \sum_{i=1}^{n} \theta_i^2$
  • LASSO: $J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} |\theta_i|$
  • Elastic Net: $J(\theta) = \mathrm{MSE}(\theta) + r \alpha \sum_{i=1}^{n} |\theta_i| + \frac{1 - r}{2} \alpha \sum_{i=1}^{n} \theta_i^2$
sklearn: Linear Regression

from sklearn.linear_model import LinearRegression

# ordinary least squares; X is the feature matrix, y the target vector (assumed defined)
lr = LinearRegression()
lr.fit(X, y)
lr.intercept_, lr.coef_  # fitted bias term and weight vector

# evaluate the fit with MSE
from sklearn.metrics import mean_squared_error
mean_squared_error(y, lr.predict(X))

# the same linear model can also be trained with stochastic gradient descent
from sklearn.linear_model import SGDRegressor
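
The regularized variants listed above have direct sklearn counterparts; a minimal sketch, with illustrative alpha values:

from sklearn.linear_model import Ridge, Lasso, ElasticNet

ridge_reg = Ridge(alpha=1.0)                       # L2 penalty
lasso_reg = Lasso(alpha=0.1)                       # L1 penalty
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio is the mix ratio r

for model in (ridge_reg, lasso_reg, elastic_net):
    model.fit(X, y)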

Logistic Regression

$\sigma(t)$ is the sigmoid function, and the model estimates $\hat{p} = \sigma(\theta^T x)$:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

Logistic Regression cost function (log loss):

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{p}^{(i)} + (1 - y^{(i)}) \log (1 - \hat{p}^{(i)}) \right]$$

Logistic cost function partial derivatives:

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \sigma(\theta^T x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

sklearn: Logistic Regression

from sklearn.linear_model import LogisticRegression

# binary logistic regression; X and y are assumed defined as above
log_reg = LogisticRegression()
log_reg.fit(X, y)

Softmax Regression
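
For $K$ classes, each class $k$ has its own parameter vector $\theta^{(k)}$; the softmax function turns the per-class scores into probability estimates:

$$s_k(x) = \theta^{(k)T} x, \qquad \hat{p}_k = \frac{\exp(s_k(x))}{\sum_{j=1}^{K} \exp(s_j(x))}$$

Training minimizes the cross-entropy cost $J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log \hat{p}_k^{(i)}$. A minimal sklearn sketch, assuming X and y hold a multi-class problem such as the full iris data (the C value is illustrative):

from sklearn.linear_model import LogisticRegression

# multinomial (softmax) logistic regression
softmax_reg = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=10)
softmax_reg.fit(X, y)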

Support Vector Machine

  • Decision function and prediction: predict $\hat{y} = 1$ if $w^T x + b \ge 0$, and $\hat{y} = 0$ otherwise.
  • Hard margin classification: minimize $\frac{1}{2} w^T w$

subject to

$$t^{(i)} (w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m$$

  • Soft margin classification: minimize $\frac{1}{2} w^T w + C \sum_{i=1}^{m} \zeta^{(i)}$

subject to

$$t^{(i)} (w^T x^{(i)} + b) \ge 1 - \zeta^{(i)} \quad \text{and} \quad \zeta^{(i)} \ge 0, \quad i = 1, \dots, m$$

  • Dual problem: minimize $\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)} x^{(i)T} x^{(j)} - \sum_{i=1}^{m} \alpha^{(i)}$

subject to

$$\alpha^{(i)} \ge 0, \quad i = 1, \dots, m$$

(Here $t^{(i)} \in \{-1, +1\}$ are the class labels, rather than the 0/1 targets used above.)

LinearSVC

import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris-Virginica

# scale the features, then fit a linear SVM with hinge loss
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])

svm_clf.fit(X, y)

Common kernels

  • Linear: $K(a, b) = a^T b$
  • Polynomial: $K(a, b) = (\gamma a^T b + r)^d$
  • Gaussian RBF: $K(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$
  • Sigmoid: $K(a, b) = \tanh(\gamma a^T b + r)$
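
A brief sketch of how these kernels map onto sklearn's SVC; the hyperparameter values are illustrative, and X, y are the iris features from the LinearSVC example above:

from sklearn.svm import SVC

# polynomial kernel of degree d=3; coef0 plays the role of r
poly_svc = SVC(kernel="poly", degree=3, coef0=1, C=5)
poly_svc.fit(X, y)

# Gaussian RBF kernel; gamma controls the kernel width
rbf_svc = SVC(kernel="rbf", gamma=5, C=0.001)
rbf_svc.fit(X, y)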

From trees to forests.

Decision Trees

  • Gini impurity: $G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2$
  • Entropy: $H_i = -\sum_{k=1,\, p_{i,k} \neq 0}^{n} p_{i,k} \log_2 p_{i,k}$
  • CART cost function for regression: $J(k, t_k) = \frac{m_{\text{left}}}{m} \mathrm{MSE}_{\text{left}} + \frac{m_{\text{right}}}{m} \mathrm{MSE}_{\text{right}}$

where $p_{i,k}$ is the ratio of class-$k$ instances among the training instances in node $i$, $(k, t_k)$ is the candidate feature/threshold split, and $m_{\text{left/right}}$ is the number of instances in each resulting subset.

DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

# a shallow tree: max_depth=2 keeps the model interpretable and limits overfitting
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

Random Forests

In my view, Random Forests are the classic representative of ensemble learning.

Take classifiers as an example: given the same data, different classifiers may reach different decisions.

Logistic Regression classifier, Random Forest classifier, K-Nearest Neighbors classifier

Naturally, a voting strategy can then be introduced to make the final decision.

Voting classifiers

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# hard voting: each classifier casts one vote; the majority class wins
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)  # X_train, y_train assumed defined

Boosting

AdaBoost

Gradient Boosting
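
AdaBoost reweights the training instances so that each successive learner focuses on the instances its predecessors got wrong, while gradient boosting fits each new learner to the residual errors of the ensemble built so far. A minimal sklearn sketch with illustrative hyperparameters, reusing X and y from the earlier examples:

from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# AdaBoost over decision stumps (the default base estimator)
ada_clf = AdaBoostClassifier(n_estimators=200, learning_rate=0.5)
ada_clf.fit(X, y)

# gradient boosting: each tree is fit to the current residuals
gb_clf = GradientBoostingClassifier(max_depth=2, n_estimators=100, learning_rate=0.1)
gb_clf.fit(X, y)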

Evaluation Metrics

Metrics tell us whether a model is converging in the right direction; several exist for both continuous (regression) and discrete (classification) models.

Classification

Precision and recall are defined from the confusion-matrix counts:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$

$F_1$ is the harmonic mean of the two:

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

precision_score and recall_score

from sklearn.metrics import precision_score, recall_score
precision_score(y, y_pred), recall_score(y, y_pred)  # y_pred from a fitted classifier (assumed)

Regression

  • MSE: $\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2$
