Machine Learning | Regression

Posted 2021-04-03 奇葩星人

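Regression differs from classification: classification maps a set of real-valued vectors to a binary (or multi-class) set of labels, whereas regression maps real-valued vectors to the real numbers.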

Loss function

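This section introduces the squared error (SE): $L(g, a) = (g - a)^2$, where $g$ is the predicted value and $a$ is the actual value. For a given dataset $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, assume the hypothesis is linear, i.e. $h(x; \theta, \theta_0) = \theta^T x + \theta_0$. Treating this as an optimization problem, the goal is to minimize the mean squared error (MSE), so the objective function is:

$$J(\theta, \theta_0) = \frac{1}{n} \sum_{i=1}^{n} \left(\theta^T x^{(i)} + \theta_0 - y^{(i)}\right)^2$$

We are looking for:

$$\theta^*, \theta_0^* = \arg\min_{\theta, \theta_0} J(\theta, \theta_0)$$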

Least squares

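Let $X$ be the $d \times n$ data matrix and $Y$ the $1 \times n$ row of target values. Setting the offset $\theta_0$ aside (it can be absorbed by appending a constant feature 1 to every data point), we write $\tilde{X} = X^T$ (an $n \times d$ matrix) and $\tilde{Y} = Y^T$ (an $n \times 1$ vector), so that:

$$J(\theta) = \frac{1}{n} (\tilde{X}\theta - \tilde{Y})^T (\tilde{X}\theta - \tilde{Y})$$

Setting $\nabla_\theta J = \frac{2}{n} \tilde{X}^T (\tilde{X}\theta - \tilde{Y}) = 0$ gives:

$$\theta = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T \tilde{Y}$$

As before, we add regularization, i.e.:

$$J_{\text{ridge}}(\theta) = \frac{1}{n} (\tilde{X}\theta - \tilde{Y})^T (\tilde{X}\theta - \tilde{Y}) + \lambda \|\theta\|^2$$

Setting $\nabla_\theta J_{\text{ridge}} = \frac{2}{n} \tilde{X}^T (\tilde{X}\theta - \tilde{Y}) + 2\lambda\theta = 0$ gives:

$$\theta = (\tilde{X}^T \tilde{X} + n\lambda I)^{-1} \tilde{X}^T \tilde{Y}$$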
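
The closed-form ridge solution $\theta = (X X^T + n\lambda I)^{-1} X Y^T$ (with $X$ of shape d by n and $Y$ of shape 1 by n, no offset term) can be sketched in NumPy as follows. The function name `ridge_closed_form` and the toy data are assumptions made for illustration; they are not part of the homework code.

```python
import numpy as np

def ridge_closed_form(X, Y, lam):
    """Closed-form ridge solution without an offset term.

    X is d by n and Y is 1 by n. Returns theta as a d by 1 column vector.
    """
    d, n = X.shape
    Xt = X.T                                  # n by d
    Yt = Y.T                                  # n by 1
    A = Xt.T @ Xt + n * lam * np.eye(d)       # d by d
    # Solve A theta = Xt^T Yt instead of forming the inverse explicitly.
    return np.linalg.solve(A, Xt.T @ Yt)

# Hypothetical toy data: y = 2*x1 - 3*x2, no noise and no offset.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 50))
true_theta = np.array([[2.0], [-3.0]])
Y = true_theta.T @ X

theta_ols = ridge_closed_form(X, Y, lam=0.0)    # plain least squares
theta_ridge = ridge_closed_form(X, Y, lam=1.0)  # regularized solution
```

With `lam=0.0` this reduces to ordinary least squares, which recovers the generating weights on noise-free data. `np.linalg.solve` is used because it is generally more stable than computing the matrix inverse directly.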

Of course, since this is an optimization problem, we can also use gradient descent.
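
As a rough illustration, batch gradient descent on the mean squared error of a linear model might look like the sketch below. The function name `gd_mse`, the fixed step size, and the toy data are assumptions chosen for this example.

```python
import numpy as np

def gd_mse(X, Y, step_size=0.1, max_iter=500):
    """Batch gradient descent on the MSE of th^T x + th0.

    X is d by n and Y is 1 by n.
    """
    d, n = X.shape
    th = np.zeros((d, 1))
    th0 = 0.0
    for _ in range(max_iter):
        residual = th.T @ X + th0 - Y            # 1 by n prediction errors
        grad_th = 2.0 * (X @ residual.T) / n     # d by 1
        grad_th0 = 2.0 * np.mean(residual)       # scalar
        th = th - step_size * grad_th
        th0 = th0 - step_size * grad_th0
    return th, th0

# Hypothetical toy data: y = 3*x + 1 with no noise.
X = np.linspace(-1.0, 1.0, 20).reshape(1, -1)
Y = 3.0 * X + 1.0
th, th0 = gd_mse(X, Y)
```

A fixed step size works here because the objective is a well-conditioned quadratic; on other data the step size would need tuning.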

The coding part of this chapter's homework mainly consists of computing gradients:

import numpy as np

# In all the following definitions:
# x is d by n : input data
# y is 1 by n : output regression values
# th is d by 1 : weights
# th0 is 1 by 1 or scalar
def lin_reg(x, th, th0):
    return np.dot(th.T, x) + th0

def square_loss(x, y, th, th0):
    return (y - lin_reg(x, th, th0))**2

def mean_square_loss(x, y, th, th0):
    # the axis=1 and keepdims=True are important when x is a full matrix
    return np.mean(square_loss(x, y, th, th0), axis = 1, keepdims = True)

def ridge_obj(x, y, th, th0, lam):
    return np.mean(square_loss(x, y, th, th0), axis = 1, keepdims = True) + lam * np.linalg.norm(th)**2

def d_lin_reg_th(x, th, th0):
    return x

def d_square_loss_th(x, y, th, th0):
    return -2*(y - lin_reg(x, th, th0))*d_lin_reg_th(x, th, th0)

def d_mean_square_loss_th(x, y, th, th0):
    return np.mean(d_square_loss_th(x, y, th, th0),axis=1, keepdims=True)

def d_lin_reg_th0(x, th, th0):
    return np.ones((1, x.shape[1]))

def d_square_loss_th0(x, y, th, th0):
    return -2*(y - lin_reg(x, th, th0))*d_lin_reg_th0(x, th, th0)

def d_mean_square_loss_th0(x, y, th, th0):
    return np.mean(d_square_loss_th0(x, y, th, th0), axis=1, keepdims=True)

def d_ridge_obj_th(x, y, th, th0, lam):
    return d_mean_square_loss_th(x, y, th, th0) + 2 * lam * th

def d_ridge_obj_th0(x, y, th, th0, lam):
    return d_mean_square_loss_th0(x, y, th, th0)

#Concatenates the gradients with respect to theta and theta_0
def ridge_obj_grad(x, y, th, th0, lam):
    grad_th = d_ridge_obj_th(x, y, th, th0, lam)
    grad_th0 = d_ridge_obj_th0(x, y, th, th0, lam)
    return np.vstack([grad_th, grad_th0])
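
One way to sanity-check analytic gradients like these is to compare them against central finite differences. The sketch below restates the ridge objective and its gradient in compact form so that it runs on its own; `numerical_grad` and the random test data are hypothetical additions, not part of the homework.

```python
import numpy as np

def ridge_obj(x, y, th, th0, lam):
    # Mean squared error plus lam * ||th||^2, returned as a Python float.
    pred = th.T @ x + th0
    return float(np.mean((y - pred) ** 2) + lam * np.sum(th ** 2))

def ridge_obj_grad(x, y, th, th0, lam):
    # Analytic gradient with respect to the stacked vector [th; th0].
    residual = y - (th.T @ x + th0)                              # 1 by n
    grad_th = np.mean(-2 * residual * x, axis=1, keepdims=True) + 2 * lam * th
    grad_th0 = np.mean(-2 * residual, axis=1, keepdims=True)     # offset is not regularized
    return np.vstack([grad_th, grad_th0])

def numerical_grad(x, y, th, th0, lam, eps=1e-6):
    # Central finite differences over the stacked vector [th; th0].
    w = np.vstack([th, th0]).astype(float)
    grad = np.zeros_like(w)
    for k in range(w.shape[0]):
        step = np.zeros_like(w)
        step[k, 0] = eps
        w_plus, w_minus = w + step, w - step
        f_plus = ridge_obj(x, y, w_plus[:-1], w_plus[-1:], lam)
        f_minus = ridge_obj(x, y, w_minus[:-1], w_minus[-1:], lam)
        grad[k, 0] = (f_plus - f_minus) / (2 * eps)
    return grad

# Hypothetical random test case with d = 3 features and n = 8 points.
rng = np.random.default_rng(1)
x = rng.normal(size=(3, 8))
y = rng.normal(size=(1, 8))
th = rng.normal(size=(3, 1))
th0 = np.array([[0.5]])

analytic = ridge_obj_grad(x, y, th, th0, lam=0.1)
numeric = numerical_grad(x, y, th, th0, lam=0.1)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
```

If the analytic gradient is right, `max_abs_diff` should be tiny; a large value usually points to a sign error or a missing factor of 2.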

In addition, the stochastic gradient descent algorithm described earlier is implemented:

def sgd(X, y, J, dJ, w0, step_size_fn, max_iter):
    """Implements stochastic gradient descent

    Inputs:
    X: a standard data array (d by n)
    y: a standard labels row vector (1 by n)

    J: a cost function whose input is a data point (a column vector),
    a label (1 by 1) and a weight vector w (a column vector) (in that
    order), and which returns a scalar.

    dJ: a cost function gradient (corresponding to J) whose input is a
    data point (a column vector), a label (1 by 1) and a weight vector
    w (a column vector) (also in that order), and which returns a
    column vector.

    w0: an initial value of weight vector w, which is a column
    vector.

    step_size_fn: a function that is given the (zero-indexed)
    iteration index (an integer) and returns a step size.

    max_iter: the number of iterations to perform

    Returns: a tuple (like gd):
    w: the value of the weight vector at the final step
    fs: the list of values of J found during all the iterations
    ws: the list of values of w found during all the iterations

    """
    w = w0
    fs = []
    ws = []
    for i in range(max_iter):
        j = np.random.randint(0, X.shape[1])
        prev_f, prev_grad = J(X[:, j:j+1], y[:, j:j+1], w), dJ(X[:, j:j+1], y[:, j:j+1], w)
        fs.append(prev_f); ws.append(w)
        if i == max_iter-1:
            return w, fs, ws
        step = step_size_fn(i)
        w = w - step * prev_grad
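
A possible way to call `sgd` on a small regression problem is sketched below. To keep the example runnable on its own, it includes a condensed restatement of `sgd` with the same signature. The helper `make_ridge_point_fns`, the convention of stacking the offset as the last entry of `w`, the constant step size, and the toy data are all assumptions for illustration.

```python
import numpy as np

def sgd(X, y, J, dJ, w0, step_size_fn, max_iter):
    # Condensed restatement of the routine above, same signature and return values.
    w, fs, ws = w0, [], []
    for i in range(max_iter):
        j = np.random.randint(0, X.shape[1])      # pick one data point at random
        xj, yj = X[:, j:j+1], y[:, j:j+1]
        fs.append(J(xj, yj, w)); ws.append(w)
        if i == max_iter - 1:
            return w, fs, ws
        w = w - step_size_fn(i) * dJ(xj, yj, w)

def make_ridge_point_fns(lam):
    # Hypothetical helper: per-point ridge cost and gradient, with w = [th; th0].
    def J(xj, yj, w):
        th, th0 = w[:-1], w[-1:]
        err = (yj - (th.T @ xj + th0)).item()
        return err ** 2 + lam * float(np.sum(th ** 2))
    def dJ(xj, yj, w):
        th, th0 = w[:-1], w[-1:]
        residual = yj - (th.T @ xj + th0)             # 1 by 1
        grad_th = -2 * residual * xj + 2 * lam * th   # d by 1
        grad_th0 = -2 * residual                      # 1 by 1, offset not regularized
        return np.vstack([grad_th, grad_th0])
    return J, dJ

# Hypothetical toy data: y = 2*x + 1 with no noise.
np.random.seed(0)
X = np.linspace(-1.0, 1.0, 40).reshape(1, -1)
y = 2.0 * X + 1.0

J, dJ = make_ridge_point_fns(lam=0.0)
w, fs, ws = sgd(X, y, J, dJ, np.zeros((2, 1)), lambda i: 0.1, 2000)
```

A constant step size is enough here only because the toy data are noise-free; with noisy data a decreasing `step_size_fn` is the more usual choice.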

Applying regression to the earlier car fuel-consumption example, we use grid search to select the most suitable parameters:

for order in [1, 2, 3]:
    for feature_type in [0, 1]:
        if order != 3:
            for lam in [0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1]:
            # for lam in range(0, 220, 20):
                auto_data_poly = hw5.make_polynomial_feature_fun(order)(auto_data[feature_type])
                score = hw5.xval_learning_alg(auto_data_poly, auto_values, lam, 10)
                # print((feature_type, order, lam, score))
                file.writelines(str(feature_type) + ',' + str(order) + ',' + str(lam) + ',' + str(score[0,0]) + '\n')
        else:
            # for lam in [0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1]:
            for lam in range(0, 220, 20):
                auto_data_poly = hw5.make_polynomial_feature_fun(order)(auto_data[feature_type])
                score = hw5.xval_learning_alg(auto_data_poly, auto_values, lam, 10)
                file.writelines(str(feature_type) + ',' + str(order) + ',' + str(lam) + ',' + str(score[0,0]) + '\n')
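
The same grid-search idea can be tried without the course's `hw5` module. The sketch below uses hypothetical stand-ins (`poly_features`, `ridge_fit`, `xval_rmse`) on synthetic one-dimensional data. It assumes the cross-validation score is an average validation RMSE, which may differ from what `hw5.xval_learning_alg` actually computes.

```python
import numpy as np

def poly_features(x, order):
    """Map a 1 by n row of inputs to an order by n matrix [x; x^2; ...]."""
    return np.vstack([x ** k for k in range(1, order + 1)])

def ridge_fit(X, Y, lam):
    """Closed-form ridge fit; the offset is an unpenalized constant feature."""
    d, n = X.shape
    Xa = np.vstack([X, np.ones((1, n))])           # (d+1) by n
    penalty = n * lam * np.eye(d + 1)
    penalty[-1, -1] = 0.0                          # do not regularize the offset
    w = np.linalg.solve(Xa @ Xa.T + penalty, Xa @ Y.T)
    return w[:-1], w[-1:]

def xval_rmse(X, Y, lam, k):
    """Average validation RMSE over k contiguous folds."""
    n = X.shape[1]
    scores = []
    for fold in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), fold)
        th, th0 = ridge_fit(X[:, train], Y[:, train], lam)
        pred = th.T @ X[:, fold] + th0
        scores.append(np.sqrt(np.mean((Y[:, fold] - pred) ** 2)))
    return float(np.mean(scores))

# Hypothetical synthetic data: a noisy quadratic, sampled in random order.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1, 60))
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.05 * rng.normal(size=(1, 60))

results = []
for order in [1, 2, 3]:
    for lam in [0.0, 0.01, 0.1]:
        score = xval_rmse(poly_features(x, order), y, lam, k=10)
        results.append((order, lam, score))

best_order, best_lam, best_score = min(results, key=lambda r: r[2])
```

Collecting the scores in a list and taking the minimum replaces writing each row to a file and inspecting it by hand; either approach works for a grid this small.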

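The homework briefly discusses the effect of these parameters. Too low a polynomial order can cause systematic error (underfitting), while too high an order can overfit. The $\lambda$ term trades off overfitting against reducing the training error; penalizing $\|\theta\|^2$ is the usual default partly because, as the order increases, driving the training error down tends to make $\|\theta\|$ very large.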
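
The trade-off controlled by $\lambda$ can be observed numerically. The sketch below fits a high-order polynomial with the closed-form ridge solution for a few values of $\lambda$ and records the weight norm and the training MSE. The data, the order of 8, and the $\lambda$ values are arbitrary assumptions, and the constant feature is regularized along with the others for simplicity.

```python
import numpy as np

# Hypothetical synthetic data: a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1, 30))
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=(1, 30))

order = 8
# Rows are x^0, x^1, ..., x^order, so the constant feature is included.
X = np.vstack([x ** k for k in range(order + 1)])
n = X.shape[1]

norms, train_mses = [], []
for lam in [1e-6, 1e-3, 1e-1]:
    # Closed-form ridge solution for this value of lam.
    theta = np.linalg.solve(X @ X.T + n * lam * np.eye(order + 1), X @ y.T)
    norms.append(float(np.linalg.norm(theta)))
    train_mses.append(float(np.mean((y - theta.T @ X) ** 2)))
```

As $\lambda$ grows, the entries of `norms` shrink while the entries of `train_mses` rise: a smaller weight vector is bought at the cost of a worse fit to the training set.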
