Bayesian linear regression
Let $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^m$ be a training set of i.i.d. examples drawn from some unknown distribution. The standard probabilistic interpretation of linear regression states that
$$ y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}, \qquad i = 1, \dots, m $$
where the $\varepsilon^{(i)}$ are i.i.d. “noise” variables, each distributed as $\mathcal N(0, \sigma^2)$. It follows that $y^{(i)} - \theta^T x^{(i)} \sim \mathcal N(0, \sigma^2)$, or equivalently,
$$ p(y^{(i)} \mid x^{(i)}, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right) $$
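To make the generative model concrete, here is a minimal numpy sketch that simulates a training set from it; the dimensions, `sigma`, and `theta_true` are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

m, d = 50, 3          # number of examples, input dimension (illustrative)
sigma = 0.5           # noise standard deviation (illustrative)
theta_true = rng.normal(size=d)

X = rng.normal(size=(m, d))            # rows are the inputs x^{(i)}
eps = rng.normal(scale=sigma, size=m)  # i.i.d. Gaussian noise eps^{(i)}
y = X @ theta_true + eps               # targets y^{(i)} = theta^T x^{(i)} + eps^{(i)}
```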
In Bayesian linear regression, we assume that a prior distribution over the parameters is also given; a typical choice, for instance, is $\theta \sim \mathcal N(0, \tau^2 I)$. Using Bayes’ rule, we obtain the parameter posterior,
\begin{equation} p(\theta \mid S) = \frac{p(\theta)\, p(S \mid \theta)}{\int_{\theta'} p(\theta')\, p(S \mid \theta')\, d\theta'} = \frac{p(\theta) \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta)}{\int_{\theta'} p(\theta') \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta')\, d\theta'} \label{ppost}\end{equation}
Assuming the same noise model on test points as on training points, the “output” of Bayesian linear regression on a new test point $x_*$ is not just a single guess “$y_*$”, but rather an entire probability distribution over possible outputs, known as the posterior predictive distribution:
\begin{equation} p(y_* \mid x_*, S) = \int_{\theta} p(y_* \mid x_*, \theta)\, p(\theta \mid S)\, d\theta \label{postd}\end{equation}
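One way to read this integral: average the noise model over posterior draws of $\theta$. Here is a hypothetical Monte Carlo sketch of that reading, assuming we already have an array `theta_samples` of draws from $p(\theta \mid S)$ (the function name and arguments are mine, not from the text):

```python
import numpy as np

def predictive_samples(x_star, theta_samples, sigma, rng=None):
    """Monte Carlo view of the posterior predictive distribution:
    for each posterior draw theta, sample y_* = theta^T x_* + eps with
    eps ~ N(0, sigma^2); the resulting values are draws from p(y_* | x_*, S)."""
    if rng is None:
        rng = np.random.default_rng()
    means = theta_samples @ x_star  # theta^T x_* for each posterior draw
    return means + rng.normal(scale=sigma, size=len(means))
```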
For many types of models, the integrals in (\ref{ppost}) and (\ref{postd}) are difficult to compute, and hence we often resort to approximations, such as maximum a posteriori (MAP) estimation (see MAP1, MAP2, and also Regularization and Model selection):
$$\hat{\theta} = \arg\max_{\theta}\, p(\theta \mid S) = \arg\max_{\theta}\, p(\theta) \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta)$$
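For the Gaussian prior and likelihood above, this maximization has a closed form; it reduces to ridge regression. A minimal sketch (the function name is mine):

```python
import numpy as np

def map_estimate(X, y, sigma, tau):
    """MAP estimate under theta ~ N(0, tau^2 I) and noise ~ N(0, sigma^2).
    Maximizing the log-posterior gives the ridge-regression solution
    theta_hat = (X^T X + (sigma^2 / tau^2) I)^{-1} X^T y."""
    d = X.shape[1]
    # Solve the regularized normal equations rather than forming an inverse.
    return np.linalg.solve(X.T @ X + (sigma**2 / tau**2) * np.eye(d), X.T @ y)
```

As $\tau \to \infty$ the prior flattens out and this recovers ordinary least squares.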
In the case of Bayesian linear regression, however, the integrals actually are tractable! In particular, for Bayesian linear regression, one can show (see Section 2.1.1, “The standard linear model,” of Rasmussen and Williams, Gaussian Processes for Machine Learning: http://www.gaussianprocess.org/gpml/) that
$$ \theta \mid S \sim \mathcal N\left(\tfrac{1}{\sigma^2} A^{-1} X^T y,\; A^{-1}\right) $$
$$ y_* \mid x_*, S \sim \mathcal N\left(\tfrac{1}{\sigma^2} x_*^T A^{-1} X^T y,\; x_*^T A^{-1} x_* + \sigma^2\right) $$
where $A = \frac{1}{\sigma^2} X^T X + \frac{1}{\tau^2} I$, $X$ is the design matrix whose rows are the $x^{(i)T}$, and $y$ is the vector of training targets. The derivation of these formulas is somewhat involved. Nonetheless, from these equations we get at least a flavor of what Bayesian methods are all about: the posterior distribution over the test output $y_*$ for a test input $x_*$ is a Gaussian distribution. This distribution reflects the uncertainty in our predictions $y_* = \theta^T x_* + \varepsilon_*$ arising from both the randomness in $\varepsilon_*$ and the uncertainty in our choice of parameters $\theta$. In contrast, classical probabilistic linear regression models estimate the parameters $\theta$ directly from the training data but provide no estimate of how reliable these learned parameters may be.
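These formulas translate directly into a few lines of numpy. The following sketch (function and variable names are mine) computes the posterior over $\theta$ and the predictive mean and variance at a test point:

```python
import numpy as np

def bayes_linreg(X, y, x_star, sigma, tau):
    """Closed-form Bayesian linear regression, following the formulas above:
    A = (1/sigma^2) X^T X + (1/tau^2) I
    theta | S    ~ N((1/sigma^2) A^{-1} X^T y, A^{-1})
    y_* | x_*, S ~ N((1/sigma^2) x_*^T A^{-1} X^T y, x_*^T A^{-1} x_* + sigma^2)
    """
    d = X.shape[1]
    A = X.T @ X / sigma**2 + np.eye(d) / tau**2
    A_inv = np.linalg.inv(A)
    theta_mean = A_inv @ X.T @ y / sigma**2        # posterior mean of theta
    pred_mean = x_star @ theta_mean                # predictive mean of y_*
    pred_var = x_star @ A_inv @ x_star + sigma**2  # predictive variance of y_*
    return theta_mean, A_inv, pred_mean, pred_var
```

Since the posterior is Gaussian, its mode equals its mean, so `theta_mean` coincides with the MAP estimate above; what the point estimate throws away is exactly the predictive variance $x_*^T A^{-1} x_* + \sigma^2$.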