Bayesian linear regression
Let $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^m$ be a training set of i.i.d. examples drawn from some unknown distribution. The standard probabilistic interpretation of linear regression states that
$$ y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}, \qquad i = 1, \dots, m $$
where the $\varepsilon^{(i)}$ are i.i.d. “noise” variables, each distributed as $\mathcal N(0, \sigma^2)$. It follows that $y^{(i)} - \theta^T x^{(i)} \sim \mathcal N(0, \sigma^2)$, or equivalently,
$$ p(y^{(i)} \mid x^{(i)}, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right) $$
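To make the generative model concrete, here is a minimal numpy sketch that simulates a training set from it; the dimensions, `sigma`, and `theta_true` are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

m, d = 50, 3          # number of examples, input dimension (illustrative)
sigma = 0.5           # noise standard deviation (illustrative)
theta_true = rng.normal(size=d)

X = rng.normal(size=(m, d))            # rows are the inputs x^{(i)}
eps = rng.normal(scale=sigma, size=m)  # i.i.d. Gaussian noise eps^{(i)}
y = X @ theta_true + eps               # targets y^{(i)} = theta^T x^{(i)} + eps^{(i)}
```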
In Bayesian linear regression, we assume that a prior distribution over the parameters is also given; a typical choice, for instance, is $\theta \sim \mathcal N(0, \tau^2 I)$. Using Bayes’ rule, we obtain the parameter posterior,
\begin{equation} p(\theta \mid S) = \frac{p(\theta)\, p(S \mid \theta)}{\int_{\theta'} p(\theta')\, p(S \mid \theta')\, d\theta'} = \frac{p(\theta) \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta)}{\int_{\theta'} p(\theta') \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta')\, d\theta'} \label{ppost}\end{equation}
Assuming the same noise model on test points as on training points, the “output” of Bayesian linear regression on a new test point $x_*$ is not just a single guess “$y_*$”, but rather an entire probability distribution over possible outputs, known as the posterior predictive distribution:
\begin{equation} p(y_* \mid x_*, S) = \int_{\theta} p(y_* \mid x_*, \theta)\, p(\theta \mid S)\, d\theta \label{postd}\end{equation}
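One way to read this integral: average the noise model over posterior draws of $\theta$. Here is a hypothetical Monte Carlo sketch of that reading, assuming we already have an array `theta_samples` of draws from $p(\theta \mid S)$ (the function name and arguments are mine, not from the text):

```python
import numpy as np

def predictive_samples(x_star, theta_samples, sigma, rng=None):
    """Monte Carlo view of the posterior predictive distribution:
    for each posterior draw theta, sample y_* = theta^T x_* + eps with
    eps ~ N(0, sigma^2); the resulting values are draws from p(y_* | x_*, S)."""
    if rng is None:
        rng = np.random.default_rng()
    means = theta_samples @ x_star  # theta^T x_* for each posterior draw
    return means + rng.normal(scale=sigma, size=len(means))
```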
For many types of models, the integrals in (\ref{ppost}) and (\ref{postd}) are difficult to compute, and hence we often resort to approximations, such as maximum a posteriori (MAP) estimation (see MAP1, MAP2, and also Regularization and Model selection):
$$\hat{\theta} = \arg\max_{\theta}\, p(\theta \mid S) = \arg\max_{\theta}\, p(\theta) \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta)$$
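For the Gaussian prior and likelihood above, this maximization has a closed form; it reduces to ridge regression. A minimal sketch (the function name is mine):

```python
import numpy as np

def map_estimate(X, y, sigma, tau):
    """MAP estimate under theta ~ N(0, tau^2 I) and noise ~ N(0, sigma^2).
    Maximizing the log-posterior gives the ridge-regression solution
    theta_hat = (X^T X + (sigma^2 / tau^2) I)^{-1} X^T y."""
    d = X.shape[1]
    # Solve the regularized normal equations rather than forming an inverse.
    return np.linalg.solve(X.T @ X + (sigma**2 / tau**2) * np.eye(d), X.T @ y)
```

As $\tau \to \infty$ the prior flattens out and this recovers ordinary least squares.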
In the case of Bayesian linear regression, however, the integrals actually are tractable! In particular, for Bayesian linear regression, one can show (see Section 2.1.1, “The standard linear model,” of Rasmussen and Williams, Gaussian Processes for Machine Learning: http://www.gaussianprocess.org/gpml/) that
$$ \theta \mid S \sim \mathcal N\left(\tfrac{1}{\sigma^2} A^{-1} X^T y,\; A^{-1}\right) $$
$$ y_* \mid x_*, S \sim \mathcal N\left(\tfrac{1}{\sigma^2} x_*^T A^{-1} X^T y,\; x_*^T A^{-1} x_* + \sigma^2\right) $$
where $A = \frac{1}{\sigma^2} X^T X + \frac{1}{\tau^2} I$, $X$ is the design matrix whose rows are the $x^{(i)T}$, and $y$ is the vector of training targets. The derivation of these formulas is somewhat involved. Nonetheless, from these equations we get at least a flavor of what Bayesian methods are all about: the posterior distribution over the test output $y_*$ for a test input $x_*$ is a Gaussian distribution. This distribution reflects the uncertainty in our predictions $y_* = \theta^T x_* + \varepsilon_*$ arising from both the randomness in $\varepsilon_*$ and the uncertainty in our choice of parameters $\theta$. In contrast, classical probabilistic linear regression models estimate the parameters $\theta$ directly from the training data but provide no estimate of how reliable these learned parameters may be.
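These formulas translate directly into a few lines of numpy. The following sketch (function and variable names are mine) computes the posterior over $\theta$ and the predictive mean and variance at a test point:

```python
import numpy as np

def bayes_linreg(X, y, x_star, sigma, tau):
    """Closed-form Bayesian linear regression, following the formulas above:
    A = (1/sigma^2) X^T X + (1/tau^2) I
    theta | S    ~ N((1/sigma^2) A^{-1} X^T y, A^{-1})
    y_* | x_*, S ~ N((1/sigma^2) x_*^T A^{-1} X^T y, x_*^T A^{-1} x_* + sigma^2)
    """
    d = X.shape[1]
    A = X.T @ X / sigma**2 + np.eye(d) / tau**2
    A_inv = np.linalg.inv(A)
    theta_mean = A_inv @ X.T @ y / sigma**2        # posterior mean of theta
    pred_mean = x_star @ theta_mean                # predictive mean of y_*
    pred_var = x_star @ A_inv @ x_star + sigma**2  # predictive variance of y_*
    return theta_mean, A_inv, pred_mean, pred_var
```

Since the posterior is Gaussian, its mode equals its mean, so `theta_mean` coincides with the MAP estimate above; what the point estimate throws away is exactly the predictive variance $x_*^T A^{-1} x_* + \sigma^2$.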