Notation
| variable | dimension | name |
|---|---|---|
| \(\mathbf{y}_i\) | \(\mathbb{R}^{(M \times 1)}\) | ith predictor |
| \(\mathbf{x}_i\) | \(\mathbb{R}^{(P \times 1)}\) | ith state |
| \(\alpha_i\) | \(\mathbb{R}^{(1 \times 1)}\) | ith sample weight |
| \(\mathbf{w}\) | \(\mathbb{R}^{(P \times M)}\) | weights |
| \(\mathbf{Y}\) | \(\mathbb{R}^{(N \times M)}\) | all predictors |
| \(\mathbf{X}\) | \(\mathbb{R}^{(N \times P)}\) | all states |
| \(\boldsymbol{\alpha}\) | \(\mathbb{R}^{(N \times N)}\) | diagonal matrix with the sample weights on its diagonal |
Weighted Gaussian linear regression
The log-likelihood of a dataset of \(N\) weighted samples \(\mathcal{D} = \{\mathbf{x}_i, \mathbf{y}_i, \alpha_i \}_{i=1:N}\), modeled by a linear Gaussian function, is given by:
\[L(\mathcal{D};\boldsymbol{\theta}) = \sum_{i=1}^N \alpha_i \log p(\mathbf{y}_i|\mathbf{x}_i;\boldsymbol{\theta})\]
where \(p(\cdot)\) is a Gaussian probability density function:
\[p(\mathbf{y}_i|\mathbf{x}_i;\boldsymbol{\theta}) = \frac{1}{(2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\Bigg(-\frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \Bigg)\]with parameters \(\boldsymbol{\theta} \triangleq \{\mathbf{w}, \boldsymbol{\Sigma}\}\), where \(D = M\) denotes the dimension of \(\mathbf{y}_i\).
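The density above is just a multivariate Gaussian centered at \(\mathbf{w}^{T}\mathbf{x}_i\), so it can be sanity-checked numerically with scipy. A minimal sketch; the dimensions, data, and parameter values below are illustrative assumptions, not part of the derivation:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
P, M = 3, 2                                  # state and predictor dimensions (assumed)
w = rng.normal(size=(P, M))                  # weights, R^(P x M)
Sigma = 0.5 * np.eye(M)                      # noise covariance, R^(M x M)

x_i = rng.normal(size=P)                     # ith state
y_i = w.T @ x_i + rng.normal(size=M) * 0.1   # ith predictor

# log p(y_i | x_i; theta): Gaussian with mean w^T x_i and covariance Sigma
log_p = multivariate_normal(mean=w.T @ x_i, cov=Sigma).logpdf(y_i)
print(log_p)
```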
Expansion of the log-likelihood
First, without considering the weights \(\alpha_i\), we expand \(L(\mathcal{D};\boldsymbol{\theta})\):
\[\begin{align} L(\mathcal{D};\boldsymbol{\theta}) &= \sum_{i=1}^N \log p(\mathbf{y}_i|\mathbf{x}_i;\boldsymbol{\theta}) \\ &=\sum_{i=1}^N \log 1 - \log( (2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}) - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \\ &=\sum_{i=1}^N - \frac{1}{2}\log((2\pi)^{D}|\boldsymbol{\Sigma}|) - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \\ &=\sum_{i=1}^N - \frac{1}{2}\log((2\pi)^{D}) - \frac{1}{2}\log |\boldsymbol{\Sigma}| - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)\\ &= \sum_{i=1}^N - \frac{D}{2}\log(2\pi) - \frac{1}{2}\log |\boldsymbol{\Sigma}| - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \\ &= -\frac{N\,D}{2}\log(2\pi) - \frac{N}{2}\log |\boldsymbol{\Sigma}| - \frac{1}{2} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \end{align}\]
Reintroducing the weights, each sample's term is scaled by its \(\alpha_i\):
\[\begin{align} &= \sum_{i=1}^N - \frac{D}{2}\log(2\pi)\alpha_i - \frac{1}{2}\log |\boldsymbol{\Sigma}|\alpha_i - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \alpha_i\\ &= - \frac{D}{2}\log(2\pi)\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2}\log |\boldsymbol{\Sigma}|\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \alpha_i \end{align}\]
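As a check on the algebra, the simplified weighted form above can be compared against the direct definition \(\sum_i \alpha_i \log p(\mathbf{y}_i|\mathbf{x}_i;\boldsymbol{\theta})\) on synthetic data. A sketch with made-up dimensions and random weights, assuming numpy/scipy:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
N, P, M = 50, 3, 2                             # assumed problem sizes
X = rng.normal(size=(N, P))                    # all states
w = rng.normal(size=(P, M))                    # weights
Sigma = 0.5 * np.eye(M)                        # noise covariance
Y = X @ w + rng.normal(size=(N, M)) * 0.1      # all predictors
alpha = rng.uniform(0.1, 1.0, size=N)          # sample weights

resid = Y - X @ w                              # rows are y_i - w^T x_i

# Direct definition: sum_i alpha_i * log p(y_i | x_i; theta)
L_direct = np.sum(alpha * multivariate_normal(np.zeros(M), Sigma).logpdf(resid))

# Simplified weighted form from the expansion above
A, D = alpha.sum(), M
quad = np.einsum('ij,jk,ik->i', resid, np.linalg.inv(Sigma), resid)
L_expanded = (-0.5 * D * np.log(2 * np.pi) * A
              - 0.5 * np.log(np.linalg.det(Sigma)) * A
              - 0.5 * np.sum(alpha * quad))

assert np.isclose(L_direct, L_expanded)
```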
1D Maximum likelihood
In the 1D case, \(D = 1\) and the covariance reduces to a scalar variance, \(\boldsymbol{\Sigma} = \sigma^2\):
\[\begin{align} L(\mathcal{D};\boldsymbol{\theta}) &= -\frac{1}{2}\log(2\pi)\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2}\log \sigma^2 \left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2\, \sigma^2} \sum_{i=1}^N (y_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i\\ &= - \frac{1}{2}\log(2\pi \sigma^2)\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2\, \sigma^2} \sum_{i=1}^N (y_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i \end{align}\]
Set the derivatives with respect to the parameters to zero, \(\frac{\partial L}{\partial \mathbf{w}} = 0\) and \(\frac{\partial L}{\partial \sigma^2} = 0\), and solve for \(\mathbf{w}\) and \(\sigma^2\):
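Carrying this out gives the standard weighted least-squares estimates,
\[\mathbf{w} = \left(\mathbf{X}^{T}\boldsymbol{\alpha}\,\mathbf{X}\right)^{-1}\mathbf{X}^{T}\boldsymbol{\alpha}\,\mathbf{Y}, \qquad \sigma^2 = \frac{\sum_{i=1}^N \alpha_i\,(y_i - \mathbf{w}^{T}\mathbf{x}_i)^2}{\sum_{i=1}^N \alpha_i}\]
A minimal numpy sketch of these estimators; the data, weights, and sizes below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 200, 3                                # assumed problem sizes
X = rng.normal(size=(N, P))                  # all states
w_true = np.array([1.0, -2.0, 0.5])          # ground-truth weights (synthetic)
y = X @ w_true + rng.normal(size=N) * 0.3    # 1D predictors with noise
alpha = rng.uniform(0.1, 1.0, size=N)        # sample weights

# w = (X^T alpha X)^{-1} X^T alpha y, with alpha the diagonal weight matrix
XtA = X.T * alpha                            # equivalent to X.T @ np.diag(alpha)
w_hat = np.linalg.solve(XtA @ X, XtA @ y)

# sigma^2 = sum_i alpha_i (y_i - w^T x_i)^2 / sum_i alpha_i
resid = y - X @ w_hat
sigma2_hat = np.sum(alpha * resid**2) / alpha.sum()

print(w_hat, sigma2_hat)
```

Using `np.linalg.solve` instead of explicitly inverting \(\mathbf{X}^{T}\boldsymbol{\alpha}\mathbf{X}\) is numerically preferable; the two are mathematically equivalent here.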