
I am reading a book and have difficulty understanding the math behind the bias-variance tradeoff. Below is the section I am having trouble with:

Given a set of training samples $x_1, x_2, \ldots, x_n$ and their targets $y_1, y_2, \ldots, y_n$, we want to find a regression function $\hat{y}(x)$ that estimates the true relation $y(x)$ as accurately as possible. We measure the estimation error, i.e. how good (or bad) the regression model is, by the mean squared error, $MSE = E[(y-\hat{y})^2]$.

I can derive the mean squared error using partial derivatives and the concept of slope. I also understand that the goal is to minimize the total error measured by $MSE$, and I understand basic statistics on expected values.

Yet I have been stuck for a week trying to find the relevant math and statistical concepts behind this formula.

The question is, what are the relevant math and statistical concepts behind this formula?

For example, how does

$MSE = E[(y-\hat{y})^2]$

become

$MSE = E[(y - E[\hat{y}] + E[\hat{y}] - \hat{y})^2]$?
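Spelled out with $a = y - E[\hat{y}]$ and $b = E[\hat{y}] - \hat{y}$ (labels introduced here only for readability), the added-and-subtracted form is $E[(a+b)^2]$. Expanding the square and using linearity of expectation gives three terms:

$$
\begin{aligned}
E[(y-\hat{y})^2] &= E\big[\big((y - E[\hat{y}]) + (E[\hat{y}] - \hat{y})\big)^2\big] \\
&= E\big[(y - E[\hat{y}])^2\big] + E\big[2(y - E[\hat{y}])(E[\hat{y}] - \hat{y})\big] + E\big[(E[\hat{y}] - \hat{y})^2\big]
\end{aligned}
$$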

Thank you! I can see that the expression is unchanged after adding and subtracting $E[\hat{y}]$. The square is then expanded using $(a+b)^2 = a^2 + 2ab + b^2$ with $a = y - E[\hat{y}]$ and $b = E[\hat{y}] - \hat{y}$, so the cross term is

$E[2ab] = E[2(y - E[\hat{y}])(E[\hat{y}] - \hat{y})]$

Why does this term become

$2(y - E[\hat{y}])(E[\hat{y}] - E[\hat{y}])$?
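One way to read this step, assuming (as the step itself suggests) that the expectation is taken over random training sets while $y$, the true value at a fixed input, is not random: both $y$ and $E[\hat{y}]$ are then constants, so $E[2(y - E[\hat{y}])(E[\hat{y}] - \hat{y})] = 2(y - E[\hat{y}])\,E[E[\hat{y}] - \hat{y}] = 2(y - E[\hat{y}])(E[\hat{y}] - E[\hat{y}]) = 0$. The Python sketch below checks this numerically. Its whole setup is hypothetical and not taken from the book: the true relation is $\sin(x)$, the model is a straight-line least-squares fit, training targets get Gaussian noise, and the target at the test point $x_0$ is noise-free.

```python
import math
import random
import statistics


def true_y(x):
    # Hypothetical true relation y(x), chosen only for illustration.
    return math.sin(x)


def fit_line(xs, ys):
    # Ordinary least squares for y_hat(x) = w0 + w1 * x.
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = my - w1 * mx
    return lambda x: w0 + w1 * x


def decompose(x0=1.0, n=20, trials=20000, noise=0.3, seed=0):
    """Estimate MSE, bias^2, variance and the cross term at a fixed x0.

    E[.] is approximated by averaging over `trials` independently
    drawn training sets of size `n` (hypothetical parameters).
    """
    rng = random.Random(seed)
    y = true_y(x0)  # assumption: the target at x0 is fixed (noise-free)
    preds = []
    for _ in range(trials):
        xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
        ys = [true_y(x) + rng.gauss(0.0, noise) for x in xs]
        preds.append(fit_line(xs, ys)(x0))  # y_hat(x0) for this training set

    e_hat = statistics.fmean(preds)  # sample estimate of E[y_hat]
    mse = statistics.fmean((y - p) ** 2 for p in preds)
    bias_sq = (y - e_hat) ** 2
    variance = statistics.fmean((e_hat - p) ** 2 for p in preds)
    cross = statistics.fmean(2 * (y - e_hat) * (e_hat - p) for p in preds)
    return mse, bias_sq, variance, cross


if __name__ == "__main__":
    mse, bias_sq, variance, cross = decompose()
    print(f"MSE               = {mse:.6f}")
    print(f"bias^2 + variance = {bias_sq + variance:.6f}")
    print(f"cross term        = {cross:.2e}")
```

Because $E[\hat{y}]$ is estimated here by the sample average of the predictions, the deviations $E[\hat{y}] - \hat{y}$ average to zero, so the cross term vanishes up to floating-point rounding for the same reason as in the derivation, and `mse` matches `bias_sq + variance` to the same precision.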

Because adding and subtracting $E[\hat y]$ does not affect the result. – Emre – 2018-07-28T18:12:10.487

Thank you. However, I am still having trouble understanding the following math. I can see that the expression is unchanged after adding and subtracting $E[\hat{y}]$. The square is then expanded using $(a+b)^2 = a^2 + 2ab + b^2$, where the cross term is $E[2(y - E[\hat{y}])(E[\hat{y}] - \hat{y})]$. Why does it become $2(y - E[\hat{y}])(E[\hat{y}] - E[\hat{y}])$? – Carch – 2018-07-29T04:24:09.013