Bias and variance decomposition:
\[ \begin{aligned} E[(y-\hat{f}(x))^2] &= E[(f(x) + \epsilon-\hat{f}(x))^2] \\ &= E[(f(x)-\hat{f}(x))^2] + E[\epsilon^2] + 2E[(f(x)-\hat{f}(x))\epsilon] \\ &= E[(f(x)-\hat{f}(x))^2] + \sigma^2 \end{aligned} \]
where \(f\) is the true (unknown) function and \(\epsilon\) is the irreducible noise, assumed to have zero mean, variance \(\sigma^2\), and to be independent of \(\hat f(x)\).
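To make the step from the second to the third line explicit, here is a short worked step under the stated assumptions (zero-mean noise independent of \(\hat f(x)\)): the cross term vanishes and the noise term reduces to its variance,
\[ E[(f(x)-\hat{f}(x))\epsilon] = E[f(x)-\hat{f}(x)]\,E[\epsilon] = 0, \qquad E[\epsilon^2] = Var(\epsilon) + E[\epsilon]^2 = \sigma^2. \]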
\[ \begin{aligned} E[(f(x)-\hat{f}(x))^2] &= E[(f(x) - E[\hat{f}(x)] + E[\hat{f}(x)] - \hat{f}(x))^2] \\ &= E[((f(x) - E[\hat{f}(x)]) - (\hat{f}(x) - E[\hat{f}(x)]))^2]\\ &= E[(f(x) - E[\hat{f}(x)])^2] + E[(\hat{f}(x)-E[\hat{f}(x)])^2] - 2E[(f(x) - E[\hat{f}(x)])(\hat{f}(x) - E[\hat{f}(x)])]\\ &= bias[\hat f(x)]^2 + Var(\hat f(x)) - 2(f(x) - E[\hat{f}(x)])\,E[\hat{f}(x) - E[\hat{f}(x)]] \\ &= bias[\hat f(x)]^2 + Var(\hat f(x)) \\ \end{aligned} \]
where the bias is defined as \(bias[\hat f(x)] = E[\hat f(x)] - f(x)\) and the variance as \(Var(\hat f(x)) = E[(\hat f(x)-E[\hat f(x)])^2]\). The remaining cross term vanishes because \(f(x) - E[\hat{f}(x)]\) is non-random and \(E[\hat{f}(x) - E[\hat{f}(x)]] = 0\). Therefore, \[E[(y-\hat{f}(x))^2] = bias[\hat f(x)]^2 + Var(\hat f(x)) + \sigma^2\]
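As a sanity check, here is a minimal Monte Carlo sketch of the decomposition (not from the derivation above): the true function \(f\), the noise level \(\sigma\), the query point, and the choice of a degree-1 polynomial as \(\hat f\) are all illustrative assumptions. Each trial draws a fresh training set, refits \(\hat f\), and records its prediction at a fixed \(x\); the empirical \(bias^2 + Var + \sigma^2\) is then compared against the empirical expected squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3                            # irreducible noise std, Var(eps) = sigma^2 (assumed)
f = lambda x: np.sin(2 * np.pi * x)    # "true" function, chosen only for the demo
x0 = 0.25                              # fixed query point x

n_train, n_trials = 30, 5000
preds = np.empty(n_trials)
for t in range(n_trials):
    # fresh training set each trial -> randomness in \hat f
    x = rng.uniform(0, 1, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    coef = np.polyfit(x, y, deg=1)     # fit a straight line (a deliberately high-bias model)
    preds[t] = np.polyval(coef, x0)    # \hat f(x0) for this trial

bias2 = (preds.mean() - f(x0)) ** 2    # bias[\hat f(x0)]^2
var = preds.var()                      # Var(\hat f(x0))

# Left-hand side: E[(y - \hat f(x0))^2], with a fresh noisy y drawn for each trial
y0 = f(x0) + rng.normal(0, sigma, n_trials)
mse = np.mean((y0 - preds) ** 2)

print(f"bias^2 + var + sigma^2 = {bias2 + var + sigma**2:.4f}")
print(f"E[(y - f_hat)^2]       = {mse:.4f}")   # the two should roughly agree
```

The two printed quantities should match up to Monte Carlo error; increasing `n_trials` tightens the agreement, and swapping the degree-1 fit for a higher-degree one shifts error from the bias term into the variance term.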