In speech processing and elsewhere, a frequently appearing task is to make a prediction of an unknown vector *y* from available observation vectors *x*. Specifically, we want to have an estimate

\hat y = f(x)

such that

\hat y \approx y.

In particular, we will focus on *linear estimates* where

\hat y=f(x):=A^T x,

and where *A* is a matrix of parameters.
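As a minimal sketch of such a linear estimate, assume *x* has *n* entries and *A* is stored as *n* rows of *m* columns, so that each component of the estimate is \hat y_j = \sum_i x_i A_{ij}. The function name `linear_estimate` and the nested-list layout are illustrative assumptions, not part of the text above.

```python
def linear_estimate(x, A):
    """Return the linear estimate y_hat with components sum_i x[i] * A[i][j].

    x : list of n floats (the observation vector)
    A : list of n rows, each with m floats (the parameter matrix)
    """
    n = len(A)
    if len(x) != n:
        raise ValueError("x must have one entry per row of A")
    m = len(A[0])
    # Each output component is a weighted sum of the observations.
    return [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]


# Hypothetical example values: a 2-dimensional observation mapped to 3 outputs.
x = [1.0, 2.0]
A = [[1.0, 0.0, 2.0],
     [0.5, 1.0, -1.0]]
y_hat = linear_estimate(x, A)
print(y_hat)  # [2.0, 2.0, 0.0]
```

The number of rows of *A* fixes the length of the observation, and the number of columns fixes the length of the estimate.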

Suppose we want to minimise the average squared error of our estimate. The estimation error is

e=y-\hat y

and the squared error is the squared *L*₂-norm of the error vector, that is,

\left\|e\right\|^2 = e^T e

and its mean can be written as

E\left[\left\|e\right\|^2\right].
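The expectation is rarely available in closed form. A common approach, assumed here for illustration, is to approximate it by the average squared error over a finite set of target and estimate pairs. The sketch below uses the hypothetical helper names `squared_error` and `mean_squared_error` and made-up example values.

```python
def squared_error(y, y_hat):
    """Return ||e||^2 = e^T e for the error e = y - y_hat."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def mean_squared_error(targets, estimates):
    """Approximate E[||e||^2] by the sample average over all pairs."""
    if len(targets) != len(estimates) or not targets:
        raise ValueError("need equally many, and at least one, targets and estimates")
    total = sum(squared_error(y, y_hat) for y, y_hat in zip(targets, estimates))
    return total / len(targets)


# Hypothetical data: two target vectors and their estimates.
targets = [[1.0, 2.0], [0.0, 0.0]]
estimates = [[1.0, 0.0], [3.0, 4.0]]
print(mean_squared_error(targets, estimates))  # (4 + 25) / 2 = 14.5
```

The sample average only approximates the expectation. How close it gets depends on how many pairs are available and how representative they are.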