todo We assume the model , where we make various different assumptions on .

Given , introduce the design matrix which rows correspond to . Common practice is to add a 1 to each vector so that the first column of is all 1s, which enables us to model non-zero offsets (i.e., ).

A common loss is to minimize the “residual sum-of-squares (RSS)”, which is the distance between our predictions ( ) and the true values , i.e.,

The resulting is the least squares estimate.

Once you’ve ft a model, you should run linear regression diagnostics.