Probability and Statistics, Topic 35 of 36

Linear Regression Model Using Matrices

In linear regression, the relationship between the dependent variable ($Y$) and one or more independent variables ($X$) can be written in matrix form. The matrix representation is most useful in multiple regression (more than one independent variable), where it gives a general and compact way to state the model and to solve for the regression coefficients.

Let’s break down how linear regression can be represented and solved using matrices.

1. The Linear Regression Model

In multiple linear regression, the model is defined as:

$$Y = X \beta + \epsilon$$

Where:

$Y$ is the $n \times 1$ vector of observed values (dependent variable),
$X$ is the $n \times p$ matrix of independent variables (one row per data point, one column per model term),
- The matrix $X$ typically includes a column of ones for the intercept term, so $p$ counts the intercept column as well as the predictors.
$\beta$ is the $p \times 1$ vector of regression coefficients (including the intercept and slopes),
$\epsilon$ is the $n \times 1$ vector of errors, the part of $Y$ not explained by $X \beta$. Its observable counterpart, the residual vector $Y - X \hat{\beta}$, is the difference between the observed and predicted values.

For simple linear regression, the model is expressed as:

$$Y = X \beta + \epsilon$$

Where:

$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$ is a column vector of the dependent variable values.
$X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}$ is the $n \times 2$ matrix whose first column is a column of ones (to account for the intercept) and whose second column contains the values of the independent variable $X$.
$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$ is the vector of parameters (intercept $\beta_0$ and slope $\beta_1$ ).
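
As a minimal sketch of how this design matrix is assembled in practice, the NumPy snippet below stacks a column of ones next to the predictor values. The data values and the helper name `design_matrix` are assumptions made for illustration; they do not come from the notes.

```python
import numpy as np

def design_matrix(x):
    """Return the n x 2 design matrix [1, x] for simple linear regression."""
    x = np.asarray(x, dtype=float)
    # First column of ones carries the intercept; second column is the predictor.
    return np.column_stack([np.ones_like(x), x])

# Illustrative data (assumed values, chosen only for demonstration).
x = [1.0, 2.0, 3.0, 4.0]
X = design_matrix(x)
print(X.shape)  # (4, 2)
```

Each row of `X` corresponds to one observation, so multiplying `X` by $\beta$ evaluates $\beta_0 + \beta_1 X_i$ for every data point at once.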

2. The Ordinary Least Squares (OLS) Estimation

In ordinary least squares (OLS) regression, the objective is to find the vector $\beta$ that minimizes the sum of squared residuals:

$$S(\beta) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = (Y - X \beta)^T (Y - X \beta)$$

Where:

$Y$ is the vector of observed values,
$\hat{Y} = X \beta$ is the vector of predicted values,
$(Y - X \beta)$ is the residual vector.
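
The equality between the elementwise sum and the matrix form of $S(\beta)$ can be checked numerically. The sketch below assumes a small made-up data set and an arbitrary candidate $\beta$ (not the OLS solution); the function name `sum_squared_residuals` is likewise an illustrative choice.

```python
import numpy as np

def sum_squared_residuals(X, y, beta):
    """Evaluate S(beta) = (y - X beta)^T (y - X beta)."""
    r = y - X @ beta      # residual vector for this candidate beta
    return float(r @ r)   # inner product r^T r

# Illustrative data (assumed values).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 4.0, 5.0, 8.0])
beta = np.array([0.0, 2.0])  # arbitrary candidate, not the minimizer

# Matrix form of the objective.
s_matrix = sum_squared_residuals(X, y, beta)

# Elementwise form: sum over i of (Y_i - (beta_0 + beta_1 X_i))^2.
s_sum = sum((yi - (beta[0] + beta[1] * xi)) ** 2 for xi, yi in zip(X[:, 1], y))

print(s_matrix, s_sum)
```

Both expressions compute the same quantity; the matrix form is the one that is differentiated below.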

To minimize the sum of squared residuals, we take the derivative of $S(\beta)$ with respect to $\beta$ and set it equal to zero:

$$\frac{\partial}{\partial \beta} \left( (Y - X \beta)^T (Y - X \beta) \right) = 0$$

Expanding this expression:

$$\frac{\partial}{\partial \beta} \left( Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta \right) = 0$$

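Since $Y^T X \beta$ is a scalar, it equals its transpose $\beta^T X^T Y$, so the two middle terms combine to $-2 \beta^T X^T Y$. Using $\partial (\beta^T X^T Y) / \partial \beta = X^T Y$ and $\partial (\beta^T X^T X \beta) / \partial \beta = 2 X^T X \beta$ (because $X^T X$ is symmetric), this simplifies to: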

$$-2 X^T Y + 2 X^T X \beta = 0$$

Rearranging gives the normal equations $X^T X \beta = X^T Y$. When $X^T X$ is invertible, solving for $\beta$ gives the OLS estimator:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

Where:

$X^T$ is the transpose of the matrix $X$ ,
$X^T X$ is the $p \times p$ Gram matrix of the columns of $X$ (its entries are the sums of squares and cross-products of the predictors),
$X^T Y$ is a $p \times 1$ vector of cross-products of the independent variables with the dependent variable,
$(X^T X)^{-1}$ is the inverse of the matrix $X^T X$ .

For simple linear regression, the resulting vector $\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}$ contains the estimated intercept and slope. In multiple regression, $\hat{\beta}$ has one entry per column of $X$: the intercept followed by one slope per predictor.
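
The estimator can be sketched in a few lines of NumPy. The data below are assumed values for illustration, and `ols_normal_equations` is a hypothetical helper name. The sketch solves the linear system with `np.linalg.solve` rather than forming the inverse, and compares the result with NumPy's own least-squares routine `np.linalg.lstsq`.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve X^T X beta = X^T y for beta (assumes X has full column rank)."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solving the system gives the same beta as (X^T X)^{-1} X^T y
    # without computing the inverse explicitly.
    return np.linalg.solve(XtX, Xty)

# Illustrative data (assumed values).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 4.0, 5.0, 8.0])

beta_hat = ols_normal_equations(X, y)

# Cross-check against NumPy's least-squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)   # estimated intercept and slope
print(beta_ref)   # should agree up to rounding
```

At the solution the residual vector is orthogonal to every column of $X$, that is, $X^T (Y - X \hat{\beta}) = 0$, which is just the normal equations rearranged.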

3. Step-by-Step Matrix Solution

Step 1: Set up the matrices

For a simple linear regression example with $n = 4$ data points, assume:

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ 1 & X_3 \\ 1 & X_4 \end{bmatrix}$$

This matrix $X$ includes a column of ones to account for the intercept term. Suppose the independent variable values $X_1, X_2, X_3, X_4$ are given.

Step 2: Calculate $X^T X$ and $X^T Y$

The next step is to calculate the matrix products $X^T X$ and $X^T Y$ .

With $p = 2$ columns in $X$, $X^T X$ is the $2 \times 2$ matrix (all sums run over $i = 1, \dots, n$):

$$X^T X = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}$$

$X^T Y$ is the $2 \times 1$ vector:

$$X^T Y = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$
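
To make these formulas concrete, the following sketch computes both products for a made-up data set with $n = 4$ (the numbers are assumptions, not data from the notes) and checks that the matrix products match the sums.

```python
import numpy as np

# Illustrative data with n = 4 (assumed values).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])
X = np.column_stack([np.ones_like(x), x])

# Matrix products.
XtX = X.T @ X
XtY = X.T @ y

# The same quantities written out as sums.
XtX_from_sums = np.array([[len(x), x.sum()],
                          [x.sum(), (x ** 2).sum()]])
XtY_from_sums = np.array([y.sum(), (x * y).sum()])

print(XtX)  # n = 4, sum X = 10, sum X^2 = 30
print(XtY)  # sum Y = 19, sum XY = 57
```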

Step 3: Solve for $\beta$

Once we have $X^T X$ and $X^T Y$, we can compute the estimate $\hat{\beta}$:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

For simple linear regression, this gives the estimates of the intercept ($\hat{\beta}_0$) and the slope ($\hat{\beta}_1$).
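
For the $2 \times 2$ case the inverse can be written out by hand: with $D = n \sum X_i^2 - (\sum X_i)^2$, the inverse of $X^T X$ is $\frac{1}{D} \begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix}$. The sketch below applies that formula in plain Python to an assumed data set; the function name `solve_simple_regression` is hypothetical.

```python
def solve_simple_regression(x, y):
    """Return (b0, b1) from the explicit 2x2 inverse of X^T X."""
    n = len(x)
    sx = sum(x)                                  # sum of X_i
    sxx = sum(xi * xi for xi in x)               # sum of X_i^2
    sy = sum(y)                                  # sum of Y_i
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # sum of X_i * Y_i

    det = n * sxx - sx * sx                      # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is singular: all x values are identical")

    # (X^T X)^{-1} X^T Y written out entry by entry.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Illustrative data (assumed values).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
b0, b1 = solve_simple_regression(x, y)
print(b0, b1)
```

For these numbers $D = 4 \cdot 30 - 10^2 = 20$, so the slope is $(4 \cdot 57 - 10 \cdot 19)/20 = 1.9$ and the intercept is $(30 \cdot 19 - 10 \cdot 57)/20 = 0$.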

4. Making Predictions Using the Fitted Model

Once the regression coefficients are estimated, we can use the fitted model to make predictions for new data points. The predicted values of $Y$ (denoted $\hat{Y}$ ) are computed as:

$$\hat{Y} = X \hat{\beta}$$

Where:

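$X$ is the matrix of independent variable values at which predictions are wanted (it may contain old data points, new ones, or both), built with the same columns as the matrix used for fitting, including the column of ones,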
$\hat{\beta}$ is the vector of estimated regression coefficients.

For simple linear regression, this equation gives the predicted values of the dependent variable based on the values of the independent variable.
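
A minimal prediction sketch follows. The coefficient values and new $X$ values are assumed for illustration, and `predict` is a hypothetical helper. The point it shows is that the new design matrix needs the same column of ones as the one used for fitting.

```python
import numpy as np

def predict(x_new, beta_hat):
    """Return X_new @ beta_hat for a simple linear regression fit."""
    x_new = np.asarray(x_new, dtype=float)
    # Rebuild the design matrix with the intercept column, as during fitting.
    X_new = np.column_stack([np.ones_like(x_new), x_new])
    return X_new @ beta_hat

# Assumed estimates: intercept 0.0 and slope 1.9 (illustrative only).
beta_hat = np.array([0.0, 1.9])
y_pred = predict([5.0, 6.0], beta_hat)
print(y_pred)  # approximately 9.5 and 11.4
```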

5. Properties and Advantages of Matrix Formulation

a. Efficient Computation

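Writing the problem in matrix form lets the coefficients be computed directly with standard linear algebra routines instead of by hand for each coefficient. Forming $X^T X$ and solving the normal equations costs on the order of $n p^2 + p^3$ operations, which is cheap when $p$ is moderate. In practice the system $X^T X \beta = X^T Y$ is usually solved by a matrix factorization rather than by computing $(X^T X)^{-1}$ explicitly, which is both faster and numerically more stable.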

b. Generalization to Multiple Regression

The matrix form extends directly to multiple regression, where there is more than one independent variable: each additional predictor adds one column to $X$ and one entry to $\beta$, and the formula for $\hat{\beta}$ is unchanged.

c. Clear Representation of Multivariate Regression

When you have more than one independent variable (predictor), the matrix formulation provides a clear and scalable way to represent the system, avoiding the complexity of manually solving for each coefficient.
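
To illustrate that nothing changes with several predictors, the sketch below fits a model with two predictors. The data are simulated without noise from assumed coefficients, so the fit should recover them up to rounding. It uses `np.linalg.lstsq`, NumPy's least-squares solver, in place of the explicit inverse.

```python
import numpy as np

# Simulated, noise-free data (all values here are assumptions).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
true_beta = np.array([1.0, 2.0, -3.0])  # intercept, slope for x1, slope for x2

# One column of ones plus one column per predictor: X is n x 3.
X = np.column_stack([np.ones(n), x1, x2])
y = X @ true_beta

# Least-squares fit; same estimator as (X^T X)^{-1} X^T y when X has full rank.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to true_beta
```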

6. Assumptions and Considerations

The matrix-based method assumes that:

The model is linear in the coefficients $\beta$, meaning the expected value of $Y$ is a linear combination of the columns of $X$ (a straight line in the case of a single predictor).
The errors have constant variance (homoscedasticity).
The errors are independent of each other.
The matrix $X$ has full column rank, which requires at least as many observations $n$ as columns $p$ and no predictor that is an exact linear combination of the others; this is what makes $X^T X$ invertible.

If any of these assumptions are violated, the estimates of $\beta$ may be biased or inefficient, and additional diagnostic checks may be needed.
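
The invertibility condition is the easiest of these to check by computer. As a rough sketch (the helper name `has_full_column_rank` and the data are assumptions), the rank of $X$ can be compared with its number of columns using `np.linalg.matrix_rank`. Adding a column that is an exact multiple of another makes the check fail.

```python
import numpy as np

def has_full_column_rank(X):
    """True when the columns of X are linearly independent."""
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Illustrative predictor values (assumed).
x = np.array([1.0, 2.0, 3.0, 4.0])

# Intercept plus one predictor: rank 2, so X^T X is invertible.
X_ok = np.column_stack([np.ones_like(x), x])

# Third column is exactly 2 * x, so the columns are linearly dependent.
X_bad = np.column_stack([np.ones_like(x), x, 2.0 * x])

print(has_full_column_rank(X_ok))
print(has_full_column_rank(X_bad))
```

When the check fails, $(X^T X)^{-1}$ does not exist and the coefficients are not uniquely determined; dropping the redundant predictor restores a unique solution.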

Summary

The linear regression model can be solved efficiently using matrix algebra. The goal is to find the vector $\beta$ of regression coefficients that minimizes the sum of squared residuals. The solution of the normal equations:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

provides a closed-form solution for the regression coefficients. This matrix approach is computationally efficient, especially for multiple regression with many predictors, and allows for easy prediction of the dependent variable for new values of the independent variables.


