In linear regression, the relationship between the dependent variable (Y) and one or more independent variables (X) can be written in matrix form. This representation is especially useful for multiple regression (more than one independent variable). It gives a compact, general way to write the model and to solve for all the regression coefficients at once.
Let’s break down how linear regression can be represented and solved using matrices.
1. The Linear Regression Model
In multiple linear regression, the model is defined as:
$$Y = X\beta + \epsilon$$
Where:
Y is the n×1 vector of observed values (dependent variable),
X is the n×p matrix of independent variables (with n data points and p columns, one per coefficient),
The matrix X typically includes a column of ones for the intercept term, in which case p equals the number of predictors plus one.
β is the p×1 vector of regression coefficients (including the intercept and slopes),
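ϵ is the n×1 vector of random errors, representing the difference between the observed values and the true regression function Xβ. The residuals Y − Xβ̂, computed after fitting, are the observable estimates of these errors.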
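To make the shapes concrete, here is a minimal NumPy sketch that simulates data from the model. The sizes n and p, the coefficient values and the noise level are arbitrary assumptions for illustration, not values from the text. It also stores vectors as 1-D arrays rather than explicit n×1 columns, which is the usual NumPy convention.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

n, p = 50, 3  # 50 observations; 3 columns = intercept + 2 predictors (arbitrary)

# Design matrix X (n x p): a column of ones for the intercept, then the predictors.
predictors = rng.normal(size=(n, p - 1))
X = np.column_stack([np.ones(n), predictors])

beta = np.array([1.0, 2.0, -0.5])        # hypothetical true coefficients, length p
epsilon = rng.normal(scale=0.1, size=n)  # hypothetical error term, length n

Y = X @ beta + epsilon  # the model Y = X beta + epsilon

print(X.shape, beta.shape, Y.shape)  # (50, 3) (3,) (50,)
```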
For simple linear regression, the model is expressed as:
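$$Y = X\beta + \epsilon$$
Where:
Y is the n×1 column vector of dependent variable values:
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$
X is the n×2 matrix whose first column is a column of ones (to account for the intercept) and whose second column contains the values of the independent variable X:
$$X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}$$
β is the vector of parameters (intercept β₀ and slope β₁):
$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$$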
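The following short sketch checks that the matrix form reproduces the familiar scalar form Yᵢ = β₀ + β₁Xᵢ row by row. The x values and coefficients are hypothetical, and NumPy is an assumed tool choice.

```python
import numpy as np

# Hypothetical values of the single independent variable (n = 5).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])

# Design matrix for simple regression: first column ones, second column x.
X = np.column_stack([np.ones_like(x), x])

beta = np.array([3.0, 0.5])  # hypothetical intercept beta0 and slope beta1

# Row i of X @ beta is 1 * beta0 + x_i * beta1, i.e. the scalar model.
matrix_form = X @ beta
scalar_form = beta[0] + beta[1] * x
print(np.allclose(matrix_form, scalar_form))  # True
```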
2. The Ordinary Least Squares (OLS) Estimation
In ordinary least squares (OLS) regression, the objective is to find the vector β that minimizes the sum of squared residuals:
$$S(\beta) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = (Y - X\beta)^T (Y - X\beta)$$
Where:
Y is the vector of observed values,
Ŷ = Xβ is the vector of predicted values,
(Y−Xβ) is the residual vector.
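The sketch below verifies numerically that the explicit sum of squared residuals and the matrix expression (Y − Xβ)ᵀ(Y − Xβ) give the same number. The data and the candidate β are random placeholders, and the function names `sse_sum` and `sse_matrix` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # simple-regression design
Y = rng.normal(size=n)                                  # hypothetical observations
beta = np.array([0.2, -1.0])                            # any candidate coefficients

def sse_sum(beta):
    """Sum of squared residuals written as an explicit sum over observations."""
    Y_hat = X @ beta
    return sum((Y[i] - Y_hat[i]) ** 2 for i in range(n))

def sse_matrix(beta):
    """The same quantity in matrix form: (Y - X beta)^T (Y - X beta)."""
    r = Y - X @ beta  # residual vector
    return r @ r      # inner product of the residual vector with itself

print(np.isclose(sse_sum(beta), sse_matrix(beta)))  # True
```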
To minimize the sum of squared residuals, we take the derivative of S(β) with respect to β and set it equal to zero:
$$\frac{\partial}{\partial \beta}\left[(Y - X\beta)^T (Y - X\beta)\right] = 0$$
Expanding this expression:
$$\frac{\partial}{\partial \beta}\left(Y^T Y - Y^T X\beta - \beta^T X^T Y + \beta^T X^T X\beta\right) = 0$$
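Because YᵀXβ is a scalar, it equals its own transpose βᵀXᵀY, so the two middle terms combine into −2βᵀXᵀY. Differentiating term by term uses two identities: ∂(βᵀa)/∂β = a, and ∂(βᵀAβ)/∂β = 2Aβ for a symmetric matrix A (here A = XᵀX). This gives: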
$$-2X^T Y + 2X^T X\beta = 0$$
Rearranging gives the normal equations XᵀXβ = XᵀY. When XᵀX is invertible, solving for β gives the OLS estimate:
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
Where:
Xᵀ is the transpose of the matrix X,
XᵀX is the p×p Gram matrix of the columns of X. Its entries are sums of squares and cross-products of the columns, which are related to, but not the same as, the correlations between the predictors,
XᵀY is the p×1 vector of cross-products of the independent variables with the dependent variable,
(XᵀX)⁻¹ is the inverse of the matrix XᵀX.
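As a sanity check on the derivation, the sketch below compares the analytic gradient −2XᵀY + 2XᵀXβ with a central finite-difference approximation. It then solves the normal equations and confirms that the gradient vanishes at the solution. The data are random placeholders and NumPy is an assumed tool choice. The sketch calls `np.linalg.solve` on XᵀXβ = XᵀY rather than forming (XᵀX)⁻¹ explicitly, which gives the same β̂ with better numerical behavior.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = rng.normal(size=n)  # hypothetical data

def S(beta):
    """Sum of squared residuals (Y - X beta)^T (Y - X beta)."""
    r = Y - X @ beta
    return r @ r

def grad_S(beta):
    """Analytic gradient from the derivation: -2 X^T Y + 2 X^T X beta."""
    return -2 * X.T @ Y + 2 * X.T @ X @ beta

# Central finite differences along each coordinate direction at an arbitrary point.
b0 = rng.normal(size=p)
h = 1e-6
numeric = np.array([(S(b0 + h * e) - S(b0 - h * e)) / (2 * h) for e in np.eye(p)])
print(np.allclose(numeric, grad_S(b0), rtol=1e-4, atol=1e-4))

# Solve the normal equations X^T X beta = X^T Y.
XtX = X.T @ X  # p x p Gram matrix (symmetric)
XtY = X.T @ Y  # length-p vector of cross-products
beta_hat = np.linalg.solve(XtX, XtY)

# At the minimizer the gradient should be (numerically) zero.
print(np.allclose(grad_S(beta_hat), 0, atol=1e-8))
```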
The resulting vector β̂ contains the estimates of the regression coefficients, including the intercept and the slope(s); for simple regression, β̂ = [β̂₀, β̂₁]ᵀ.
3. Step-by-Step Matrix Solution
Step 1: Set up the matrices
For a simple linear regression example with n=4 data points, assume:
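$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ 1 & X_3 \\ 1 & X_4 \end{bmatrix}$$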
This matrix X includes a column of ones to account for the intercept term. Suppose the independent variable values X₁, X₂, X₃, X₄ are given.
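Step 2: Calculate XᵀX and XᵀY
The next step is to calculate the matrix products XᵀX and XᵀY.
For simple regression (p = 2), XᵀX is the 2×2 matrix:
$$X^T X = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}$$
XᵀY is the 2×1 vector:
$$X^T Y = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$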
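A small worked example helps connect the matrix products to the summation formulas. The sketch assumes a hypothetical four-point data set, x = 1, 2, 3, 4 and y = 2, 3, 5, 4, chosen only for illustration. It computes XᵀX and XᵀY both as matrix products and from the sums ΣXᵢ = 10, ΣXᵢ² = 30, ΣYᵢ = 14 and ΣXᵢYᵢ = 39.

```python
import numpy as np

# Hypothetical n = 4 data set (values chosen only for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0])
n = len(x)

X = np.column_stack([np.ones(n), x])  # 4 x 2 design matrix
Y = y

XtX = X.T @ X  # 2 x 2
XtY = X.T @ Y  # length-2 vector

# The same quantities assembled from the summation formulas.
XtX_sums = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
XtY_sums = np.array([y.sum(), (x * y).sum()])

print(XtX.tolist())  # [[4.0, 10.0], [10.0, 30.0]]
print(XtY.tolist())  # [14.0, 39.0]
```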
Step 3: Solve for β
Once we have XᵀX and XᵀY, we can solve for the vector β by computing:
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
For simple linear regression, this gives the estimates of the intercept (β̂₀) and the slope (β̂₁).
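For the 2×2 case, inverting XᵀX by hand yields the familiar closed-form estimates β̂₁ = (nΣXᵢYᵢ − ΣXᵢΣYᵢ) / (nΣXᵢ² − (ΣXᵢ)²) and β̂₀ = Ȳ − β̂₁X̄. The sketch below assumes a hypothetical four-point data set (x = 1, 2, 3, 4; y = 2, 3, 5, 4) and checks that the matrix solution and the closed-form formulas agree. Both give β̂₀ = 1.5 and β̂₁ = 0.8 for these values.

```python
import numpy as np

# Hypothetical n = 4 data set, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0])
n = len(x)
X = np.column_stack([np.ones(n), x])

# beta_hat = (X^T X)^{-1} X^T Y, computed by solving the 2 x 2 linear system.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form simple-regression formulas from inverting the 2 x 2 matrix by hand.
slope = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
intercept = y.mean() - slope * x.mean()

print(beta_hat)          # approximately [1.5 0.8]
print(intercept, slope)  # approximately 1.5 0.8
```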
4. Making Predictions Using the Fitted Model
Once the regression coefficients are estimated, we can use the fitted model to make predictions for new data points. The predicted values of Y (denoted Y^) are computed as:
$$\hat{Y} = X\hat{\beta}$$
Where:
X is the matrix of independent variable values for the points being predicted (old data points, new ones, or both), built the same way as for fitting, including the column of ones for the intercept,
β̂ is the vector of estimated regression coefficients.
For simple linear regression, this equation gives the predicted values of the dependent variable based on the values of the independent variable.
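A minimal prediction sketch follows. The fitted coefficients (intercept 1.5, slope 0.8) and the new x values are hypothetical. The key point is that the prediction matrix needs the same leading column of ones as the matrix used for fitting.

```python
import numpy as np

# Hypothetical fitted coefficients: intercept 1.5, slope 0.8.
beta_hat = np.array([1.5, 0.8])

# New values of the independent variable at which to predict.
x_new = np.array([5.0, 6.0])

# Build the prediction matrix exactly like the training one: ones column first.
X_new = np.column_stack([np.ones_like(x_new), x_new])

Y_hat = X_new @ beta_hat  # Y_hat = X beta_hat
print(Y_hat)  # approximately [5.5 6.3]
```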
5. Properties and Advantages of Matrix Formulation
a. Efficient Computation
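The normal equation gives the coefficients in closed form, with no iterative optimization. Forming XᵀX takes on the order of np² operations and solving the resulting p×p system on the order of p³, so the approach is fast when the number of predictors p is moderate. With very many predictors, or when XᵀX is nearly singular, methods based on QR or singular value decomposition (SVD), or iterative methods such as gradient descent, are usually preferred. Numerically, it is also better to solve the system XᵀXβ = XᵀY than to form the inverse (XᵀX)⁻¹ explicitly.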
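The sketch below compares three ways to compute the same coefficients on well-conditioned synthetic data. The data, sizes and coefficient values are assumptions for illustration. The three methods are an explicit inverse, a linear solve of the normal equations, and NumPy's `np.linalg.lstsq`, which works on X directly using a singular-value-based method and never forms XᵀX. On data like this all three agree; the differences only matter when XᵀX is ill-conditioned.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(scale=0.1, size=n)  # synthetic data

# 1) Normal equations via explicit inverse (direct translation of the formula).
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# 2) Normal equations via a linear solve (avoids forming the inverse).
beta_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# 3) Least squares directly on X; also returns the rank and singular values of X.
beta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_inv, beta_solve), np.allclose(beta_solve, beta_lstsq))
```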
b. Generalization to Multiple Regression
The matrix form extends directly to multiple regression, where there is more than one independent variable. Each additional predictor is simply another column of X, and the formula for β̂ does not change.
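To illustrate that adding predictors only adds columns, this sketch fits a model with two hypothetical predictors. It uses noise-free synthetic data, so the OLS solution should recover the assumed coefficients exactly, up to rounding. The code is the same as in the simple-regression case; only the shape of X changes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
x1 = rng.normal(size=n)  # hypothetical first predictor
x2 = rng.normal(size=n)  # hypothetical second predictor

# One extra predictor = one extra column; the estimator formula is unchanged.
X = np.column_stack([np.ones(n), x1, x2])  # n x 3
true_beta = np.array([1.0, -2.0, 0.5])     # hypothetical coefficients
Y = X @ true_beta                          # noise-free, so OLS should recover them

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.round(beta_hat, 6))
```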
c. Clear Representation with Many Predictors
When you have more than one independent variable (predictor), the matrix formulation provides a clear and scalable way to represent the system, avoiding the complexity of manually solving for each coefficient.
6. Assumptions and Considerations
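The OLS estimates and their usual statistical properties rely on the following assumptions:
The model is linear in the parameters: Y is a linear combination of the columns of X plus error (a straight line in simple regression, a plane or hyperplane with several predictors).
The errors have constant variance (homoscedasticity).
The errors are independent of each other.
X has full column rank. There must be at least as many observations as coefficients (n ≥ p), and no predictor may be an exact linear combination of the others (no perfect multicollinearity). This is what makes XᵀX invertible, so that (XᵀX)⁻¹ exists.
If the rank condition fails, the normal equations have no unique solution. If the other assumptions are violated, the estimates of β may be biased or inefficient and their standard errors unreliable, so additional diagnostic checks may be needed.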
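The rank condition can be checked directly before fitting. In this sketch the data are hypothetical. `X_bad` has a third column that is exactly twice the second, a case of perfect multicollinearity, so its rank is less than its number of columns and XᵀX is singular, even though n = 8 exceeds p = 3.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
x1 = rng.normal(size=n)

# Full-rank design: intercept plus one predictor.
X_ok = np.column_stack([np.ones(n), x1])

# Rank-deficient design: the third column is an exact multiple of the second.
X_bad = np.column_stack([np.ones(n), x1, 2 * x1])

for name, X in [("X_ok", X_ok), ("X_bad", X_bad)]:
    p = X.shape[1]
    r = np.linalg.matrix_rank(X)  # numerical rank via singular values
    status = "X^T X invertible" if r == p else "X^T X singular"
    print(name, "rank", r, "of", p, "->", status)
```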
Summary
The coefficients of a linear regression model can be computed efficiently with matrix algebra. The goal is to find the vector β of regression coefficients that minimizes the sum of squared residuals. Solving the normal equations gives:
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
This is a closed-form expression for the regression coefficients, valid whenever XᵀX is invertible. The matrix approach handles any number of predictors in the same compact form and is fast when the number of predictors is moderate. It also reduces prediction for new values of the independent variables to a single matrix product, Ŷ = Xβ̂.