ScholarQuill University Notes
    Probability and Statistics (MS-251), Topic 35 of 36

    Linear Regression Model Using Matrices

    In linear regression, the relationship between the dependent variable ($Y$) and one or more independent variables ($X$) can be written in matrix form. This representation is most useful in multiple regression (more than one independent variable), where it gives a general and compact way to state the model and solve for all regression coefficients at once.

    Let’s break down how linear regression can be represented and solved using matrices.


    1. The Linear Regression Model

    In multiple linear regression, the model is defined as:

    $$Y = X\beta + \epsilon$$

    Where:

    • $Y$ is the $n \times 1$ vector of observed values (dependent variable),
    • $X$ is the $n \times p$ matrix of independent variables (with $n$ data points and $p$ columns),
      • The matrix $X$ typically includes a column of ones for the intercept term, so $p$ counts the intercept column plus the predictors.
    • $\beta$ is the $p \times 1$ vector of regression coefficients (including the intercept and slopes),
    • $\epsilon$ is the $n \times 1$ vector of random errors, representing the difference between the observed values and $X\beta$. Their estimated counterparts, $Y - X\hat{\beta}$, are the residuals.

    For simple linear regression, the model is expressed as:

    $$Y = X\beta + \epsilon$$

    Where:

    • $Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$ is a column vector of the dependent variable values.
    • $X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}$ is a matrix where the first column is a column of ones (to account for the intercept) and the second column contains the values of the independent variable.
    • $\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$ is the vector of parameters (intercept $\beta_0$ and slope $\beta_1$).
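
    As a minimal sketch of this setup, the design matrix for simple linear regression can be built with NumPy. The data values below are hypothetical and chosen only for illustration; they do not come from the notes.

```python
import numpy as np

# Hypothetical data for illustration (not from the notes): n = 4 observations.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])

# Design matrix: a column of ones for the intercept, then the predictor values.
X = np.column_stack([np.ones_like(x), x])

print(X.shape)  # (4, 2): one row per observation, one column per coefficient
print(X)
```

    The column of ones is what lets the intercept be estimated as an ordinary coefficient alongside the slope.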

    2. The Ordinary Least Squares (OLS) Estimation

    In ordinary least squares (OLS) regression, the objective is to find the vector $\beta$ that minimizes the sum of squared residuals:

    $$S(\beta) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = (Y - X\beta)^T (Y - X\beta)$$

    Where:

    • $Y$ is the vector of observed values,
    • $\hat{Y} = X\beta$ is the vector of predicted values,
    • $(Y - X\beta)$ is the residual vector.
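
    To see that the summation form and the matrix form of $S(\beta)$ agree, the sketch below evaluates both for one candidate coefficient vector. The data, the helper names `sum_of_squares` and `sum_of_squares_matrix`, and the trial vector are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical data for illustration (not from the notes).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])

def sum_of_squares(beta):
    """S(beta) written as an explicit sum over the observations."""
    y_hat = X @ beta
    return float(np.sum((y - y_hat) ** 2))

def sum_of_squares_matrix(beta):
    """S(beta) written as the quadratic form (Y - X beta)^T (Y - X beta)."""
    r = y - X @ beta
    return float(r @ r)

beta_trial = np.array([0.0, 1.5])  # an arbitrary candidate, not the OLS estimate
print(sum_of_squares(beta_trial), sum_of_squares_matrix(beta_trial))
```

    Both functions return the same number for any candidate vector; OLS looks for the vector that makes this number as small as possible.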

    To minimize the sum of squared residuals, we take the derivative of $S(\beta)$ with respect to $\beta$ and set it equal to zero:

    $$\frac{\partial}{\partial \beta} \left( (Y - X\beta)^T (Y - X\beta) \right) = 0$$

    Expanding this expression:

    $$\frac{\partial}{\partial \beta} \left( Y^T Y - Y^T X\beta - \beta^T X^T Y + \beta^T X^T X\beta \right) = 0$$

    This simplifies to:

    $$-2 X^T Y + 2 X^T X\beta = 0$$
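
    The step from the expanded expression to this equation can be sketched term by term. It uses two facts: $Y^T X\beta$ and $\beta^T X^T Y$ are the same scalar, and $X^T X$ is symmetric.

$$
\begin{aligned}
\frac{\partial}{\partial \beta}\left( Y^T Y \right) &= 0, \\
\frac{\partial}{\partial \beta}\left( -Y^T X\beta - \beta^T X^T Y \right) &= \frac{\partial}{\partial \beta}\left( -2\beta^T X^T Y \right) = -2 X^T Y, \\
\frac{\partial}{\partial \beta}\left( \beta^T X^T X\beta \right) &= 2 X^T X\beta.
\end{aligned}
$$

    Summing the three lines and setting the result to zero reproduces $-2 X^T Y + 2 X^T X\beta = 0$.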

    Rearranging gives the normal equations $X^T X\beta = X^T Y$. When $X^T X$ is invertible, solving for $\beta$ gives the OLS estimate, written $\hat{\beta}$:

    $$\hat{\beta} = (X^T X)^{-1} X^T Y$$

    Where:

    • $X^T$ is the transpose of the matrix $X$,
    • $X^T X$ is the $p \times p$ Gram matrix, whose entries are the sums of squares and cross-products of the columns of $X$,
    • $X^T Y$ is a $p \times 1$ vector of cross-products of the independent variables with the dependent variable,
    • $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.

    The resulting vector $\hat{\beta}$ contains the estimates of the regression coefficients, including the intercept and the slope(s). For simple linear regression, $\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}$.
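
    The estimate can be computed numerically in a few lines. The sketch below uses hypothetical data and compares three routes: solving the normal equations with `np.linalg.solve`, applying the formula with an explicit inverse, and calling NumPy's least-squares routine `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical data for illustration (not from the notes).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])

XtX = X.T @ X
XtY = X.T @ y

# Solve (X^T X) beta = X^T Y as a linear system, without forming the inverse.
beta_hat = np.linalg.solve(XtX, XtY)

# Same estimate via the formula with an explicit inverse.
beta_hat_inv = np.linalg.inv(XtX) @ XtY

# Cross-check against NumPy's least-squares routine.
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # [intercept, slope]
```

    All three agree on this small example. Solving the linear system is generally preferred over forming the inverse explicitly, since it does less work and is less sensitive to rounding error.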

    3. Step-by-Step Matrix Solution

    Step 1: Set up the matrices

    For a simple linear regression example with $n = 4$ data points, assume:

    $$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ 1 & X_3 \\ 1 & X_4 \end{bmatrix}$$

    This matrix $X$ includes a column of ones to account for the intercept term. Suppose the independent variable values $X_1, X_2, X_3, X_4$ are given.

    Step 2: Calculate $X^T X$ and $X^T Y$

    The next step is to calculate the matrix products $X^T X$ and $X^T Y$.

    • $X^T X$ is the $p \times p$ matrix:
    $$X^T X = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}$$
    • $X^T Y$ is the $p \times 1$ matrix (vector):
    $$X^T Y = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$
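
    These closed forms can be checked numerically. The sketch below, on hypothetical data, builds both products by matrix multiplication and again from the individual sums, then prints them for comparison.

```python
import numpy as np

# Hypothetical data for illustration (not from the notes).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])
n = len(x)

# Matrix products.
XtX = X.T @ X
XtY = X.T @ y

# The same quantities assembled from the sums in the formulas.
XtX_from_sums = np.array([[n, x.sum()],
                          [x.sum(), (x ** 2).sum()]])
XtY_from_sums = np.array([y.sum(), (x * y).sum()])

print(XtX)
print(XtY)
```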

    Step 3: Solve for $\beta$

    Once we have $X^T X$ and $X^T Y$, we can compute the coefficient estimates:

    $$\hat{\beta} = (X^T X)^{-1} X^T Y$$

    For simple linear regression, this gives the estimates of the intercept ($\hat{\beta}_0$) and the slope ($\hat{\beta}_1$).
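
    As a worked illustration, take the hypothetical data $X_i = 1, 2, 3, 4$ and $Y_i = 2, 3, 5, 6$ (values assumed here, not taken from the notes). Then $n = 4$, $\sum X_i = 10$, $\sum X_i^2 = 30$, $\sum Y_i = 16$ and $\sum X_i Y_i = 47$, so:

$$
X^T X = \begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix}, \qquad
X^T Y = \begin{bmatrix} 16 \\ 47 \end{bmatrix}, \qquad
(X^T X)^{-1} = \frac{1}{20} \begin{bmatrix} 30 & -10 \\ -10 & 4 \end{bmatrix}
$$

$$
\hat{\beta} = \frac{1}{20} \begin{bmatrix} 30 \cdot 16 - 10 \cdot 47 \\ -10 \cdot 16 + 4 \cdot 47 \end{bmatrix}
= \frac{1}{20} \begin{bmatrix} 10 \\ 28 \end{bmatrix}
= \begin{bmatrix} 0.5 \\ 1.4 \end{bmatrix}
$$

    The determinant used in the inverse is $4 \cdot 30 - 10 \cdot 10 = 20$, and the fitted line for this data is $\hat{Y} = 0.5 + 1.4X$.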


    4. Making Predictions Using the Fitted Model

    Once the regression coefficients are estimated, we can use the fitted model to make predictions for new data points. The predicted values of $Y$ (denoted $\hat{Y}$) are computed as:

    $$\hat{Y} = X\hat{\beta}$$

    Where:

    • $X$ is the matrix of independent variable values at which predictions are wanted (it may include both old and new data points) and must have the same columns as the matrix used for fitting, including the column of ones,
    • $\hat{\beta}$ is the vector of estimated regression coefficients.

    For simple linear regression, this equation gives the predicted values of the dependent variable based on the values of the independent variable.
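
    A short sketch of the prediction step, again on hypothetical data. The new predictor values in `x_new` are assumptions for illustration.

```python
import numpy as np

# Hypothetical training data (not from the notes).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# New predictor values (hypothetical). The new design matrix needs the same
# column layout as the one used for fitting, including the column of ones.
x_new = np.array([5.0, 6.0])
X_new = np.column_stack([np.ones_like(x_new), x_new])

y_pred = X_new @ beta_hat
print(y_pred)
```

    Forgetting the column of ones in `X_new` is a common mistake: the shapes no longer match, or the intercept is silently dropped.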


    5. Properties and Advantages of Matrix Formulation

    a. Efficient Computation

    The matrix form is computationally convenient, especially with a large number of predictors. The normal equation gives all coefficients in one closed-form matrix computation, with no iterative search and no need to derive a separate formula for each coefficient.

    b. Generalization to Multiple Regression

    The matrix form extends directly to multiple regression, where there is more than one independent variable. Each additional predictor adds one column to $X$ and one entry to $\beta$, and the formula for the estimate stays the same.

    c. Clear Representation of Multivariate Regression

    When you have more than one independent variable (predictor), the matrix formulation provides a clear and scalable way to represent the system, avoiding the complexity of manually solving for each coefficient.
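
    To illustrate that nothing changes when predictors are added, the sketch below fits a model with two predictors using the same formula. The helper name `fit_ols` and the data are hypothetical; the response is generated without noise so that the expected coefficients are known in advance.

```python
import numpy as np

def fit_ols(X, y):
    """Return the OLS estimate by solving the normal equations (X^T X) b = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical predictors (not from the notes).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Noise-free response, so the fit should recover intercept 1, slopes 2 and -1.
y = 1.0 + 2.0 * x1 - 1.0 * x2

# One column of ones plus one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

beta_hat = fit_ols(X, y)
print(beta_hat)  # approximately [1, 2, -1], up to rounding
```

    The same `fit_ols` call works for any number of predictor columns, provided the columns of `X` are linearly independent.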


    6. Assumptions and Considerations

    The matrix-based method assumes that:

    • The model is linear in the coefficients $\beta$; for simple linear regression this means the relationship between the independent and dependent variables can be described by a straight line.
    • The errors (residuals) have constant variance (homoscedasticity).
    • The residuals are independent of each other.
    • The number of observations $n$ is greater than the number of columns $p$ of $X$, and the columns of $X$ are linearly independent, so that $(X^T X)^{-1}$ exists.

    If any of these assumptions are violated, the estimates of $\beta$ may be biased or inefficient, and additional diagnostic checks may be needed.
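
    Whether $(X^T X)^{-1}$ can be computed depends on the columns of $X$ being linearly independent. One possible check, sketched below with a hypothetical helper `has_full_column_rank`, compares the rank of $X$ with its number of columns.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# A usable design matrix: intercept column plus one predictor.
X_ok = np.column_stack([np.ones_like(x), x])

# Third column is exactly twice the second, so the columns are linearly dependent.
X_bad = np.column_stack([np.ones_like(x), x, 2.0 * x])

def has_full_column_rank(X):
    """True when the columns of X are linearly independent, so X^T X is invertible."""
    return np.linalg.matrix_rank(X) == X.shape[1]

print(has_full_column_rank(X_ok))
print(has_full_column_rank(X_bad))
```

    When the check fails, the normal equations do not have a unique solution, and a redundant predictor usually needs to be removed before fitting.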


    Summary

    The linear regression model can be solved compactly using matrix algebra. The goal is to find the vector $\beta$ of regression coefficients that minimizes the sum of squared residuals. The normal equation:

    $$\hat{\beta} = (X^T X)^{-1} X^T Y$$

    provides a closed-form solution for the regression coefficients. This matrix approach is computationally efficient, especially for multiple regression with many predictors, and allows for easy prediction of the dependent variable for new values of the independent variables.
