ScholarQuill University Notes
    Probability and Statistics
    MS-251

    Properties of the Least Squares Estimators


    Properties of the Least Squares Estimators (LSE)

    In linear regression, the least squares estimators (LSE) are the values of the regression coefficients $\beta_0$ (intercept) and $\beta_1, \beta_2, \dots, \beta_p$ (slopes) that minimize the sum of squared residuals. These estimators have several important statistical properties that make them desirable for estimating the true relationship between the dependent and independent variables in a regression model.

    Let’s go over the properties of the least squares estimators:


    1. Unbiasedness of the Least Squares Estimators

    One of the fundamental properties of the least squares estimators is unbiasedness. A statistical estimator is unbiased if, on average, it correctly estimates the parameter it is intended to estimate.

    • Property: The least squares estimators $\hat{\beta} = (X^T X)^{-1} X^T Y$ are unbiased, meaning that:
    $$E[\hat{\beta}] = \beta$$

    Where:

    • $\hat{\beta}$ is the vector of least squares estimators (the estimated regression coefficients),
    • $\beta$ is the vector of true regression coefficients (the coefficients in the population model),
    • $E[\hat{\beta}]$ denotes the expected value of the estimator.

    This means that, if we were to repeatedly sample and estimate the regression coefficients, the average of these estimates would equal the true values of $\beta$.

    Explanation:

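    • This property assumes that the model $Y = X\beta + \epsilon$ is correctly specified, that the error term $\epsilon$ has an expected value of 0 given $X$ ($E[\epsilon] = 0$), and that the columns of the design matrix $X$ are not perfectly collinear (i.e., $X^T X$ is invertible). Substituting the model into the estimator gives $\hat{\beta} = \beta + (X^T X)^{-1} X^T \epsilon$, and taking expectations of both sides yields $E[\hat{\beta}] = \beta$.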
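The "repeated sampling" reading of unbiasedness can be checked with a small simulation. The sketch below is illustrative only: the true coefficients, sample size, error standard deviation, and number of replications are made-up values, and the helper name `ols` is hypothetical. It holds the design fixed, redraws zero-mean errors many times, and averages the resulting estimates.

```python
import numpy as np

def ols(X, y):
    """Least squares estimate: solves the normal equations (X^T X) b = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)        # fixed seed, only for reproducibility
beta_true = np.array([2.0, -1.5])     # assumed intercept and slope (illustrative)
n, n_reps = 50, 5000                  # assumed sample size and replication count

x = rng.uniform(0.0, 10.0, size=n)    # design held fixed across replications
X = np.column_stack([np.ones(n), x])  # column of ones for the intercept

estimates = np.empty((n_reps, 2))
for r in range(n_reps):
    eps = rng.normal(0.0, 2.0, size=n)  # zero-mean errors, E[eps] = 0
    y = X @ beta_true + eps
    estimates[r] = ols(X, y)

mean_estimate = estimates.mean(axis=0)
print("true beta:       ", beta_true)
print("average estimate:", mean_estimate)
```

Any single estimate can miss the true value noticeably; it is the average over replications that should sit close to `beta_true`.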

    2. Best Linear Unbiased Estimator (BLUE)

    The Gauss-Markov Theorem states that under certain conditions, the least squares estimators are the best linear unbiased estimators (BLUE). This means that among all possible linear unbiased estimators of $\beta$, the least squares estimators have the smallest variance.

    • Property: The least squares estimators are the BLUE estimators, i.e., they have the minimum variance among all linear, unbiased estimators.

    Conditions for BLUE:

    To ensure that the least squares estimators are BLUE, the following assumptions must hold:

    1. Linearity: The model is linear in the parameters, i.e., $Y = X\beta + \epsilon$.
    2. Zero-mean errors: The errors have zero mean, $E[\epsilon] = 0$.
    3. Homoscedasticity: The errors have constant variance ($\text{Var}(\epsilon_i) = \sigma^2$ for all $i$).
    4. No Autocorrelation: The errors are not correlated with each other ($\text{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$). Together with assumption 3, this is written compactly as $\text{Var}(\epsilon) = \sigma^2 I$.

    Under these conditions, the least squares estimators are the best (i.e., they have the lowest variance) linear unbiased estimators.
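To make "smallest variance among linear unbiased estimators" concrete, the sketch below compares least squares with one arbitrary competitor: a weighted estimator $(X^T W X)^{-1} X^T W Y$ whose positive weights are chosen at random purely for illustration. Both estimators have the form $AY$ with $AX = I$, so both are linear and unbiased, and under $\text{Var}(\epsilon) = \sigma^2 I$ each has covariance $\sigma^2 A A^T$. The design, weights, and $\sigma^2$ are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 40, 1.0                    # assumed sample size and error variance
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])

# Least squares written as a linear map: beta_hat = A_ols @ Y
A_ols = np.linalg.solve(X.T @ X, X.T)

# Competitor: weighted estimator with arbitrary positive weights (illustrative)
w = rng.uniform(0.2, 5.0, size=n)
XtW = X.T * w                          # same as X.T @ diag(w)
A_alt = np.linalg.solve(XtW @ X, XtW)

# Covariance of a linear estimator A @ Y when Var(eps) = sigma2 * I
cov_ols = sigma2 * A_ols @ A_ols.T     # equals sigma2 * (X^T X)^{-1}
cov_alt = sigma2 * A_alt @ A_alt.T

# Gauss-Markov: cov_alt - cov_ols should be positive semidefinite
gap_eigenvalues = np.linalg.eigvalsh(cov_alt - cov_ols)

print("OLS variances:        ", np.diag(cov_ols))
print("competitor variances: ", np.diag(cov_alt))
print("eigenvalues of the gap:", gap_eigenvalues)
```

The weighted estimator is only one member of the class of linear unbiased estimators. The theorem says the same ordering holds for every member, which a single comparison illustrates but does not prove.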

    3. Consistency of the Least Squares Estimators

    Consistency refers to the property that as the sample size $n$ tends to infinity, the estimator converges in probability to the true value of the parameter it estimates.

    • Property: The least squares estimators are consistent estimators of the true regression coefficients, meaning that:
    $$\hat{\beta} \xrightarrow{P} \beta \quad \text{as} \quad n \to \infty$$

    Where:

    • $\hat{\beta}$ is the vector of least squares estimators,
    • $\beta$ is the vector of true regression coefficients,
    • $\xrightarrow{P}$ denotes convergence in probability.

    Explanation:

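    • This means that with an increasing number of observations, the least squares estimators tend to get closer to the true values of the parameters. This property holds under standard assumptions, such as independent and identically distributed (i.i.d.) errors that have zero mean and finite variance and are uncorrelated with the regressors, together with a design for which $\frac{1}{n} X^T X$ converges to an invertible matrix as $n$ grows. Including an intercept term helps here, because any nonzero mean of the errors is absorbed into $\beta_0$.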
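Consistency can be seen numerically by watching the estimation error shrink as $n$ grows. The sketch below is a rough illustration with assumed values (true coefficients, error standard deviation, the list of sample sizes, and the number of replications are all made up): for each sample size it estimates the root-mean-square distance between $\hat{\beta}$ and $\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = np.array([1.0, 0.5])          # assumed true coefficients
sample_sizes = [20, 200, 2000, 20000]     # assumed grid of sample sizes
n_reps = 200                              # replications per sample size

rmse_by_n = {}
for n in sample_sizes:
    sq_err = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.uniform(0.0, 10.0, size=n)
        X = np.column_stack([np.ones(n), x])
        y = X @ beta_true + rng.normal(0.0, 3.0, size=n)  # i.i.d. zero-mean errors
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        sq_err[r] = np.sum((beta_hat - beta_true) ** 2)
    rmse_by_n[n] = float(np.sqrt(sq_err.mean()))

for n, rmse in rmse_by_n.items():
    print(f"n = {n:>6d}   RMSE of beta_hat = {rmse:.4f}")
```

Under these assumptions the error should fall roughly in proportion to $1/\sqrt{n}$, so each tenfold increase in $n$ cuts it by about a factor of three.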

    4. Efficiency of the Least Squares Estimators

    The efficiency of an estimator refers to how "tight" the estimator’s sampling distribution is. An efficient estimator has the smallest possible variance among all unbiased estimators.

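    • Property: Under the assumptions of the Gauss-Markov theorem, the least squares estimators are the most efficient linear unbiased estimators, meaning they have the minimum variance among all linear unbiased estimators. If the errors are also normally distributed, they have the minimum variance among all unbiased estimators, linear or not.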

    Explanation:

    • The efficiency of LSEs follows from the Gauss-Markov theorem: under homoscedastic and uncorrelated errors, any other linear unbiased estimator has a covariance matrix that exceeds $\sigma^2 (X^T X)^{-1}$ by a positive semidefinite matrix, so no such estimator can have a smaller variance for any coefficient.

    5. Variance of the Least Squares Estimators Under Homoscedasticity

    Homoscedasticity means that the variance of the errors is constant across all levels of the independent variable(s). It is a property of the errors rather than of the estimators, but it determines how the variance of $\hat{\beta}$ behaves.

    • Property: When the errors are homoscedastic and uncorrelated, the variance of the least squares estimator $\hat{\beta}$ is:
    $$\text{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$$

    Where:

    • $\sigma^2$ is the variance of the error term $\epsilon$,
    • $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$, which depends on the design matrix.

    This formula shows that the variance of $\hat{\beta}$ depends on the spread and values of the independent variable(s) (encoded in $X$) and the error variance $\sigma^2$.
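The variance formula follows in a few lines from the model. A short derivation, assuming $Y = X\beta + \epsilon$ with $X$ treated as fixed and $\text{Var}(\epsilon) = \sigma^2 I$:

```latex
\begin{aligned}
\hat{\beta} &= (X^T X)^{-1} X^T Y = \beta + (X^T X)^{-1} X^T \epsilon, \\
\text{Var}(\hat{\beta}) &= (X^T X)^{-1} X^T \, \text{Var}(\epsilon) \, X (X^T X)^{-1} \\
&= (X^T X)^{-1} X^T (\sigma^2 I) X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1}.
\end{aligned}
```

The second line uses $\text{Var}(A\epsilon) = A \, \text{Var}(\epsilon) \, A^T$ for a fixed matrix $A$, and the fact that $(X^T X)^{-1}$ is symmetric.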

    Implications:

    • If the design matrix $X$ has a high condition number (for example, because the columns of $X$ are highly correlated), $X^T X$ is close to singular, the entries of $(X^T X)^{-1}$ become large, and the variance of the estimates is large, making the estimators less reliable. This phenomenon is known as multicollinearity.
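The effect of correlated predictors on $\sigma^2 (X^T X)^{-1}$ can be computed directly. The sketch below builds two hypothetical designs with two predictors each, one weakly correlated and one strongly correlated, and compares the resulting standard errors. The helper names `make_design` and `coef_std_errors`, the correlation levels, and $\sigma^2 = 1$ are assumptions of this example.

```python
import numpy as np

def coef_std_errors(X, sigma2):
    """Standard errors from Var(beta_hat) = sigma2 * (X^T X)^{-1}."""
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(cov))

def make_design(rho, n, rng):
    """Intercept plus two predictors whose population correlation is rho."""
    x1 = rng.normal(size=n)
    z = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * z
    return np.column_stack([np.ones(n), x1, x2])

rng = np.random.default_rng(3)
sigma2, n = 1.0, 200                       # assumed error variance and sample size

X_low = make_design(0.10, n, rng)          # weakly correlated predictors
X_high = make_design(0.99, n, rng)         # nearly collinear predictors

se_low = coef_std_errors(X_low, sigma2)
se_high = coef_std_errors(X_high, sigma2)

print("condition number (low, high):", np.linalg.cond(X_low), np.linalg.cond(X_high))
print("std errors, low correlation: ", se_low)
print("std errors, high correlation:", se_high)
```

The error variance and sample size are identical in both designs, so any difference in the slope standard errors comes from the design matrix alone.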

    6. Normality of the Least Squares Estimators (Asymptotically)

    If the error terms follow a normal distribution, the least squares estimators are exactly normally distributed for any sample size, because $\hat{\beta}$ is a linear function of the errors. Even if the errors are not normally distributed, the central limit theorem (CLT) ensures, under mild conditions on the errors and the design, that for large sample sizes the distribution of the least squares estimators approaches normality.

    • Property: The least squares estimators $\hat{\beta}$ are exactly normal when the errors are normally distributed, and asymptotically normal otherwise. For large samples, the estimators approximately follow a normal distribution:
    $$\hat{\beta} \sim N(\beta, \text{Var}(\hat{\beta}))$$

    Explanation:

    • This property is important because it allows for inference, such as hypothesis testing and confidence intervals, using standard statistical methods (like $t$-tests and $F$-tests) even when the errors are not normally distributed, provided the sample size is large.
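A simulation with deliberately non-normal errors gives a feel for the large-sample approximation. The sketch below uses centered exponential errors (skewed, mean 0, variance 1) as an assumed example, standardizes the slope estimate with the known $\sigma^2 (X^T X)^{-1}$, and checks how often it falls within $\pm 1.96$. For a standard normal variable that fraction is about 0.95. All numeric settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
beta_true = np.array([1.0, 2.0])           # assumed true coefficients
n, n_reps = 200, 4000                      # assumed sample size and replications

x = rng.uniform(0.0, 1.0, size=n)          # design held fixed
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

sigma2 = 1.0                               # variance of an Exponential(1) error
se_slope = np.sqrt(sigma2 * XtX_inv[1, 1]) # from Var(beta_hat) = sigma2 (X^T X)^{-1}

z = np.empty(n_reps)
for r in range(n_reps):
    eps = rng.exponential(1.0, size=n) - 1.0   # skewed errors with mean 0
    y = X @ beta_true + eps
    beta_hat = XtX_inv @ (X.T @ y)
    z[r] = (beta_hat[1] - beta_true[1]) / se_slope

coverage = float(np.mean(np.abs(z) < 1.96))
print("mean of standardized slope:", z.mean())
print("std of standardized slope: ", z.std())
print("fraction within +/- 1.96:  ", coverage)
```

In practice $\sigma^2$ is unknown and is replaced by an estimate, which is where the $t$ distribution enters. The known-variance version is used here only to isolate the normal approximation.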

    7. Independence of the Least Squares Estimators

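    The components of $\hat{\beta}$ are not independent in general, even when the errors in the model are uncorrelated (i.e., no autocorrelation). Their covariances are the off-diagonal entries of $\text{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$, which are zero only when the columns of $X$ are orthogonal (for example, when the predictor is centered in a model with an intercept and one slope). In that case the components of $\hat{\beta}$ are uncorrelated, and if the errors are also normally distributed, they are independent.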
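One way to check how the components of $\hat{\beta}$ relate to each other is to inspect the off-diagonal entries of $\sigma^2 (X^T X)^{-1}$. The sketch below does this for a simple regression with a hypothetical predictor, once with the raw values and once after centering them. The helper names `coef_cov` and `corr_intercept_slope` and all numeric settings are assumptions of this example.

```python
import numpy as np

def coef_cov(x, sigma2):
    """Covariance matrix sigma2 * (X^T X)^{-1} for an intercept-plus-slope model."""
    X = np.column_stack([np.ones(len(x)), x])
    return sigma2 * np.linalg.inv(X.T @ X)

def corr_intercept_slope(cov):
    """Correlation between the intercept and slope estimators."""
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

rng = np.random.default_rng(5)
n, sigma2 = 50, 1.0                        # assumed sample size and error variance
x = rng.uniform(0.0, 10.0, size=n)         # predictor with a positive mean

cov_raw = coef_cov(x, sigma2)                  # columns of X not orthogonal
cov_centered = coef_cov(x - x.mean(), sigma2)  # centering makes them orthogonal

print("corr(intercept, slope), raw x:     ", corr_intercept_slope(cov_raw))
print("corr(intercept, slope), centered x:", corr_intercept_slope(cov_centered))
```

With a positive-mean predictor the intercept and slope estimators are negatively correlated: an overestimated slope tends to come with an underestimated intercept. Centering the predictor makes the two columns of $X$ orthogonal and removes that correlation.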


    Summary of Properties of Least Squares Estimators

    1. Unbiasedness: The least squares estimators are unbiased, meaning $E[\hat{\beta}] = \beta$.
    2. Best Linear Unbiased Estimator (BLUE): Under the Gauss-Markov assumptions (linearity, zero mean errors, homoscedasticity, no autocorrelation), LSEs are the best linear unbiased estimators.
    3. Consistency: As the sample size increases, the least squares estimators converge to the true values of the regression coefficients.
    4. Efficiency: The least squares estimators have the minimum variance among all linear unbiased estimators when the Gauss-Markov assumptions hold.
    5. Variance under homoscedasticity: The variance of the LSE, $\sigma^2 (X^T X)^{-1}$, depends on the error variance and the design matrix.
    6. Normality (Asymptotically): The LSEs are approximately normally distributed for large sample sizes, even if the errors are not normally distributed.
    7. Independence: The components of the LSE are uncorrelated only if the columns of the design matrix are orthogonal; uncorrelated errors alone do not make them independent.

    These properties make the least squares method a reliable method for estimating regression coefficients in linear regression models, provided the underlying assumptions hold.
