MS-251›Properties of the Least Squares Estimators

Probability and Statistics, Topic 36 of 36

Properties of the Least Squares Estimators (LSE)

In linear regression, the least squares estimators (LSE) are the values of the regression coefficients $\beta_0$ (intercept) and $\beta_1, \beta_2, \dots, \beta_p$ (slopes) that minimize the sum of squared residuals. These estimators have several important statistical properties that make them desirable for estimating the true relationship between the dependent and independent variables in a regression model.

Let’s go over the properties of the least squares estimators:

1. Unbiasedness of the Least Squares Estimators

One of the fundamental properties of the least squares estimators is unbiasedness. A statistical estimator is unbiased if, on average, it correctly estimates the parameter it is intended to estimate.

Property: The least squares estimators $\hat{\beta} = (X^T X)^{-1} X^T Y$ are unbiased, meaning that:

$$E[\hat{\beta}] = \beta$$

Where:

$\hat{\beta}$ is the vector of least squares estimators (the estimated regression coefficients),
$\beta$ is the vector of true regression coefficients (the coefficients in the population model),
$E[\hat{\beta}]$ denotes the expected value of the estimator.

This means that, if we were to repeatedly sample and estimate the regression coefficients, the average of these estimates would equal the true values of $\beta$ .

Explanation:

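This property rests on two assumptions: the errors have zero mean given the regressors ( $E[\epsilon \mid X] = 0$ , which reduces to $E[\epsilon] = 0$ when $X$ is treated as fixed), and the design matrix $X$ is not perfectly collinear, so that $X^T X$ is invertible. Substituting $Y = X\beta + \epsilon$ into the estimator gives $\hat{\beta} = \beta + (X^T X)^{-1} X^T \epsilon$ , and taking expectations leaves $E[\hat{\beta}] = \beta$ .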
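The repeated-sampling interpretation can be checked numerically. The following is a minimal sketch, not part of the original notes: the true coefficients, sample size, error distribution, and number of replications are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, n_sims = 50, 2000
beta_true = np.array([1.0, 2.0])           # hypothetical intercept and slope
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])       # design matrix, held fixed across samples

estimates = np.empty((n_sims, 2))
for s in range(n_sims):
    eps = rng.normal(0.0, 1.0, size=n)     # zero-mean errors (assumed normal here)
    y = X @ beta_true + eps
    # Least squares solution of X b = y, equivalent to (X^T X)^{-1} X^T y
    estimates[s] = np.linalg.lstsq(X, y, rcond=None)[0]

mean_estimate = estimates.mean(axis=0)     # average of the estimates over all samples
print("true coefficients:", beta_true)
print("average estimate: ", mean_estimate)
```

Individual estimates scatter around the true values, but their average should sit very close to `beta_true`, which is what unbiasedness predicts.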

2. Best Linear Unbiased Estimator (BLUE)

The Gauss-Markov Theorem states that under certain conditions, the least squares estimators are the best linear unbiased estimators (BLUE). This means that among all possible linear unbiased estimators of $\beta$ , the least squares estimators have the smallest variance.

Property: The least squares estimators are BLUE, i.e., they have the minimum variance among all linear, unbiased estimators.

Conditions for BLUE:

To ensure that the least squares estimators are BLUE, the following assumptions must hold:

Linearity: The model is linear in the parameters, i.e., $Y = X\beta + \epsilon$ .
Zero-mean errors: The errors have zero mean, $E[\epsilon] = 0$ .
Homoscedasticity: The errors have constant variance ( $\text{Var}(\epsilon_i) = \sigma^2$ for all $i$ ).
No Autocorrelation: The errors are not correlated with each other ( $\text{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$ ). Together with homoscedasticity, this gives $\text{Var}(\epsilon) = \sigma^2 I$ .

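Under these conditions, and provided $X$ has full column rank so that $X^T X$ is invertible, the least squares estimators are the best (i.e., they have the lowest variance) linear unbiased estimators. No distributional assumption such as normality of the errors is required.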
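The reasoning behind the theorem can be sketched numerically. Any linear estimator has the form $\tilde{\beta} = CY$ , and it is unbiased when $CX = I$ . Writing $C = (X^T X)^{-1} X^T + D$ with $DX = 0$ gives $\text{Var}(\tilde{\beta}) = \sigma^2 [(X^T X)^{-1} + D D^T]$ , which can never be smaller than the least squares variance. The code below is an illustrative sketch under assumed values: the design, the error variance, and the perturbation matrix `G` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 30, 1.0
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=n)])

A = np.linalg.solve(X.T @ X, X.T)        # least squares weights: beta_hat = A @ y
H = X @ A                                # hat (projection) matrix
G = 0.05 * rng.normal(size=(2, n))       # arbitrary perturbation (hypothetical)
D = G @ (np.eye(n) - H)                  # constructed so that D @ X = 0
C = A + D                                # a competing linear unbiased estimator

# With Var(eps) = sigma2 * I, a linear estimator C @ y has covariance sigma2 * C C^T.
var_ols = sigma2 * np.diag(A @ A.T)
var_alt = sigma2 * np.diag(C @ C.T)

print("C @ X is the identity:", np.allclose(C @ X, np.eye(2)))
print("least squares variances:", var_ols)
print("competitor variances:   ", var_alt)
```

Every choice of `G` yields an unbiased competitor whose coefficient variances are at least as large as those of least squares, which is the content of the Gauss-Markov theorem.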

3. Consistency of the Least Squares Estimators

Consistency refers to the property that as the sample size $n$ tends to infinity, the estimator converges in probability to the true value of the parameter it estimates.

Property: The least squares estimators are consistent estimators of the true regression coefficients, meaning that:

$$\hat{\beta} \xrightarrow{P} \beta \quad \text{as} \quad n \to \infty$$

Where:

$\hat{\beta}$ is the vector of least squares estimators,
$\beta$ is the vector of true regression coefficients,
$\xrightarrow{P}$ denotes convergence in probability.

Explanation:

This means that with an increasing number of observations, the least squares estimators will tend to get closer to the true values of the parameters. This property holds under standard assumptions: the errors have zero mean and are uncorrelated with the regressors (for example, independent and identically distributed (i.i.d.) errors that are independent of $X$ ), and the regressors keep varying as the sample grows, so that $\frac{1}{n} X^T X$ converges to an invertible matrix.
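Consistency can be illustrated by tracking the estimation error as $n$ grows. This is a rough simulation sketch; the coefficients, the sample sizes, and the number of replications are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = np.array([1.0, 2.0])       # hypothetical true coefficients
sample_sizes = [50, 500, 5000]
n_sims = 300

rmse = []
for n in sample_sizes:
    sq_err = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.uniform(0.0, 10.0, size=n)
        X = np.column_stack([np.ones(n), x])
        y = X @ beta_true + rng.normal(0.0, 1.0, size=n)
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        sq_err[s] = np.sum((beta_hat - beta_true) ** 2)
    # Root mean squared distance between the estimate and the true vector
    rmse.append(np.sqrt(sq_err.mean()))

for n, r in zip(sample_sizes, rmse):
    print(f"n = {n:5d}   RMSE = {r:.4f}")
```

The error should shrink roughly in proportion to $1/\sqrt{n}$ , so each tenfold increase in the sample size reduces it by a factor of about three.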

4. Efficiency of the Least Squares Estimators

The efficiency of an estimator refers to how "tight" the estimator’s sampling distribution is. An efficient estimator has the smallest possible variance among all unbiased estimators.

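Property: Under the assumptions of the Gauss-Markov theorem, the least squares estimators are the most efficient linear unbiased estimators, meaning they have the minimum variance among all linear unbiased estimators. If the errors are additionally normally distributed, they have the minimum variance among all unbiased estimators, linear or not.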

Explanation:

The efficiency of LSEs comes from the fact that they minimize the sum of squared residuals, and under the conditions of homoscedasticity and uncorrelated errors, no other linear unbiased estimator can have a smaller variance.

5. Variance of the Least Squares Estimators under Homoscedasticity

Homoscedasticity means that the variance of the errors is constant across all levels of the independent variable(s). Strictly, it is an assumption about the errors rather than a property of the estimators, but it determines the form of the variance of $\hat{\beta}$ .

Property: The variance of the least squares estimator $\hat{\beta}$ is:

$$\text{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$$

Where:

$\sigma^2$ is the variance of the error term $\epsilon$ ,
$(X^T X)^{-1}$ is the inverse of the matrix $X^T X$ , which depends on the design matrix.

This formula shows that the variance of $\hat{\beta}$ depends on the spread and values of the independent variable(s) (encoded in $X$ ) and the error variance $\sigma^2$ .
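As a small worked example of this dependence, the sketch below evaluates $\sigma^2 (X^T X)^{-1}$ for a simple regression with an intercept, once with tightly clustered $x$ values and once with widely spread ones. The helper `ols_covariance`, the value of $\sigma^2$ , and the two designs are hypothetical and chosen only for illustration.

```python
import numpy as np

def ols_covariance(x, sigma2):
    """Return sigma^2 (X^T X)^{-1} for a simple regression with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return sigma2 * np.linalg.inv(X.T @ X)

sigma2 = 4.0                                # assumed error variance
x_narrow = np.linspace(4.0, 6.0, 20)        # predictor values packed closely together
x_wide = np.linspace(0.0, 10.0, 20)         # same number of points, five times the spread

slope_var_narrow = ols_covariance(x_narrow, sigma2)[1, 1]
slope_var_wide = ols_covariance(x_wide, sigma2)[1, 1]

print("slope variance, narrow design:", slope_var_narrow)
print("slope variance, wide design:  ", slope_var_wide)
print("ratio:", slope_var_narrow / slope_var_wide)
```

For simple regression the slope variance reduces to $\sigma^2 / \sum_i (x_i - \bar{x})^2$ , so spreading the $x$ values five times as far apart divides the variance by 25.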

Implications:

If the columns of $X$ are highly correlated, $X^T X$ is close to singular (the design matrix has a high condition number), and the diagonal entries of $(X^T X)^{-1}$ , and hence the variances of the estimates, become large, making the estimators less reliable. This phenomenon is known as multicollinearity.
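The effect of correlated columns on the variance can be seen in a short sketch. Two designs are compared: one where the second predictor is unrelated to the first, and one where it is almost a copy of it. The data, the helper `summarize`, and the noise level 0.05 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 200, 1.0
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

def summarize(x2):
    """Condition number of X and coefficient variances sigma^2 diag((X^T X)^{-1})."""
    X = np.column_stack([np.ones(n), x1, x2])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.linalg.cond(X), np.diag(cov)

cond_indep, var_indep = summarize(noise)               # x2 unrelated to x1
cond_coll, var_coll = summarize(x1 + 0.05 * noise)     # x2 nearly equal to x1

print("unrelated predictors:  cond =", cond_indep, " variances =", var_indep)
print("nearly collinear:      cond =", cond_coll, " variances =", var_coll)
```

The variance of the intercept estimate barely changes, while the variances of the two slope estimates grow by orders of magnitude in the nearly collinear design.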

6. Normality of the Least Squares Estimators (Asymptotically)

For the least squares estimators to be normally distributed, certain conditions must be met, such as the error terms following a normal distribution. However, even if the errors are not normally distributed, the central limit theorem (CLT) ensures that for large sample sizes, the distribution of the least squares estimators will approach normality.

Property: If the errors are normally distributed, the least squares estimators $\hat{\beta}$ are exactly normally distributed for any sample size. If the errors are not normal, the estimators are still asymptotically normal under mild regularity conditions, so for large samples they approximately follow:

$$\hat{\beta} \sim N(\beta, \text{Var}(\hat{\beta}))$$

Explanation:

This property is important because it allows for inference, such as hypothesis testing and confidence intervals, using standard statistical methods (like $t$ -tests and $F$ -tests) even when the errors are not normally distributed, provided the sample size is large.
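A simulation can make the large-sample behaviour concrete. The sketch below draws strongly skewed errors (an exponential distribution shifted to have mean zero), standardizes the slope estimates using the theoretical standard deviation from $\sigma^2 (X^T X)^{-1}$ , and checks how often they fall within $\pm 1.96$ , the range that holds 95% of a standard normal. All numerical settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_sims, sigma = 200, 5000, 1.0
beta_true = np.array([1.0, 2.0])             # hypothetical true coefficients
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])         # design matrix, held fixed

A = np.linalg.solve(X.T @ X, X.T)            # beta_hat = A @ y
slope_sd = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])

# Skewed, non-normal errors: exponential(1) shifted to mean 0 (variance 1).
E = rng.exponential(1.0, size=(n_sims, n)) - 1.0
Y = X @ beta_true + E                        # one simulated sample per row
estimates = Y @ A.T                          # shape (n_sims, 2)

z = (estimates[:, 1] - beta_true[1]) / slope_sd   # standardized slope estimates
coverage = np.mean(np.abs(z) < 1.96)
print("share of standardized slopes within +/- 1.96:", coverage)
```

With $n = 200$ the share should be close to 0.95 even though the errors themselves are far from normal.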

7. Independence of the Least Squares Estimators

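The components of $\hat{\beta}$ are generally not independent of one another, even when the errors are uncorrelated. Their covariances are the off-diagonal entries of $\sigma^2 (X^T X)^{-1}$ , so two coefficient estimates are uncorrelated only when the corresponding off-diagonal entry is zero, for example when the columns of $X$ are mutually orthogonal. If the errors are also normally distributed, uncorrelated components are independent. A related result that does hold under normal errors is that $\hat{\beta}$ is independent of the residual vector, and therefore of the usual estimator of $\sigma^2$ .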
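How the components of $\hat{\beta}$ relate to each other can be read directly from the off-diagonal entries of $\sigma^2 (X^T X)^{-1}$ . The sketch below, with a hypothetical design and error variance, compares a simple regression on raw $x$ values with the same regression after centering $x$ , which makes the two columns of $X$ orthogonal.

```python
import numpy as np

sigma2 = 1.0                                   # assumed error variance
x = np.linspace(0.0, 10.0, 25)                 # predictor with a positive mean

def coef_covariance(x_values):
    """Covariance matrix sigma^2 (X^T X)^{-1} of (intercept, slope)."""
    X = np.column_stack([np.ones_like(x_values), x_values])
    return sigma2 * np.linalg.inv(X.T @ X)

cov_raw = coef_covariance(x)
cov_centered = coef_covariance(x - x.mean())   # columns of X are now orthogonal

print("Cov(intercept, slope), raw x:     ", cov_raw[0, 1])
print("Cov(intercept, slope), centered x:", cov_centered[0, 1])
```

With raw $x$ the intercept and slope estimates are negatively correlated even though the errors are uncorrelated. Centering removes the correlation, and the variance of the slope estimate is the same in both designs.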

Summary of Properties of Least Squares Estimators

Unbiasedness: The least squares estimators are unbiased, meaning $E[\hat{\beta}] = \beta$ .
Best Linear Unbiased Estimator (BLUE): Under the Gauss-Markov assumptions (linearity, zero mean errors, homoscedasticity, no autocorrelation), LSEs are the best linear unbiased estimators.
Consistency: As the sample size increases, the least squares estimators converge to the true values of the regression coefficients.
Efficiency: The least squares estimators have the minimum variance among all linear unbiased estimators when the Gauss-Markov assumptions hold.
Variance under homoscedasticity: The variance of the LSE is $\sigma^2 (X^T X)^{-1}$ , so it depends on the error variance and the design matrix.
Normality (Asymptotically): The LSEs are approximately normally distributed for large sample sizes, even if the errors are not normally distributed.
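Independence: The components of the LSE are generally correlated with one another; they are uncorrelated only when the corresponding off-diagonal entries of $(X^T X)^{-1}$ are zero.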

These properties make least squares a well-understood and reliable method for estimating regression coefficients in linear regression models, provided the underlying assumptions hold.
