Probability and Statistics, Topic 2 of 36

Statistical Inference

Statistical inference is the process of drawing conclusions or making decisions about a population based on sample data. Since it is often impractical or impossible to gather data from an entire population, statistical inference helps us make predictions, estimates, and test hypotheses about the population based on a sample.

The two main aspects of statistical inference are:

Estimation: Using sample data to estimate population parameters.
Hypothesis Testing: Using sample data to test hypotheses about population parameters.

1. Estimation

Estimation refers to the process of using sample data to estimate unknown population parameters. There are two types of estimates:

a) Point Estimation:

Point estimation involves using a single value (a "point") from the sample to estimate a population parameter.
For example:
- Sample Mean (x̄): Used as an estimate for the population mean (μ).
- Sample Proportion (p̂): Used as an estimate for the population proportion (p).
While point estimates provide a quick summary, they do not convey how much error there might be in the estimate.
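
As a minimal sketch of the two point estimates above, assuming a small made-up sample (the heights and the pass/fail flags below are illustrative values, not data from this topic):

```python
from statistics import mean

# Hypothetical sample of heights in cm (illustrative values only).
heights = [168.0, 172.5, 165.0, 171.0, 169.5]

# Hypothetical yes/no outcomes, e.g. whether each sampled student passed.
passed = [True, False, True, True, False, True, True, False]

# The sample mean x̄ is the point estimate of the population mean μ.
x_bar = mean(heights)

# The sample proportion p̂ is the point estimate of the population proportion p.
p_hat = sum(passed) / len(passed)

print(f"x_bar = {x_bar:.2f}, p_hat = {p_hat:.3f}")
```

Each result is a single number, so on its own it says nothing about how far it might be from the true parameter.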

b) Interval Estimation (Confidence Intervals):

An interval estimate provides a range of values within which the population parameter is likely to fall, along with a level of confidence.
A confidence interval for a population parameter (such as the mean) is typically given as:
$\text{Confidence Interval} = \hat{\theta} \pm z \times \text{Standard Error}$
Where:
- $\hat{\theta}$ is the point estimate (e.g., sample mean),
- $z$ is the z-value corresponding to the desired confidence level (for example, $z = 1.96$ for 95% confidence),
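- Standard Error is the standard deviation of the estimate's sampling distribution (for the sample mean, $\sigma / \sqrt{n}$, or $s / \sqrt{n}$ when $\sigma$ is estimated from the sample).
For instance, a 95% confidence level means that if you repeated the sampling process many times and built an interval each time, about 95% of those intervals would contain the true population mean. It does not mean there is a 95% probability that the mean lies in any one computed interval.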
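
The interval formula can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper name `z_confidence_interval` and the sample are hypothetical, the standard error is estimated as s / sqrt(n), and the z-value is used because the sample is reasonably large (for a small sample a t-value would usually replace z).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the population mean using estimate ± z × SE."""
    n = len(sample)
    point_estimate = mean(sample)
    # Estimated standard error of the sample mean: s / sqrt(n).
    standard_error = stdev(sample) / sqrt(n)
    # z-value leaving (1 - confidence) / 2 probability in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical sample of 40 heights in cm (each of 160..179 appears twice).
sample = [160 + (i * 7) % 20 for i in range(40)]

lower, upper = z_confidence_interval(sample, confidence=0.95)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Raising `confidence` to 0.99 widens the interval, which reflects the trade-off between confidence and precision.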

2. Hypothesis Testing

Hypothesis testing involves making inferences about a population by testing whether a certain hypothesis about the population parameter is likely to be true based on sample data. The key steps in hypothesis testing are:

a) Formulating Hypotheses:

Null Hypothesis (H₀): A statement of no effect or no difference. It is what the test is trying to disprove or reject. For example, H₀: "The mean is 50."
Alternative Hypothesis (H₁ or Ha): A statement that contradicts the null hypothesis. It represents the claim you are looking for evidence to support. For example, Ha: "The mean is not equal to 50."

b) Test Statistic:

A test statistic is a numerical value that is calculated from the sample data. The test statistic is then compared to a critical value from a probability distribution to decide whether to reject the null hypothesis.
The formula for the test statistic depends on the type of test being conducted (e.g., t-test, z-test).
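
As one concrete case, a one-sample t statistic is (x̄ − μ₀) / (s / √n). The sketch below assumes a hypothetical sample of 10 values and a hypothesized mean of 50; the critical value 2.262 is the standard t-table entry for a two-sided test at α = 0.05 with 9 degrees of freedom, hard-coded here rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_statistic(sample, mu0):
    """t = (x̄ - μ0) / (s / sqrt(n)) for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error

# Hypothetical measurements (illustrative values only).
sample = [52.1, 48.3, 55.0, 51.2, 49.8, 53.4, 50.6, 54.1, 47.9, 52.6]

t_stat = one_sample_t_statistic(sample, mu0=50)

# Two-sided critical value for alpha = 0.05, df = n - 1 = 9 (from a t table).
T_CRITICAL = 2.262
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```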

c) Significance Level (α):

The significance level (α) represents the probability of rejecting the null hypothesis when it is actually true (Type I error). Common values for α are 0.05, 0.01, and 0.10.
If the p-value (the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one obtained) is less than α, we reject the null hypothesis.

d) Decision and Conclusion:

After calculating the test statistic and p-value, you compare the p-value to the significance level:
- If p-value < α, reject the null hypothesis.
- If p-value ≥ α, fail to reject the null hypothesis.
Rejecting H₀ means the data provide enough evidence to support the alternative hypothesis. Failing to reject H₀ means the evidence is insufficient, not that H₀ has been shown to be true.
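
The decision rule can be sketched as follows, assuming a z-test so that the p-value comes from the standard normal distribution. The function names `two_sided_p_value` and `decide` and the two z-values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. computed assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Reject H0 only when the p-value is strictly below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Two hypothetical test statistics.
for z in (1.2, 2.5):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```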

Example of Hypothesis Testing:

Suppose you want to test if the average height of students in a school is 170 cm (population mean). You collect a sample and calculate the sample mean height.
- Null hypothesis (H₀): The mean height is 170 cm ( $μ = 170$ ).
- Alternative hypothesis (H₁): The mean height is not 170 cm ( $μ ≠ 170$ ).
- Perform a t-test (since the population standard deviation is unknown), calculate the p-value, and compare it to the significance level (α = 0.05).
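
The height example can be worked through end to end. This sketch uses only the standard library, so the t-distribution p-value is obtained by numerically integrating the t density with Simpson's rule; in practice a statistics package would normally supply it. The ten heights and the helper names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| <= T <= |t|)
    return 1 - central_area

# Hypothetical sample of 10 student heights in cm.
heights = [172.0, 168.5, 175.0, 171.0, 169.0, 174.5, 173.0, 170.5, 176.0, 172.5]
mu0 = 170.0   # value claimed by H0
alpha = 0.05

n = len(heights)
t_stat = (mean(heights) - mu0) / (stdev(heights) / sqrt(n))
p_value = t_two_sided_p_value(t_stat, df=n - 1)
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```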

3. Types of Errors in Hypothesis Testing

There are two possible errors when performing hypothesis testing:

a) Type I Error (False Positive):

This occurs when you reject the null hypothesis when it is actually true.
The probability of committing a Type I error is denoted by α (significance level).

b) Type II Error (False Negative):

This occurs when you fail to reject the null hypothesis when the alternative hypothesis is actually true.
The probability of committing a Type II error is denoted by β, and the power of a test (1 - β) is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true.
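
To make α, β, and power concrete, the sketch below computes the rejection probability of a two-sided one-sample z-test for a given true mean. It assumes a known population standard deviation; the function name `z_test_power` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test of H0: mu = mu0 rejects when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Shift of the test statistic's distribution when the true mean is mu_true.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Probability of landing in the upper or the lower rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

alpha = 0.05

# When H0 is true, the rejection probability is the Type I error rate alpha.
type_i_rate = z_test_power(170, 170, sigma=8, n=25, alpha=alpha)

# When the true mean is 174, the rejection probability is the power 1 - beta.
power = z_test_power(170, 174, sigma=8, n=25, alpha=alpha)
beta = 1 - power

print(f"Type I rate = {type_i_rate:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```

Increasing `n` or the gap between `mu_true` and `mu0` raises the power and therefore lowers β.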

4. P-Value and Confidence Intervals

The p-value is a measure of the strength of the evidence against the null hypothesis. A small p-value (typically less than 0.05) indicates strong evidence against H₀.
Confidence intervals and hypothesis testing are closely related:
- If the value specified by the null hypothesis (e.g., a hypothesized population mean) lies outside the confidence interval, the null hypothesis is rejected by the two-sided test at the corresponding significance level (for example, α = 0.05 for a 95% interval).
- If the hypothesized value lies inside the confidence interval, we fail to reject the null hypothesis at that level.
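
This correspondence can be checked numerically. The sketch below assumes a two-sided z-test with known σ and a 95% interval built from the same data; the sample summary (x̄ = 172, σ = 8, n = 64) and the candidate values of μ₀ are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval x̄ ± z × sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

def z_test_rejects(x_bar, mu0, sigma, n, alpha=0.05):
    """True when the two-sided z-test of H0: mu = mu0 rejects at level alpha."""
    z_stat = (x_bar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return p_value < alpha

x_bar, sigma, n = 172.0, 8.0, 64
lower, upper = z_interval(x_bar, sigma, n, confidence=0.95)

for mu0 in (169.0, 170.0, 171.0, 173.0, 175.0):
    outside = not (lower <= mu0 <= upper)
    rejected = z_test_rejects(x_bar, mu0, sigma, n, alpha=0.05)
    print(f"mu0 = {mu0}: outside CI = {outside}, test rejects = {rejected}")
```

For every candidate value, "outside the interval" and "rejected by the test" agree.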

5. Common Statistical Tests in Inference

Z-Test: Used when the population variance is known, or as an approximation when the sample size is large (typically n > 30).
- Example: Test whether the average height of students is 170 cm.
T-Test: Used when the population variance is unknown and must be estimated from the sample; this matters most when the sample size is small (typically n ≤ 30).
- Example: Test whether the average weight of a population is 70 kg, based on a sample of people.
Chi-Square Test: Used to test relationships between categorical variables or to test if a sample follows a specific distribution.
- Example: Test if the distribution of a sample of people across different age groups is consistent with expected proportions.
ANOVA (Analysis of Variance): Used to compare the means of three or more groups.
- Example: Test whether the average test scores differ across three different teaching methods.
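
The last two tests in the list can be sketched by computing their test statistics directly. Everything here is illustrative: the function names, the age-group counts, and the test scores are hypothetical, and the critical values (5.991 for chi-square with 2 degrees of freedom, 5.14 for F with 2 and 6 degrees of freedom, both at α = 0.05) are taken from standard tables rather than computed.

```python
from statistics import mean

def chi_square_gof_statistic(observed, expected_proportions):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (len(all_values) - k))

# Chi-square: hypothetical counts in three age groups vs expected proportions.
chi2_stat = chi_square_gof_statistic([25, 55, 20], [0.3, 0.5, 0.2])
CHI2_CRITICAL = 5.991  # df = 3 - 1 = 2, alpha = 0.05 (from a chi-square table)
print(f"chi-square = {chi2_stat:.3f}, reject H0: {chi2_stat > CHI2_CRITICAL}")

# ANOVA: hypothetical test scores under three teaching methods.
scores = [[70, 72, 74], [75, 77, 79], [80, 82, 84]]
f_stat = one_way_anova_f(scores)
F_CRITICAL = 5.14  # df = (2, 6), alpha = 0.05 (from an F table)
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRITICAL}")
```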

Conclusion

Statistical inference allows us to make informed decisions and predictions about populations based on sample data. Estimation provides a way to estimate population parameters, while hypothesis testing provides a framework for testing claims and hypotheses about those parameters. Understanding how to perform and interpret these techniques is essential for drawing valid conclusions from data.

