Probability and Statistics, Topic 21 of 36

Fundamental Sampling Distributions

Sampling distributions are a fundamental concept in inferential statistics. They describe the probability distribution of a sample statistic (such as the sample mean, variance, or proportion) obtained from a random sample drawn from a population. Understanding sampling distributions is crucial for making inferences about the population based on sample data.

In essence, a sampling distribution is the distribution of a given statistic (such as the sample mean) computed from multiple samples taken from the same population. These distributions help to understand how sample statistics vary from one sample to another.

Key Concepts of Sampling Distributions

Sample Statistic:
- A sample statistic is any summary measure calculated from a sample, such as the sample mean $\bar{X}$ , sample variance $s^2$ , or sample proportion $\hat{p}$ .
Sampling Distribution:
- The sampling distribution of a statistic is the probability distribution of that statistic based on all possible random samples of a specific size $n$ from a population. It tells us how the statistic behaves across many different samples.
Standard Error:
- The standard error (SE) of a statistic is a measure of how much the statistic (like the sample mean) varies from sample to sample. It is the standard deviation of the sampling distribution.
- For example, the standard error of the sample mean is $SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ , where $\sigma$ is the population standard deviation and $n$ is the sample size.
Central Limit Theorem (CLT):
- The Central Limit Theorem (CLT) is one of the most important results in statistics. It states that, for independent observations from a population with finite mean $\mu$ and finite variance $\sigma^2$ , the sampling distribution of the sample mean $\bar{X}$ is approximately normal once the sample size is large enough, whatever the shape of the population distribution. A common rule of thumb is $n \geq 30$ , although strongly skewed or heavy-tailed populations can need larger samples. The theorem concerns sums and means; it does not apply to every statistic (the sample maximum, for example, is not approximately normal).
Together with the mean and variance of $\bar{X}$ , the CLT gives:
- The distribution of the sample mean becomes approximately normal as the sample size increases.
- The mean of the sample means is equal to the population mean $\mu$ . This holds for every sample size, not only for large $n$ .
- The standard error of the sample mean decreases as the sample size increases, making the sample mean more reliable.
For large $n$ , approximately:
$\bar{X} \overset{\text{approx.}}{\sim} N \left( \mu, \frac{\sigma^2}{n} \right)$
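The behaviour described by the CLT can be checked with a small simulation. The sketch below is only an illustration: the Exponential(1) population, the sample size of 50, the 5000 repetitions, and the helper name `simulate_sample_means` are all assumptions chosen for the example, not part of the course material.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from Exponential(1); return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
means = simulate_sample_means(n=50, num_samples=5000, rng=rng)

# Exponential(1) is strongly skewed, with mu = 1 and sigma = 1.
# Theory predicts E[X-bar] = 1 and SE = 1 / sqrt(50), about 0.141.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std dev of sample means:", round(statistics.stdev(means), 3))
```

Even though the population is far from normal, the simulated means should cluster around 1 with a spread close to $\frac{\sigma}{\sqrt{n}}$ ; a histogram of `means` would look roughly bell-shaped.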

Types of Sampling Distributions

Here are some common sampling distributions that are fundamental to understanding statistical inference:

1. Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean $\bar{X}$ is the distribution of the means of all possible random samples of size $n$ drawn from a population with mean $\mu$ and variance $\sigma^2$ .

Mean of the Sampling Distribution:
- The mean of the sample mean is equal to the population mean $\mu$ . That is:
$E[\bar{X}] = \mu$
Variance of the Sampling Distribution:
- The variance of the sample mean is given by:
$\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$
Where $\sigma^2$ is the population variance and $n$ is the sample size.
Standard Error of the Sample Mean:
- The standard error of the sample mean (SE) is:
$SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
As the sample size increases, the standard error decreases, making the sample mean a more precise estimate of the population mean.
Central Limit Theorem:
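- The Central Limit Theorem (CLT) states that, for a population with finite variance, the sampling distribution of the sample mean is approximately normal for sufficiently large sample sizes (a common rule of thumb is $n \geq 30$ ), regardless of the shape of the population distribution. If the population itself is normal, $\bar{X}$ is exactly normal for every $n$ .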
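As a quick numerical illustration of the standard error formula, the sketch below evaluates $\frac{\sigma}{\sqrt{n}}$ for a few sample sizes. The function name `standard_error_mean` and the value $\sigma = 12$ are hypothetical, picked only so the arithmetic is easy to follow.

```python
import math


def standard_error_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


sigma = 12.0  # assumed population standard deviation
for n in (9, 36, 144):
    print(n, standard_error_mean(sigma, n))
# prints:
# 9 4.0
# 36 2.0
# 144 1.0
```

Each fourfold increase in $n$ halves the standard error, because $n$ enters through a square root.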

2. Sampling Distribution of the Sample Proportion

The sample proportion is the proportion of successes in a sample. If $p$ is the population proportion of successes, the sample proportion is denoted by $\hat{p}$ , and it is defined as:

$\hat{p} = \frac{\text{Number of successes in the sample}}{n}$

Mean of the Sampling Distribution of $\hat{p}$ :
- The mean of the sample proportion is equal to the population proportion $p$ :
$E[\hat{p}] = p$
Variance of the Sampling Distribution of $\hat{p}$ :
- The variance of $\hat{p}$ is given by:
$\text{Var}(\hat{p}) = \frac{p(1 - p)}{n}$
Where $n$ is the sample size.
Standard Error of the Sample Proportion:
- The standard error of the sample proportion is:
$SE_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
As the sample size increases, the standard error of $\hat{p}$ decreases, making $\hat{p}$ a more reliable estimate of $p$ .
Central Limit Theorem:
- If the sample size $n$ is large enough, the sampling distribution of $\hat{p}$ will be approximately normal with mean $p$ and standard deviation $\sqrt{\frac{p(1 - p)}{n}}$ , provided that the conditions $np \geq 10$ and $n(1 - p) \geq 10$ are met. This is the rule of thumb for approximating $\hat{p}$ 's distribution as normal.
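The standard error and the rule-of-thumb check can be combined in a few lines. This is a minimal sketch under assumed values ( $p = 0.4$ , $n = 150$ , and the cut-off 0.45 are made up for illustration), and `proportion_sampling_distribution` is a hypothetical helper name; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def proportion_sampling_distribution(p, n):
    """Return (standard error of p-hat, whether np >= 10 and n(1 - p) >= 10)."""
    se = math.sqrt(p * (1 - p) / n)
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return se, normal_ok


p, n = 0.4, 150  # hypothetical population proportion and sample size
se, normal_ok = proportion_sampling_distribution(p, n)
print("SE:", round(se, 4), "normal approximation reasonable:", normal_ok)

if normal_ok:
    # Approximate P(p-hat <= 0.45) using N(p, se^2).
    prob = NormalDist(mu=p, sigma=se).cdf(0.45)
    print("P(p-hat <= 0.45) is roughly", round(prob, 3))
```

Here $SE_{\hat{p}} = \sqrt{0.24 / 150} = 0.04$ , so 0.45 lies 1.25 standard errors above $p$ . With a small $p$ and modest $n$ (say $p = 0.02$ , $n = 100$ ) the check fails and the normal approximation should not be trusted.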

3. Sampling Distribution of the Sample Variance

The sampling distribution of the sample variance is the distribution of the variance $s^2$ computed from random samples of size $n$ . If the population follows a normal distribution with variance $\sigma^2$ , the scaled sample variance $\frac{(n - 1) s^2}{\sigma^2}$ follows a chi-square distribution.

Mean of the Sampling Distribution of $s^2$ :
- The mean of the sample variance is equal to the population variance:
$E[s^2] = \sigma^2$
Variance of the Sampling Distribution of $s^2$ :
- For a normally distributed population, the variance of the sample variance is:
$\text{Var}(s^2) = \frac{2 \sigma^4}{n - 1}$
Chi-Square Distribution:
- For a normally distributed population, the scaled sample variance follows a chi-square distribution with $n - 1$ degrees of freedom:
$\frac{(n - 1) s^2}{\sigma^2} \sim \chi^2_{n-1}$
This distribution is important for statistical tests that involve variance or standard deviation.
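The two moment formulas for $s^2$ can be checked empirically. The sketch below assumes a normal population with $\sigma = 2$ and samples of size $n = 10$ ; these values, the number of repetitions, and the helper name `simulate_sample_variances` are illustrative choices only.

```python
import random
import statistics


def simulate_sample_variances(mu, sigma, n, num_samples, rng):
    """Return s^2 (n - 1 denominator) for num_samples normal samples of size n."""
    return [
        statistics.variance([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]


rng = random.Random(1)  # fixed seed so the run is repeatable
sigma, n = 2.0, 10
variances = simulate_sample_variances(0.0, sigma, n, 5000, rng)

# Theory for a normal population: E[s^2] = sigma^2 = 4 and
# Var(s^2) = 2 * sigma^4 / (n - 1) = 32 / 9, about 3.556.
print("mean of s^2:", round(statistics.fmean(variances), 3))
print("variance of s^2:", round(statistics.variance(variances), 3))
```

The simulated values should land close to 4 and 3.556 respectively. The variance formula relies on normality, so repeating the experiment with a skewed population would generally give a different spread for $s^2$ .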

4. T-Distribution

When estimating the population mean $\mu$ from a small sample (typically $n < 30$ ) drawn from a normal or approximately normal population whose variance $\sigma^2$ is unknown, the t-distribution is used instead of the normal distribution. The t-distribution is similar to the normal distribution but has heavier tails, which account for the extra uncertainty introduced by estimating $\sigma$ with the sample standard deviation $s$ . As $n$ grows, the t-distribution approaches the standard normal distribution.

Sampling Distribution of the Sample Mean with Unknown Variance:
- When the population is normal and its variance $\sigma^2$ is unknown, the standardized sample mean follows a t-distribution:
$t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}$
Where $s$ is the sample standard deviation, and the t-distribution has $n - 1$ degrees of freedom.
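To make the t-statistic concrete, the sketch below computes it for a small made-up sample and a hypothesised mean of 5.0. The data values and the function name `t_statistic` are assumptions for illustration only.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for the hypothesised mean mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


sample = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7]  # made-up data
t, df = t_statistic(sample, mu0=5.0)
print("t =", round(t, 3), "with", df, "degrees of freedom")
```

For this sample $\bar{X} = 5.05$ and $s^2 = 0.06$ , giving $t \approx 0.577$ on 7 degrees of freedom. The value would then be compared against a t-table or a t-distribution function from a statistics library.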

Summary

Sampling distributions describe how sample statistics vary from sample to sample. They are fundamental for making inferences about the population.
The sampling distribution of the sample mean is approximately normal, regardless of the population distribution, for large enough sample sizes, due to the Central Limit Theorem.
The sample proportion is also approximately normal for large sample sizes, provided that the expected numbers of successes and failures are both sufficiently large.
For normally distributed populations, the scaled sample variance $\frac{(n - 1) s^2}{\sigma^2}$ follows a chi-square distribution with $n - 1$ degrees of freedom.
For small sample sizes and unknown population variance, the t-distribution is used to model the sampling distribution of the sample mean.

Sampling distributions allow statisticians to make precise probabilistic statements about how close a sample statistic is to the true population parameter, which is essential for hypothesis testing and confidence interval estimation.
