Sampling distributions are a fundamental concept in inferential statistics. They describe the probability distribution of a sample statistic (such as the sample mean, variance, or proportion) obtained from a random sample drawn from a population. Understanding sampling distributions is crucial for making inferences about the population based on sample data.
In essence, a sampling distribution is the distribution of a given statistic (such as the sample mean) computed from many samples of the same size taken from the same population. It shows how the statistic varies from one sample to another.
Key Concepts of Sampling Distributions
Sample Statistic:
A sample statistic is any summary measure calculated from a sample, such as the sample mean $\bar{X}$, sample variance $s^2$, or sample proportion $\hat{p}$.
Sampling Distribution:
The sampling distribution of a statistic is the probability distribution of that statistic based on all possible random samples of a specific size n from a population. It tells us how the statistic behaves across many different samples.
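For a very small population, "all possible random samples" can be listed explicitly. The sketch below does this for a made-up population of four values, sampling with replacement. The population, the sample size, and the function name `exact_sampling_distribution` are assumptions chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def exact_sampling_distribution(population, n):
    """Map each possible sample mean to its exact probability.

    Assumes sampling with replacement, so every ordered sample of
    size n is equally likely.
    """
    samples = list(product(population, repeat=n))
    counts = Counter(Fraction(sum(sample), n) for sample in samples)
    total = len(samples)
    return {mean: Fraction(count, total) for mean, count in sorted(counts.items())}


population = [2, 4, 6, 8]  # hypothetical population: mean 5, variance 5
dist = exact_sampling_distribution(population, n=2)

# Mean and variance of the sampling distribution, computed exactly.
mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(f"P(sample mean = {m}) = {p}")
print("mean of sample means:", mean_of_means)      # 5, the population mean
print("variance of sample means:", var_of_means)   # 5/2, the population variance divided by n
```

With only 16 possible samples the whole distribution fits on screen. For realistic populations the same idea is usually approximated by simulation.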
Standard Error:
The standard error (SE) of a statistic is a measure of how much the statistic (like the sample mean) varies from sample to sample. It is the standard deviation of the sampling distribution.
For example, the standard error of the sample mean is:
$$\mathrm{SE}_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Where σ is the population standard deviation and n is the sample size.
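One way to see that the standard error is the standard deviation of the sampling distribution is to simulate many samples and measure how much their means spread. The sketch below assumes a normal population with mean 50 and standard deviation 10. These values, the seed, and the function names are illustrative choices only.

```python
import math
import random
import statistics


def standard_error_of_mean(sigma, n):
    """Theoretical standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulated_standard_error(mu, sigma, n, num_samples=5000, seed=0):
    """Standard deviation of the means of many simulated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


theoretical = standard_error_of_mean(sigma=10.0, n=25)  # 10 / sqrt(25) = 2.0
empirical = simulated_standard_error(mu=50.0, sigma=10.0, n=25)
print(f"theoretical SE = {theoretical:.3f}, simulated SE = {empirical:.3f}")
```

The two numbers should agree closely, and the agreement tightens as `num_samples` grows.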
Central Limit Theorem (CLT):
The Central Limit Theorem (CLT) is one of the most important results in statistics. It states that, for a large enough sample size, the sampling distribution of the sample mean $\bar{X}$ is approximately normal, whatever the shape of the population distribution, provided the observations are independent and the population has finite variance. The common threshold n≥30 is a rule of thumb rather than a guarantee: strongly skewed or heavy-tailed populations may need larger samples. The theorem concerns means (and sums); it does not automatically extend to every other statistic.
Together with the basic properties of the sample mean, the CLT implies that:
The distribution of the sample mean becomes approximately normal as the sample size increases.
The mean of the sample means is equal to the population mean μ.
The standard error of the sample mean decreases as the sample size increases, making the sample mean more reliable.
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(approximately, for large } n\text{)}$$
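The CLT can be checked informally by simulation. The sketch below assumes a strongly skewed population, an exponential distribution with mean 1 and standard deviation 1, and looks at how the means of samples of size 40 behave. The choice of population, the sample size, the seed, and the helper names are all assumptions made for this example.

```python
import math
import random
import statistics


def sample_means(n, num_samples, seed=1):
    """Means of num_samples samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


def coverage_within(means, mu, se, z=1.96):
    """Fraction of sample means lying within z standard errors of mu."""
    inside = sum(1 for m in means if abs(m - mu) <= z * se)
    return inside / len(means)


n = 40
mu, sigma = 1.0, 1.0                 # mean and SD of Exponential(1)
se = sigma / math.sqrt(n)            # theoretical standard error
means = sample_means(n, num_samples=4000)

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean {mu})")
print(f"SD of sample means:   {statistics.stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
# If the normal approximation holds, about 95% of means fall within 1.96 SE.
print(f"share within 1.96 SE: {coverage_within(means, mu, se):.3f}")
```

Rerunning with a much smaller `n` (say 3) should make the skew of the population visible again in the sample means, which is why the sample-size condition matters.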
Types of Sampling Distributions
Here are some common sampling distributions that are fundamental to understanding statistical inference:
1. Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean $\bar{X}$ is the distribution of the means of random samples of size n drawn from a population.
Mean of the Sampling Distribution:
The mean of the sample mean is equal to the population mean μ. That is:
$$E[\bar{X}] = \mu$$
Variance of the Sampling Distribution:
The variance of the sample mean is given by:
$$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
Where $\sigma^2$ is the population variance and n is the sample size.
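This variance follows in a few steps if we assume that the observations $X_1, \dots, X_n$ are independent and each has variance $\sigma^2$ (the $X_i$ notation is introduced here only for the derivation):

$$
\operatorname{Var}(\bar{X})
= \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{n\sigma^2}{n^2}
= \frac{\sigma^2}{n}
$$

Independence is what allows the variance of the sum to be written as the sum of the variances. For dependent observations the result generally does not hold.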
Standard Error of the Sample Mean:
The standard error of the sample mean (SE) is:
$$\mathrm{SE}_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
As the sample size increases, the standard error decreases, making the sample mean a more precise estimate of the population mean.
Central Limit Theorem:
The Central Limit Theorem (CLT) states that, regardless of the shape of the population distribution, the sampling distribution of the sample mean will be approximately normal for sufficiently large sample sizes (usually n≥30).
2. Sampling Distribution of the Sample Proportion
The sample proportion is the proportion of successes in a sample. If p is the population proportion of successes, the sample proportion is denoted by $\hat{p}$, and it is defined as:
$$\hat{p} = \frac{\text{Number of successes in the sample}}{n}$$
Mean of the Sampling Distribution of $\hat{p}$:
The mean of the sample proportion is equal to the population proportion p:
$$E[\hat{p}] = p$$
Variance of the Sampling Distribution of $\hat{p}$:
The variance of $\hat{p}$ is given by:
$$\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}$$
Where n is the sample size.
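As a short justification, assume the n trials are independent, so that the number of successes $X$ (a symbol introduced here for the derivation) is binomial with parameters n and p. Then $E[X] = np$ and $\operatorname{Var}(X) = np(1-p)$, and dividing by n gives:

$$
E[\hat{p}] = \frac{E[X]}{n} = p,
\qquad
\operatorname{Var}(\hat{p}) = \operatorname{Var}\left(\frac{X}{n}\right) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}
$$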
Standard Error of the Sample Proportion:
The standard error of the sample proportion is:
$$\mathrm{SE}_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
As the sample size increases, the standard error of $\hat{p}$ decreases, making $\hat{p}$ a more reliable estimate of p.
Central Limit Theorem:
If the sample size n is large enough, the sampling distribution of $\hat{p}$ is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. A common rule of thumb for "large enough" is that both $np \ge 10$ and $n(1-p) \ge 10$.
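The quantities in this section are easy to compute and to check by simulation. The sketch below assumes a population proportion of 0.3 and samples of size 100. These values, the threshold default, the seed, and the function names are illustrative assumptions.

```python
import math
import random
import statistics


def proportion_standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)


def normal_approximation_ok(p, n, threshold=10):
    """Rule of thumb: expected successes and failures both at least threshold."""
    return n * p >= threshold and n * (1.0 - p) >= threshold


def simulate_sample_proportions(p, n, num_samples=5000, seed=2):
    """Sample proportions from num_samples simulated samples of n independent trials."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n) if rng.random() < p) / n
        for _ in range(num_samples)
    ]


p, n = 0.3, 100
props = simulate_sample_proportions(p, n)

print(f"theoretical SE:  {proportion_standard_error(p, n):.4f}")
print(f"simulated SE:    {statistics.stdev(props):.4f}")
print(f"mean of p-hat:   {statistics.fmean(props):.4f}")
print(f"rule of thumb satisfied: {normal_approximation_ok(p, n)}")
```

With p = 0.05 and n = 100 the expected number of successes is only 5, so `normal_approximation_ok` returns `False` and the normal approximation should be treated with caution.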
3. Sampling Distribution of the Sample Variance
The sampling distribution of the sample variance is the distribution of the variance computed from a sample. If the population follows a normal distribution with variance $\sigma^2$, a scaled version of the sample variance $s^2$ follows a chi-square distribution.
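Mean of the Sampling Distribution of $s^2$:
The sample variance (computed with the $n-1$ divisor) is an unbiased estimator of the population variance:
$$E[s^2] = \sigma^2$$
Variance of the Sampling Distribution of $s^2$:
For a normally distributed population, the variance of the sample variance is:
$$\operatorname{Var}(s^2) = \frac{2\sigma^4}{n-1}$$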
Chi-Square Distribution:
For a normal population, the scaled sample variance follows a chi-square distribution with $n-1$ degrees of freedom:
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
This distribution is important for statistical tests that involve variance or standard deviation.
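The mean and variance of $s^2$ can be checked numerically. The sketch below assumes a normal population with standard deviation 2 and samples of size 10, and uses only the Python standard library, so it compares moments rather than the full chi-square shape. The parameter values, the seed, and the function names are assumptions for this example.

```python
import random
import statistics


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def simulate_sample_variances(sigma, n, num_samples=20000, seed=3):
    """Sample variances from num_samples normal samples of size n."""
    rng = random.Random(seed)
    return [
        sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]


sigma, n = 2.0, 10
variances = simulate_sample_variances(sigma, n)

expected_mean = sigma ** 2                  # E[s^2] = sigma^2
expected_var = 2 * sigma ** 4 / (n - 1)     # Var(s^2) = 2 sigma^4 / (n - 1)

print(f"mean of s^2:     {statistics.fmean(variances):.3f} (theory {expected_mean:.3f})")
print(f"variance of s^2: {sample_variance(variances):.3f} (theory {expected_var:.3f})")

# The scaled statistic (n - 1) s^2 / sigma^2 should average about n - 1,
# the mean of a chi-square distribution with n - 1 degrees of freedom.
scaled = [(n - 1) * v / sigma ** 2 for v in variances]
print(f"mean of scaled statistic: {statistics.fmean(scaled):.3f} (theory {n - 1})")
```

Swapping the normal draws for a heavy-tailed population would be expected to break the agreement for the variance of $s^2$, since that formula relies on normality.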
4. T-Distribution
When the population mean $\mu$ is estimated from a small sample (typically n<30) and the population variance $\sigma^2$ is unknown, the t-distribution is used instead of the normal distribution, on the assumption that the population itself is approximately normal. The t-distribution resembles the normal distribution but has heavier tails, which account for the extra uncertainty introduced by estimating $\sigma$ from a small sample.
Sampling Distribution of the Sample Mean with Unknown Variance:
When the population variance $\sigma^2$ is unknown and is estimated by the sample variance, the standardized sample mean follows a t-distribution:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Where $s$ is the sample standard deviation, and the t-distribution has $n-1$ degrees of freedom.
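The t-statistic is simple to compute directly, and a small simulation shows the heavier tails. The sketch below uses a made-up sample of five values and simulated standard normal data. The data, the cutoff of 1.96, the seed, and the function names are assumptions for illustration, and the standard library is used throughout, so no t-table lookup is attempted.

```python
import math
import random
import statistics


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for the mean of sample against mu0."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)            # sample SD with the n - 1 divisor
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


def tail_fraction_beyond(cutoff, n, num_samples=5000, seed=4):
    """Share of simulated t-statistics with |t| > cutoff, for normal samples of size n."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t_value, _df = t_statistic(sample, mu0=0.0)
        if abs(t_value) > cutoff:
            exceed += 1
    return exceed / num_samples


# Hypothetical data: mean 3, s^2 = 2.5, so t = (3 - 2) / sqrt(2.5 / 5) = sqrt(2).
t, df = t_statistic([1, 2, 3, 4, 5], mu0=2)
print(f"t = {t:.4f} with {df} degrees of freedom")

# Under a normal distribution only about 5% of values exceed 1.96 in absolute
# value; with n = 5 the t-statistic exceeds it noticeably more often.
tail_fraction = tail_fraction_beyond(cutoff=1.96, n=5)
print(f"share of |t| > 1.96 for n = 5: {tail_fraction:.3f}")
```

This is why confidence intervals based on small samples use a larger critical value than 1.96.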
Summary
Sampling distributions describe how sample statistics vary from sample to sample. They are fundamental for making inferences about the population.
The sampling distribution of the sample mean is approximately normal, regardless of the population distribution, for large enough sample sizes, due to the Central Limit Theorem.
The sample proportion is also approximately normal for large sample sizes, provided that the expected numbers of successes and failures are both sufficiently large.
For normally distributed populations, the scaled sample variance $(n-1)s^2/\sigma^2$ follows a chi-square distribution with $n-1$ degrees of freedom.
For small sample sizes and unknown population variance, the t-distribution is used to model the standardized sample mean.
Sampling distributions allow statisticians to make precise probabilistic statements about how close a sample statistic is to the true population parameter, which is essential for hypothesis testing and confidence interval estimation.