The t-distribution, also known as Student's t-distribution, is a probability distribution that is used in statistics for estimating population parameters when the sample size is small and/or the population variance is unknown. It is particularly important in hypothesis testing and confidence interval estimation when dealing with small samples.
The t-distribution was first introduced by William Sealy Gosset under the pseudonym "Student" in 1908.
1. Characteristics of the t-Distribution
The t-distribution has several important properties that differentiate it from the normal distribution:
Key Features:
Shape:
The t-distribution is bell-shaped and symmetric around zero, just like the normal distribution.
However, the t-distribution has heavier tails than the normal distribution, meaning that there is a greater probability of extreme values.
As the sample size increases, the t-distribution approaches the normal distribution.
Mean and Variance:
The mean of the t-distribution is zero (μ=0), provided ν>1.
The variance of the t-distribution is greater than 1. Specifically, the variance is ν/(ν−2) for ν>2, where ν is the degrees of freedom (discussed below). For small degrees of freedom, the variance can be much larger than 1; for ν≤2 it is not finite.
Heavier Tails:
The t-distribution has heavier tails compared to the normal distribution. This means there is a higher probability of observing extreme values in a t-distribution. This property becomes more pronounced when the sample size is small.
Degrees of Freedom (df):
The shape of the t-distribution depends on the degrees of freedom ν, which is usually associated with the sample size. For a single sample, the degrees of freedom are given by:
ν=n−1
Where:
n is the sample size.
As the degrees of freedom increase, the t-distribution approaches the standard normal distribution because larger sample sizes provide more information and reduce the uncertainty in estimating the population mean.
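As a small illustration of this convergence, the sketch below evaluates the variance ν/(ν−2) for a few degrees of freedom and shows it shrinking toward 1, the variance of the standard normal. The function name t_variance is a hypothetical helper chosen for this example.

```python
import math

def t_variance(df: float) -> float:
    """Variance of Student's t-distribution with df degrees of freedom.

    Equals df / (df - 2) for df > 2, is infinite for 1 < df <= 2,
    and is undefined (NaN here) for df <= 1.
    """
    if df <= 1:
        return math.nan
    if df <= 2:
        return math.inf
    return df / (df - 2)

# The variance decreases toward 1 as the degrees of freedom grow.
for df in (3, 5, 10, 30, 100):
    print(f"df={df:>3}: variance = {t_variance(df):.4f}")
```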
2. The t-Distribution Formula
The probability density function (PDF) of the t-distribution for a given value t and degrees of freedom ν is:
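$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$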
Where:
Γ(x) is the Gamma function (a generalization of the factorial function),
t is the value for which the density is calculated,
ν is the degrees of freedom.
For most practical purposes, the t-distribution is looked up in t-tables or calculated using statistical software rather than directly using the PDF formula.
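For readers who want to see the density evaluated directly, here is a minimal sketch using only the Python standard library. It assumes the standard form of the PDF, with Γ((ν+1)/2) in the numerator and √(νπ)·Γ(ν/2) in the denominator, and works with log-gamma values to avoid overflow at large ν. The names t_pdf and normal_pdf are illustrative helpers, not part of any library.

```python
import math

def t_pdf(t: float, df: float) -> float:
    """Density of Student's t-distribution at t with df degrees of freedom."""
    # Log of the normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)).
    log_coef = (math.lgamma((df + 1) / 2)
                - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    # Log of the kernel (1 + t^2/df) ** (-(df+1)/2).
    log_kernel = -(df + 1) / 2 * math.log1p(t * t / df)
    return math.exp(log_coef + log_kernel)

def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# In the tail (t = 3) the t density exceeds the normal density,
# and the gap narrows as the degrees of freedom increase.
for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: t_pdf(3) = {t_pdf(3.0, df):.5f}")
print(f"normal : pdf(3)   = {normal_pdf(3.0):.5f}")
```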
3. The Relationship Between t-Distribution and Normal Distribution
When the sample size is large (typically n>30), the t-distribution becomes very similar to the normal distribution. This is because the estimation of the population variance becomes more reliable as the sample size increases, and the sampling distribution of the sample mean approaches normality due to the Central Limit Theorem.
For small sample sizes (typically n≤30), the t-distribution is used because it accounts for the increased variability that comes from estimating the population standard deviation with a small sample.
4. When is the t-Distribution Used?
The t-distribution is typically used in the following scenarios:
Small Sample Sizes:
When the sample size is small (usually n≤30) and the population variance σ² is unknown, we rely on the t-distribution to estimate the population mean or to conduct hypothesis tests.
Unknown Population Variance:
When the population variance is unknown and must be estimated from the sample, the t-distribution is used instead of the normal distribution. The sample variance s² is used as an estimate of the population variance σ².
5. Applications of the t-Distribution
The t-distribution is primarily used in the following types of statistical analyses:
a. t-Tests
The t-test is a hypothesis test used to determine whether there is a significant difference between the sample mean and the population mean, or between the means of two independent samples.
One-sample t-test:
Used to test whether the mean of a sample is significantly different from a known or hypothesized population mean.
Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
Where:
$\bar{x}$ is the sample mean,
$\mu_0$ is the population mean under the null hypothesis,
s is the sample standard deviation,
n is the sample size.
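The one-sample statistic, the difference between the sample mean and the hypothesized mean divided by the standard error s/√n, can be computed in a few lines. The sketch below uses the standard library only; the function name one_sample_t and the data values are hypothetical, chosen purely for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 denominator)
    t_stat = (mean - mu0) / (s / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical measurements tested against a hypothesized mean of 5.0.
data = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t_stat, df = one_sample_t(data, mu0=5.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The resulting statistic is then compared with a critical value of the t-distribution with n−1 degrees of freedom.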
Two-sample t-test:
Used to compare the means of two independent groups.
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Where:
$\bar{x}_1, \bar{x}_2$ are the sample means,
$s_1^2, s_2^2$ are the sample variances,
$n_1, n_2$ are the sample sizes.
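A minimal sketch of the two-sample statistic follows, with the standard error taken as the square root of s1²/n1 + s2²/n2. A statistic with this unpooled standard error is commonly called Welch's t statistic. The degrees-of-freedom approximation in the code (Welch-Satterthwaite) is not given in the text above and is included here as an addition; the function name and data are hypothetical.

```python
import math
import statistics

def two_sample_t(a, b):
    """Return (t statistic, approximate df) for two independent samples.

    Uses the unpooled standard error sqrt(s1^2/n1 + s2^2/n2) and the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    v1 = statistics.variance(a) / n1      # s1^2 / n1
    v2 = statistics.variance(b) / n2      # s2^2 / n2
    t_stat = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Hypothetical scores from two independent groups.
group_a = [82, 75, 90, 68, 77, 85]
group_b = [70, 72, 65, 80, 74]
t_stat, df = two_sample_t(group_a, group_b)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```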
Paired t-test:
Used when comparing two related samples, such as before-and-after measurements.
The test statistic is calculated from the differences between paired observations, which are treated as a single sample.
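One way to sketch the paired case: take the difference within each pair, then apply the one-sample formula to those differences with a hypothesized mean difference of zero. The function name paired_t and the before/after values are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    The statistic is mean(d) / (sd(d) / sqrt(n)), where d holds the
    per-pair differences after - before.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)        # n - 1 denominator
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical before-and-after measurements on the same four subjects.
t_stat, df = paired_t(before=[10, 12, 9, 11], after=[12, 13, 11, 14])
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```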
b. Confidence Intervals for the Mean
For small sample sizes and when the population variance is unknown, we can construct a confidence interval for the population mean using the t-distribution.
For a 95% confidence interval for the population mean μ, the formula is:
$$\bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$$
Where:
$t_{\alpha/2}$ is the critical value of the t-distribution for the given confidence level and degrees of freedom (i.e., n−1),
$\bar{x}$ is the sample mean,
s is the sample standard deviation,
n is the sample size.
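The interval can be sketched as follows. To keep the example within the standard library, the critical value is passed in as an argument rather than computed. The value 2.262 used below is the usual t-table entry for a 95% interval with 9 degrees of freedom; the function name and the ten data points are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, t_crit):
    """Return (lower, upper) bounds: mean +/- t_crit * s / sqrt(n).

    t_crit must be the critical value for the chosen confidence level
    and len(sample) - 1 degrees of freedom (for example, from a t-table).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - margin, mean + margin

# Ten hypothetical observations, so df = 9 and the 95% table value is 2.262.
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.7, 11.9, 12.2, 12.4]
lower, upper = mean_confidence_interval(data, t_crit=2.262)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```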
c. Estimating the Population Variance
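Hypothesis tests and confidence intervals for the population variance σ² itself are based on the chi-square distribution rather than the t-distribution. Within t-based procedures, the sample variance s² serves as the estimate of σ², and the uncertainty in that estimate is what the t-distribution accounts for, especially when the sample size is small.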
6. Critical Values from the t-Distribution
The critical values of the t-distribution are used to conduct hypothesis tests and construct confidence intervals. These values depend on two factors:
Degrees of freedom (df): For a one-sample t-test, the degrees of freedom is df=n−1, where n is the sample size.
Significance level (α): The critical value corresponds to the desired confidence level or significance level. Commonly used values of α are 0.05 (for 95% confidence), 0.01 (for 99% confidence), etc.
These critical values can be found in t-tables or calculated using statistical software or calculators.
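To show where such table values come from, the sketch below recovers a critical value numerically: it integrates the t density with Simpson's rule to get the cumulative probability, then uses bisection to find the point where that probability reaches 1−α/2 (two-sided) or 1−α (one-sided). This is an illustrative standard-library approach under those assumptions, not how statistical packages necessarily compute it; all function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution (computed via log-gamma)."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry about zero."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 <= T <= |t|)
    return 0.5 + area if t > 0 else 0.5 - area

def t_critical(alpha, df, two_sided=True):
    """Positive critical value for significance level alpha (0 < alpha < 0.5)."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must be between 0 and 0.5")
    target = 1 - alpha / 2 if two_sided else 1 - alpha
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:          # widen the bracket until it contains the target
        hi *= 2
    for _ in range(60):                    # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two-sided critical values at alpha = 0.05 for several degrees of freedom.
for df in (4, 9, 30, 1000):
    print(f"df={df:>4}: t_crit = {t_critical(0.05, df):.3f}")
```

As the degrees of freedom grow, the printed values move toward 1.96, the corresponding critical value of the standard normal distribution.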
7. Summary of t-Distribution Key Points
The t-distribution is a family of probability distributions used when the sample size is small and the population variance is unknown.
It is similar to the normal distribution but with heavier tails, meaning that it accounts for more variability in smaller samples.
The t-distribution is parameterized by its degrees of freedom ν, with ν=n−1 for a single sample of size n.
The t-distribution is used in t-tests, confidence intervals, and hypothesis testing when the population variance is unknown.
As the sample size increases, the t-distribution approaches the normal distribution, making the normal distribution a good approximation for large samples.
Understanding the t-distribution is crucial for conducting proper statistical analyses when working with small datasets or unknown population parameters.