Variance and Covariance of Random Variables
Variance and covariance are important concepts in probability theory and statistics that measure the spread (or variability) of random variables and the relationship between two random variables, respectively. Let's go over each concept in detail.
1. Variance of a Random Variable
Definition
The variance of a random variable measures the spread, or dispersion, of its values around the expected value (mean). It quantifies how far the values of the random variable deviate from the expected value by averaging the squared deviations, weighted by their probabilities.
The variance of a random variable X, denoted by Var(X), is defined as the expected value of the squared deviation from the mean E[X]:
$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big]$$
Alternatively, the variance can be computed using the following formula:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$
This formula is useful because it lets you compute the variance by finding the expected value of $X^2$ and subtracting the square of the expected value of X.
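One way to see that the two formulas agree is to expand the square and use linearity of expectation. Here μ is shorthand for the constant E[X]:

```latex
\begin{aligned}
\mathrm{Var}(X) &= E\big[(X - \mu)^2\big] \\
                &= E\big[X^2 - 2\mu X + \mu^2\big] \\
                &= E[X^2] - 2\mu\,E[X] + \mu^2 \\
                &= E[X^2] - 2\mu^2 + \mu^2 \\
                &= E[X^2] - (E[X])^2
\end{aligned}
```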
Interpretation
A higher variance indicates that the values of X are spread out over a larger range, meaning the data is more dispersed.
A lower variance indicates that the values of X are clustered more closely around the expected value, meaning the data is less dispersed.
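To make the interpretation concrete, here is a minimal sketch comparing two hypothetical distributions that share the same mean (2) but differ in how tightly their values cluster. It assumes a discrete distribution is stored as a Python dict mapping each value to its probability; the function and variable names are illustrative only.

```python
def variance(pmf):
    """Var(X) = E[(X - E[X])^2] for a discrete PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Two hypothetical distributions, both with mean 2.
clustered = {1: 0.1, 2: 0.8, 3: 0.1}      # most probability mass sits at the mean
spread_out = {0: 0.25, 2: 0.5, 4: 0.25}   # mass pushed further from the mean

print(round(variance(clustered), 10))   # 0.2
print(round(variance(spread_out), 10))  # 2.0
```

The second distribution places probability on values farther from 2, so its variance is ten times larger even though both means are equal.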
Example
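Suppose a random variable X takes the values 1, 2, and 3 with probabilities P(X=1)=0.2, P(X=2)=0.5, and P(X=3)=0.3. Its expected value is:
$$E[X] = 1(0.2) + 2(0.5) + 3(0.3) = 2.1$$
The expected value of $X^2$ is:
$$E[X^2] = 1^2(0.2) + 2^2(0.5) + 3^2(0.3) = 0.2 + 2.0 + 2.7 = 4.9$$
Using the alternative formula, the variance is:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 4.9 - 2.1^2 = 4.9 - 4.41 = 0.49$$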
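For the distribution in this example, a short plain-Python sketch can check that the definition and the alternative formula give the same variance. The dict representation and the helper name `expectation` are our own choices, not part of the original text.

```python
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # P(X = x) for each value x

def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete PMF given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                                        # E[X]
var_definition = expectation(pmf, lambda x: (x - mean) ** 2)   # E[(X - E[X])^2]
var_shortcut = expectation(pmf, lambda x: x ** 2) - mean ** 2  # E[X^2] - (E[X])^2

# Rounding hides tiny floating-point differences between the two routes.
print(round(mean, 10), round(var_definition, 10), round(var_shortcut, 10))  # 2.1 0.49 0.49
```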
2. Standard Deviation
The standard deviation is the square root of the variance:
$$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$$
Because the variance is measured in squared units, the standard deviation is often easier to interpret: it expresses the spread of the random variable in the same units as the random variable itself.
In our example, since Var(X) = 0.49, the standard deviation is:
$$\mathrm{SD}(X) = \sqrt{0.49} = 0.7$$
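The "same units" point can be illustrated by re-expressing the example's values in different units. The sketch below assumes, purely for illustration, that X is measured in metres and converts it to centimetres: the standard deviation scales by the same factor of 100, while the variance would scale by 100².

```python
import math

def std_dev(pmf):
    """SD(X) = sqrt(Var(X)) for a discrete PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return math.sqrt(var)

pmf_m = {1: 0.2, 2: 0.5, 3: 0.3}                  # hypothetical: X in metres
pmf_cm = {100 * x: p for x, p in pmf_m.items()}   # the same X in centimetres

print(round(std_dev(pmf_m), 10))   # 0.7
print(round(std_dev(pmf_cm), 10))  # 70.0
```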
3. Covariance of Two Random Variables
Definition
The covariance of two random variables X and Y measures the degree to which the two variables change together. If X and Y tend to increase or decrease together, the covariance is positive. If one tends to increase while the other decreases, the covariance is negative. A covariance of zero means that X and Y are uncorrelated: there is no linear relationship between them. This does not imply that they are independent, since they may still be related in a nonlinear way.
The covariance between two random variables X and Y, denoted Cov(X,Y), is defined as:
$$\mathrm{Cov}(X,Y) = E\big[(X - E[X])(Y - E[Y])\big]$$
Alternatively, the covariance can be calculated using the following formula:
$$\mathrm{Cov}(X,Y) = E[XY] - E[X]\,E[Y]$$
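Both formulas can be written directly over a joint distribution. This minimal sketch assumes the joint PMF is a dict mapping each pair (x, y) to P(X = x, Y = y); the function names and the example distribution are hypothetical.

```python
def cov_definition(joint):
    """Cov(X, Y) = E[(X - E[X])(Y - E[Y])] for a joint PMF {(x, y): probability}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    return sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

def cov_shortcut(joint):
    """Cov(X, Y) = E[XY] - E[X]E[Y] for the same joint PMF."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

# Hypothetical joint PMF in which X and Y always take the same value.
joint = {(0, 0): 0.5, (1, 1): 0.5}
print(cov_definition(joint), cov_shortcut(joint))  # 0.25 0.25
```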
Interpretation
A positive covariance indicates that as X increases, Y tends to increase as well (and vice versa).
A negative covariance indicates that as X increases, Y tends to decrease.
A covariance of zero indicates that there is no linear relationship between the two variables.
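The three cases above can be seen side by side in a small sketch. It assumes, for illustration only, that X is uniform on {1, 2, 3}, and uses Python's standard `fractions` module so the results are exact. The third case shows that Y can be completely determined by X and still have zero covariance with it, because the relationship is not linear.

```python
from fractions import Fraction

def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X]E[Y] for a joint PMF {(x, y): probability}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

third = Fraction(1, 3)
xs = [1, 2, 3]  # hypothetical: X uniform on {1, 2, 3}

together = {(x, x): third for x in xs}           # Y = X: increases with X
opposite = {(x, 4 - x): third for x in xs}       # Y = 4 - X: decreases as X increases
curved = {(x, (x - 2) ** 2): third for x in xs}  # Y = (X - 2)^2: related, but not linearly

print(covariance(together))  # 2/3
print(covariance(opposite))  # -2/3
print(covariance(curved))    # 0
```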
Example
Let’s say we have two random variables X and Y with the following values and probabilities:
X   Y   P(X, Y)
1   2   0.1
1   3   0.4
2   2   0.3
2   4   0.2
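First, calculate the expected values E[X] and E[Y] from the table:
$$E[X] = 1(0.1 + 0.4) + 2(0.3 + 0.2) = 0.5 + 1.0 = 1.5$$
$$E[Y] = 2(0.1) + 3(0.4) + 2(0.3) + 4(0.2) = 0.2 + 1.2 + 0.6 + 0.8 = 2.8$$
Next, calculate E[XY] by summing x · y · P(X=x, Y=y) over every row:
$$E[XY] = (1)(2)(0.1) + (1)(3)(0.4) + (2)(2)(0.3) + (2)(4)(0.2) = 0.2 + 1.2 + 1.2 + 1.6 = 4.2$$
Applying the alternative formula:
$$\mathrm{Cov}(X,Y) = E[XY] - E[X]\,E[Y] = 4.2 - (1.5)(2.8) = 4.2 - 4.2 = 0$$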
Thus, Cov(X,Y)=0, meaning that there is no linear relationship between X and Y in this example.
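The calculation for the table above can be reproduced exactly with Python's `fractions` module. The sketch also checks independence, which would require P(X = x, Y = y) = P(X = x) · P(Y = y) for every pair. It fails here (for example, P(X=1, Y=2) = 0.1 but P(X=1) · P(Y=2) = 0.5 × 0.4 = 0.2), so X and Y are uncorrelated without being independent. The variable names are our own.

```python
from fractions import Fraction as F

# Joint PMF from the table: {(x, y): P(X = x, Y = y)}
joint = {(1, 2): F(1, 10), (1, 3): F(4, 10), (2, 2): F(3, 10), (2, 4): F(2, 10)}

ex = sum(x * p for (x, _), p in joint.items())        # E[X]
ey = sum(y * p for (_, y), p in joint.items())        # E[Y]
exy = sum(x * y * p for (x, y), p in joint.items())   # E[XY]
cov = exy - ex * ey
print(ex, ey, exy, cov)  # 3/2 14/5 21/5 0

# Marginal distributions P(X = x) and P(Y = y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

# Independence requires the joint PMF to factor into the marginals for every pair.
independent = all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)
print(independent)  # False
```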
4. Properties of Covariance
Covariance has the following important properties:
Symmetry:
$$\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$$
Linear Scaling: If X and Y are random variables and a and b are constants, then:
$$\mathrm{Cov}(aX + b, Y) = a\,\mathrm{Cov}(X,Y)$$
Covariance of a Random Variable with Itself: The covariance of a random variable with itself is simply its variance:
$$\mathrm{Cov}(X,X) = \mathrm{Var}(X)$$
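All three properties can be checked numerically on a small example. The sketch below uses a hypothetical joint PMF with nonzero covariance and exact fractions, so the comparisons are exact equalities; the constants a = 3 and b = 5 are arbitrary choices.

```python
from fractions import Fraction as F

def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X]E[Y] for a joint PMF {(x, y): probability}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

def variance(pmf):
    """Var(X) = E[(X - E[X])^2] for a PMF {x: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Hypothetical joint PMF with nonzero covariance.
joint = {(0, 1): F(1, 4), (1, 2): F(1, 2), (2, 2): F(1, 4)}

# Symmetry: swapping the roles of X and Y leaves the covariance unchanged.
swapped = {(y, x): p for (x, y), p in joint.items()}
assert covariance(swapped) == covariance(joint)

# Linear scaling: replacing X by aX + b multiplies the covariance by a.
a, b = 3, 5
scaled = {(a * x + b, y): p for (x, y), p in joint.items()}
assert covariance(scaled) == a * covariance(joint)

# Cov(X, X) = Var(X), using the marginal distribution of X.
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0) + p
assert covariance({(x, x): p for x, p in px.items()}) == variance(px)

print(covariance(joint), covariance(scaled), variance(px))  # 1/4 3/4 1/2
```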
5. Correlation Coefficient
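The correlation coefficient is a normalized measure of the linear relationship between two random variables. For random variables whose standard deviations are both nonzero, it is defined as:
$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}$$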
The correlation coefficient ranges from -1 to 1:
A value of 1 indicates a perfect positive linear relationship.
A value of -1 indicates a perfect negative linear relationship.
A value of 0 indicates no linear relationship.
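A minimal sketch of the definition, again assuming X is uniform on {1, 2, 3} for illustration. An exact linear relationship with positive slope gives 1, one with negative slope gives −1, and the nonlinear Y = (X − 2)² gives 0. The helper name `correlation` is hypothetical; the probabilities are exact fractions and only the final square root uses floating point, so results are rounded before printing.

```python
import math
from fractions import Fraction

def correlation(joint):
    """rho(X, Y) = Cov(X, Y) / (SD(X) * SD(Y)) for a joint PMF {(x, y): probability}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
    var_x = sum((x - ex) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - ey) ** 2 * p for (_, y), p in joint.items())
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a standard deviation is zero")
    # SD(X) * SD(Y) = sqrt(Var(X) * Var(Y)); this is the only floating-point step.
    return cov / math.sqrt(float(var_x * var_y))

third = Fraction(1, 3)
xs = [1, 2, 3]  # hypothetical: X uniform on {1, 2, 3}

print(round(correlation({(x, 2 * x + 1): third for x in xs}), 10))     # 1.0
print(round(correlation({(x, 10 - 3 * x): third for x in xs}), 10))    # -1.0
print(round(correlation({(x, (x - 2) ** 2): third for x in xs}), 10))  # 0.0
```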
Summary
Variance measures the spread of a random variable around its expected value.
Standard Deviation is the square root of the variance and measures the spread in the same units as the data.
Covariance measures whether two random variables tend to increase or decrease together.
The correlation coefficient normalizes the covariance to provide a standardized measure of the strength and direction of the linear relationship between two variables.
These concepts are fundamental to understanding how random variables behave, both individually (through variance) and in relation to one another (through covariance).