Multiple Linear Regression (MLR) is an extension of simple linear regression that allows us to model the relationship between a dependent variable (also called the response variable) and two or more independent variables (predictors or explanatory variables). It assumes that the relationship between the dependent variable and each independent variable is linear, but there can be multiple predictors involved.
The primary goal of multiple linear regression is to find the best-fitting linear relationship between the dependent variable and the independent variables. It is widely used in many fields, such as economics, engineering, and social sciences.
The general form of the multiple linear regression model is:
Where:
For the multiple linear regression model to provide reliable estimates, several assumptions must be met:
The coefficients are estimated using the method of Ordinary Least Squares (OLS), which minimizes the sum of squared residuals. Mathematically, the goal is to find the coefficients that minimize:
Where represents the predicted values of .
Once the model is fit, it is important to check the assumptions to validate the model's findings:
Linearity: You can plot the residuals versus the fitted values to check if there is a linear pattern. A linear pattern indicates that the linearity assumption is met.
Independence: If residuals are correlated, the assumption of independence is violated. This is often checked using the Durbin-Watson statistic.
Homoscedasticity: Plotting residuals versus fitted values should show no clear pattern. If the spread of residuals increases or decreases as fitted values increase, this indicates heteroscedasticity.
Normality of Residuals: This can be assessed using a Q-Q plot or histogram of residuals. If the residuals are normally distributed, the points will lie along a straight line in the Q-Q plot.
Multicollinearity: This occurs when two or more independent variables are highly correlated with each other, making it difficult to isolate the individual effect of each variable on the dependent variable. You can check for multicollinearity using the Variance Inflation Factor (VIF).
While multiple linear regression assumes a linear relationship between the dependent and independent variables, some relationships are inherently nonlinear. In such cases, nonlinear regression models are used.
Nonlinear regression is used when the relationship between the dependent and independent variables cannot be described by a straight line but instead follows some nonlinear function (such as exponential, logarithmic, power, or polynomial).
Exponential Regression Model: The dependent variable changes exponentially with respect to the independent variable :
Here, is the base of the natural logarithm.
Logarithmic Regression Model: The relationship between and follows a logarithmic function:
This is useful when growth or decay is observed, and changes in are proportional to the logarithm of .
Power Law Model: The dependent variable follows a power of the independent variable :
This is common in situations where relationships are proportional to a power of the independent variable, such as certain physical laws.
Polynomial Regression: A more flexible nonlinear model where the relationship between and is modeled as a polynomial of degree :
This allows for modeling curvatures in the relationship between the variables, but care must be taken to avoid overfitting, especially with higher-degree polynomials.
Logistic Regression (for binary outcomes): When the dependent variable is binary (e.g., success/failure, yes/no), a logistic regression model is used, which models the probability of success as a nonlinear function of the predictors. It is defined as:
This model is often used for classification problems.
Fitting nonlinear regression models typically involves nonlinear optimization techniques since the relationship between the dependent and independent variables is not linear. These techniques include methods like Gauss-Newton, Levenberg-Marquardt, and gradient descent. These methods iteratively adjust the parameters () to minimize the sum of squared residuals.
Unlike linear regression, where the coefficients can be directly computed using matrix algebra (Ordinary Least Squares), nonlinear regression often requires computational methods to estimate the coefficients.
| Aspect | Linear Regression | Nonlinear Regression |
|---|---|---|
| Relationship | Linear between and | Nonlinear between and |
| Model Form | , , etc. | |
| Fitting Method | Ordinary Least Squares (OLS) | Nonlinear optimization (e.g., Gauss-Newton) |
| Assumptions | Linearity, homoscedasticity, independence, etc. | More flexible; assumptions depend on the specific model |
| Interpretation | Coefficients represent the change in per unit change in | Coefficients depend on the form of the nonlinear function |
Both linear and nonlinear regression are powerful tools for modeling relationships in data, and choosing between them depends on the nature of the data and the relationship between the variables.
Open this section to load past papers