1. What is Regression Analysis?
Definition:
Regression analysis is a statistical technique used to study the relationship between a dependent variable (response) and one or more independent variables (predictors).
- It helps in predicting the value of the dependent variable based on known values of independent variables.
- It also quantifies the strength and nature of relationships between variables.
2. Key Terms
| Term | Meaning |
| --- | --- |
| Dependent Variable (Y) | The variable we want to predict or explain |
| Independent Variable (X) | The variable(s) used to predict or explain Y |
| Regression Coefficient | The estimated change in Y associated with a one-unit change in a predictor, holding any other predictors constant |
| Intercept (β0) | The expected value of Y when X = 0 |
| Slope (β1) | The expected change in Y for a one-unit increase in X |
| Residual | The difference between an observed Y value and the value predicted by the model |
| Regression Line / Equation | Mathematical representation of the relationship: Y = β0 + β1X + ε, where ε is the random error term |
3. Types of Regression
A. Simple Linear Regression
- Involves one independent variable and one dependent variable.
- Model: Y = β0 + β1X + ε
- Goal: Estimate β0 (intercept) and β1 (slope) to best fit the data.
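The usual way to "best fit" the data is ordinary least squares, which picks β0 and β1 to minimize the sum of squared residuals. A minimal sketch of the closed-form estimates follows; the function name `fit_simple_ols` and the data points are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # s_xy: co-variation of X and Y; s_xx: variation of X around its mean.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1

# Made-up points lying exactly on Y = 1 + 2X, so the fit should recover that line.
b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```

The slope is the co-variation of X and Y divided by the variation of X, and the intercept forces the line through the point of means.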
B. Multiple Linear Regression
- Involves two or more independent variables predicting a dependent variable.
- Model: Y = β0 + β1X1 + β2X2 + ⋯ + βkXk + ε
- Can improve predictions when the added predictors carry real information, and lets each coefficient be read as the effect of one variable with the others held constant.
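With several predictors, the least-squares coefficients solve the normal equations (X'X)b = X'y. The sketch below is dependency-free and illustrative only: `solve_linear_system` and `fit_multiple_ols` are hypothetical helper names, the data are made up, and the solver assumes X'X is non-singular.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multiple_ols(rows, ys):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]   # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_ols(rows, ys)
print([round(c, 6) for c in coefs])
```

In practice a numerical library would usually handle this step; the hand-rolled solver is only there to keep the sketch self-contained.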
4. Assumptions of Linear Regression
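- Linearity – The expected value of Y is a linear function of the predictors.
- Independence – Observations (and therefore their errors) are independent of one another.
- Homoscedasticity – Residuals have constant variance across all values of the predictors.
- Normality – Residuals are approximately normally distributed (this matters most for tests and confidence intervals in small samples).
- No multicollinearity (for multiple regression) – Independent variables are not highly correlated with one another.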
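The multicollinearity assumption can be screened numerically. One common diagnostic is the variance inflation factor (VIF); with exactly two predictors it reduces to 1 / (1 − r²), where r is their correlation. The sketch below assumes that two-predictor case, and the helper names and data are made up.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Made-up predictors: x2 is nearly a multiple of x1, so the VIF is large.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
print(round(pearson_r(x1, x2), 4), round(vif_two_predictors(x1, x2), 2))
```

A VIF above roughly 5 to 10 is often treated as a warning sign, though the cutoff is a rule of thumb rather than a formal test.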
5. Purpose of Regression Analysis
- Prediction:
  - Predicting future outcomes based on independent variables.
  - Example: Predicting sales based on advertising expenditure.
- Estimation:
  - Estimating the strength of relationships between variables.
  - Example: How much does temperature affect ice cream sales?
- Hypothesis Testing:
  - Testing whether independent variables significantly influence the dependent variable.
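For simple linear regression, the standard test of whether X influences Y is a t-test of H0: β1 = 0, using t = b1 / SE(b1) with n − 2 degrees of freedom. A minimal sketch, assuming made-up advertising and sales figures and a hypothetical helper `slope_t_statistic`:

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b1, se_b1, t) for H0: slope = 0 in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)
    return b1, se_b1, b1 / se_b1

# Made-up data: advertising spend (X) and sales (Y), in arbitrary units.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
b1, se_b1, t = slope_t_statistic(ad_spend, sales)

# Two-sided 5% critical value of Student's t with n - 2 = 3 degrees of freedom.
T_CRIT_DF3 = 3.182
significant = abs(t) > T_CRIT_DF3
print(round(b1, 3), round(t, 3), significant)
```

Here the slope is positive but |t| falls below the critical value, so with only five observations the effect is not significant at the 5% level.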
6. Goodness of Fit
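- R-squared (R²): Proportion of the variation in Y explained by the model; for a least-squares fit with an intercept it lies between 0 and 1.
- Adjusted R-squared: R² penalized for the number of predictors, so models with different numbers of predictors can be compared in multiple regression.
- Standard Error of Estimate: The typical distance of observed values from the regression line, in the units of Y (the square root of the residual mean square).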
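These three measures all come from the same two sums of squares. A minimal sketch, assuming a hypothetical helper `goodness_of_fit` and made-up observed and fitted values:

```python
import math

def goodness_of_fit(observed, fitted, k):
    """Return (r2, adjusted_r2, see) for a model with k predictors."""
    n = len(observed)
    y_bar = sum(observed) / n
    sst = sum((y - y_bar) ** 2 for y in observed)               # total variation
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))   # unexplained variation
    r2 = 1 - sse / sst
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    see = math.sqrt(sse / (n - k - 1))   # standard error of estimate
    return r2, adjusted_r2, see

# Made-up data; the fitted values come from the least-squares line
# 2.2 + 0.6*X evaluated at X = 1..5.
observed = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2, adjusted_r2, see = goodness_of_fit(observed, fitted, k=1)
print(round(r2, 3), round(adjusted_r2, 3), round(see, 3))
```

Adjusted R-squared is never larger than R-squared, and the gap widens as predictors are added without reducing the residual sum of squares much.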
7. Example: Simple Linear Regression
Suppose we want to predict students’ test scores (Y) based on hours studied (X):
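- Sample data:
  - Hours studied: 2, 4, 6, 8
  - Scores: 50, 60, 65, 80
- Regression equation (least-squares fit to the sample): Ŷ = 40 + 4.75X
- Interpretation:
  - Intercept (40): Predicted score if 0 hours are studied (an extrapolation, since no student in the sample studied fewer than 2 hours).
  - Slope (4.75): Each additional hour of study is associated with a 4.75-point increase in the predicted score.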
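The least-squares coefficients for the four sample points can be checked directly. A minimal sketch (variable names are illustrative) that also lists the fitted values and residuals:

```python
hours = [2, 4, 6, 8]
scores = [50, 60, 65, 80]

n = len(hours)
x_bar = sum(hours) / n      # 5.0
y_bar = sum(scores) / n     # 63.75
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))  # 95.0
s_xx = sum((x - x_bar) ** 2 for x in hours)                           # 20.0
slope = s_xy / s_xx                  # 4.75
intercept = y_bar - slope * x_bar    # 40.0

fitted = [intercept + slope * x for x in hours]        # [49.5, 59.0, 68.5, 78.0]
residuals = [y - f for y, f in zip(scores, fitted)]    # [0.5, 1.0, -3.5, 2.0]
print(intercept, slope, fitted, residuals)
```

The residuals sum to zero, as they always do for a least-squares line with an intercept, and the largest miss is the third student (65 observed versus 68.5 predicted).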
8. Summary
- Regression analysis helps in modeling relationships, prediction, and decision-making.
- Simple regression → one predictor, multiple regression → several predictors.
- Assumptions must be checked to ensure validity of results.
- Key outputs: Regression coefficients, R-squared, residuals, significance of predictors.