Normalization is a process used in database design to organize the data in a way that minimizes redundancy and dependency. The goal of normalization is to reduce the possibility of data anomalies and ensure that the database structure is efficient, logical, and consistent.
Normalization typically involves decomposing a large, complex table into smaller, simpler tables, and establishing relationships between them. This helps improve the integrity of the data, making it easier to maintain and update.
Normalization follows a series of stages called normal forms. There are several normal forms, and each normal form builds on the previous one, with the primary goal of reducing redundancy and preventing anomalies (insert, update, delete anomalies).
Normalization proceeds through a series of Normal Forms (NF). Each normal form addresses specific types of redundancy or anomaly.
A relation is in 1NF if:
Key points of 1NF:
| StudentID | Name | Phone Numbers |
|---|---|---|
| 1 | Alice | 12345, 67890 |
| 2 | Bob | 98765 |
| StudentID | Name | Phone Number |
|---|---|---|
| 1 | Alice | 12345 |
| 1 | Alice | 67890 |
| 2 | Bob | 98765 |
A relation is in 2NF if:
Key points of 2NF:
Consider a table storing information about students and their courses:
| StudentID | CourseID | Instructor | InstructorPhone |
|---|---|---|---|
| 1 | C101 | Dr. Smith | 12345 |
| 1 | C102 | Dr. Johnson | 67890 |
| 2 | C101 | Dr. Smith | 12345 |
Here, the primary key is a composite of StudentID and CourseID. The problem is that Instructor and InstructorPhone depend only on CourseID, not the entire composite key.
Student-Course Table:
| StudentID | CourseID |
|---|---|
| 1 | C101 |
| 1 | C102 |
| 2 | C101 |
Instructor Table:
| CourseID | Instructor | InstructorPhone |
|---|---|---|
| C101 | Dr. Smith | 12345 |
| C102 | Dr. Johnson | 67890 |
A relation is in 3NF if:
Key points of 3NF:
Consider a table that stores information about employees and their departments:
| EmployeeID | Name | DeptID | DeptName | DeptLocation |
|---|---|---|---|---|
| 1 | Alice | D1 | HR | New York |
| 2 | Bob | D2 | IT | San Francisco |
| 3 | Charlie | D1 | HR | New York |
Here, DeptName and DeptLocation are dependent on DeptID, not directly on EmployeeID. Thus, we have a transitive dependency (Employee → DeptID → DeptName, DeptLocation).
Employee Table:
| EmployeeID | Name | DeptID |
|---|---|---|
| 1 | Alice | D1 |
| 2 | Bob | D2 |
| 3 | Charlie | D1 |
Department Table:
| DeptID | DeptName | DeptLocation |
|---|---|---|
| D1 | HR | New York |
| D2 | IT | San Francisco |
A relation is in BCNF if:
Key points of BCNF:
Consider a table where StudentID and CourseID are both keys:
| StudentID | CourseID | Instructor | InstructorPhone |
|---|---|---|---|
| 1 | C101 | Dr. Smith | 12345 |
| 1 | C102 | Dr. Johnson | 67890 |
| 2 | C101 | Dr. Smith | 12345 |
Here, Instructor and InstructorPhone depend on CourseID, but CourseID is not a superkey.
Student-Course Table:
| StudentID | CourseID |
|---|---|
| 1 | C101 |
| 1 | C102 |
| 2 | C101 |
Course-Instructor Table:
| CourseID | Instructor | InstructorPhone |
|---|---|---|
| C101 | Dr. Smith | 12345 |
| C102 | Dr. Johnson | 67890 |
CourseID) is a superkey.A relation is in 4NF if:
Key points of 4NF:
| EmployeeID | Skill | Language |
|---|---|---|
| 1 | Programming | English |
| 1 | Management | Spanish |
Here, an employee can have multiple skills and multiple languages. This is a case of multi-valued dependency.
Employee-Skill Table:
| EmployeeID | Skill |
|---|---|
| 1 | Programming |
| 1 | Management |
Employee-Language Table:
| EmployeeID | Language |
|---|---|
| 1 | English |
| 1 | Spanish |
Normalization ensures that the database schema is clean, consistent, and free from redundancy. The trade-off is that normalization can lead to more tables and more complex queries due to the need for joins.
Open this section to load past papers