Normal Forms of DBMS | Types, Examples, and Overview
Updated: December 31, 2024
55
Normalization Forms in DBMS
Did you know that normal forms in DBMS are essential for reducing redundancy and improving data integrity? If you’ve ever struggled with messy, error-prone databases, these principles offer a solution. By organizing data efficiently, normal forms ensure a structured and reliable database, much like organizing a chaotic library.
Normalization Database Example
- Removing duplicate data from student records.
- Splitting address into street, city, and ZIP code columns.
- Linking orders to customers through unique IDs.
- Creating a separate table for product categories.
- Eliminating redundant supplier information.
DBMS Normalization Forms
There are several Dbms normalization forms, such as the first normal form, the second normal form, the third normal form, the Boyce-Codd normal form, the fourth normal form, and the fifth normal form.
1st Normal Form (1NF)
1st Normal Form (1NF) ensures that a database table has only atomic (single) values in each column, meaning no multiple values or lists are stored in one cell. It also eliminates repeating groups, ensuring each row is unique. In simple terms, 1NF makes sure each cell in a table contains just one piece of information, and each record is distinct. This helps in organizing data properly and reduces redundancy.
Example of 1st Normal Form (1NF)
Unnormalized Table (Before 1NF)
Student ID | Name | Courses |
1 | John | Math, English |
2 | Emily | Math |
3 | Michael | History, Science |
Issues
- The “Courses” column contains multiple values (lists of courses).
- The table has repeating groups of data.
Normalized Table (After 1NF)
Student ID | Name | Course |
1 | John | Math |
1 | John | English |
2 | Emily | Math |
3 | Micheal | History |
3 | Micheal | Science |
Result
- Each column now has only a single value.
- There are no repeating groups in any cell.
- Each row contains unique information.
2nd Normal Form (2NF)
2nd Normal Form (2NF) builds on 1NF by ensuring that all non-key columns depend on the primary key, not just the part. This means that if a table has a composite primary key (more than one column), no non-key column should rely on just a part of that key. In simple terms, 2NF removes partial dependencies, making the table more organized. It helps reduce redundancy and improves data integrity by ensuring that each piece of information is stored in the right place.
Example of 2nd Normal Form (2NF)
Unnormalized Table (Before 2NF)
Student ID | Course ID | Instructor | Instructor Phone |
1 | 101 | Mr. Smith | 123-456-7890 |
1 | 102 | Ms. Johnson | 234-567-8901 |
2 | 101 | Mr. Smith | 123-456-7890 |
3 | 103 | Mr. Brown | 345-678-9012 |
Issues
- The primary key is a combination of StudentID and CourseID.
- The Instructor and InstructorPhone depend only on CourseID, not the entire primary key.
Normalized Table (After 2NF)
Stuent ID | Course ID |
1 | 101 |
1 | 102 |
2 | 101 |
3 | 103 |
Instructor Table
Course ID | Instructor | Instructor ID |
101 | Mr. Smith | 123-456-7890 |
102 | Ms. Johnson | 234-567-8901 |
103 | Mr. Brown | 345-678-9012 |
Result
- Now, Instructor and InstructorPhone depend only on CourseID and are stored in a separate table.
- The table is in 2NF, as all non-key attributes depend on the entire primary key.
3rd Normal Form (3NF)
3rd Normal Form (3NF) removes transitive dependencies, which occur when a non-key column depends on another non-key column. In simple terms, if one non-key attribute relies on another non-key attribute, it should be separated into different tables. 3NF ensures that each non-key column is directly dependent only on the primary key, not on other non-key columns. This reduces redundancy and improves data organization by keeping each piece of information in its proper place.
Example of 3rd Normal Form (3NF)
Unnormalized Table (Before 3NF)
Student ID | Course ID | Instructor | Instructor Phone | Instructor Email |
1 | 101 | Mr. Smith | 123-456-7890 | [email protected] |
1 | 102 | Mr. Joshon | 234-567-8901 | [email protected] |
2 | 101 | Mr. Smith | 123-456-7890 | [email protected] |
3 | 103 | Mr. Brown | 345-678-9012 | [email protected] |
Issues
- The InstructorPhone and InstructorEmail depend on the Instructor, which is not the primary key.
- These non-key attributes should be moved to a separate table to eliminate transitive dependency.
Normalized Tables (After 3NF)
Student ID | Course ID | Instructor |
1 | 101 | Mr. Smith |
1 | 102 | Ms. Johnson |
2 | 101 | Mr. Smith |
3 | 103 | Mr. Brown |
Result
- The InstructorPhone and InstructorEmail are now stored in a separate table linked to the Instructor.
- This eliminates transitive dependency and ensures that each non-key attribute depends only on the primary key, making the table compliant with 3NF.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a stricter version of 3rd Normal Form (3NF). It ensures that every non-key attribute is fully dependent on the primary key. In simple terms, if a table has more than one candidate key (possible primary keys), every non-key attribute must depend on each of those keys. BCNF eliminates anomalies caused by certain types of dependencies that 3NF might not address, ensuring that the database structure is even more efficient and organized.
Issues
- The table has two candidate keys: StudentID + CourseID and Instructor + CourseID.
- InstructorPhone depends on the Instructor, but the Instructor is not the primary key in this case, violating BCNF.
Result
- Now, InstructorPhone is stored in a separate table, where Instructor is the primary key, ensuring that all non-key attributes depend only on the primary key.
- This eliminates the dependency on non-primary keys and brings the table into BCNF.
4th Normal Form (4NF)
4th Normal Form (4NF) ensures that a table has no multi-valued dependencies. A multi-valued dependency occurs when a single attribute in a table is related to multiple values of another attribute, but the two are independent. In simple terms, a table containing two or more independent multi-valued facts about an entity should be stored in separate tables. 4NF helps eliminate redundancy and maintain data integrity by ensuring each fact is stored in its own table.
Issues
- The table has a multi-valued dependency, where StudentID is related to multiple Courses and multiple Hobbies, but these are independent of each other.
- This leads to redundancy, as the same StudentID is repeated for each combination of Course and Hobby.
Result
- The StudentID is no longer repeated for each combination of Course and Hobby.
- The independent facts (courses and hobbies) are stored in separate tables, ensuring the table is in 4NF and eliminating multi-valued dependencies.
5th Normal Form (5NF)
The 5th Normal Form (5NF) is focused on eliminating any remaining redundancy caused by the combination of tables. It ensures that every piece of information is stored in its simplest form, without any unnecessary dependencies. In simple terms, 5NF requires that a table cannot be decomposed into smaller tables without losing information. This form deals with complex relationships where multiple keys or combinations of attributes lead to redundancy. 5NF ensures that the database is as efficient and logical as possible by avoiding such redundancies.
Issues
- The table contains redundant data because information about the Instructor and Textbook is repeated for each StudentID and Course combination.
- The Instructor and Textbook are independent of each other and the Course, but they are combined in the same table, leading to unnecessary repetition.
Result
- Now, each fact (Instructor and Textbook) is stored separately and linked to the course it belongs to.
- This removes redundancy and ensures that the table is in 5NF, where no further decomposition is possible without losing information.
What Causes the Need for Normalization?
Normalization is needed to address common problems that can occur when data is stored in an unorganized way:
- Insertion Anomaly: Occurs when it’s difficult to add new data because there is missing information. For example, to add a new student, you might need to provide information about a course or instructor, even if it’s not necessary for the new student.
- Deletion Anomaly: This happens when deleting data, which leads to the unintended loss of other important information. For example, if you delete a course that a student is enrolled in, you might also accidentally lose the instructor’s details for that course.
- Updation Anomaly: Takes place when updating data in one place but not in others, leading to inconsistencies. For instance, if an instructor’s phone number changes but isn’t updated across all records, different records will have conflicting information.
Difference Between Data Normalization and Denormalization
Normalization | Denormalization |
Reduces data redundancy. | Allows some data redundancy. |
Organizes data into smaller, related tables. | Combines data into fewer, larger tables. |
Improves data integrity. | Improves query performance. |
Helps prevent anomalies (insertion, deletion, update). | It may lead to some anomalies but is faster for reads. |
Data is stored in its simplest form. | Data is sometimes repeated for quicker access. |
Ideal for complex applications with frequent updates. | Ideal for read-heavy applications with few updates. |
Conclusion about Normalization of DBMS
DBMS normalization forms are essential for creating an organized and efficient database. By applying these forms, we can reduce duplicate data, improve data integrity, and make the database easier to maintain and update. Each level of normalization helps refine the structure, ensuring data is stored logically and without unnecessary repetition. Following these principles makes it easier to manage and retrieve accurate information, making the database more reliable and effective in the long run.
FAQS – Normal Forms Database
Why is normalization important in DBMS?
Normalization is important because it helps organize data in a way that reduces duplication, prevents errors, and makes the database easier to update and maintain. It ensures that related data is stored in separate tables, which helps in keeping the database clean and efficient.
Can a database be over-normalized?
Yes, a database can be over-normalized. Over-normalization occurs when too many tables are created, making the database unnecessarily complex. This can slow down performance and make it harder to work with, so it’s important to balance normalization with practicality.
Do we always need to normalize a database to the highest normal form (like BCNF or 4NF)?
Not always. While it’s good to normalize a database, it doesn’t always have to be taken to the highest normal form. Sometimes, reaching a lower normal form like 3NF is enough for most practical uses. The goal is to improve the structure without making it too complicated or affecting performance.
Can a relation in 2NF still have partial dependency?
No, a relation in 2NF should not have a partial dependency. In 2NF, all non-key attributes must depend on the whole primary key, not just part of it. If there’s still partial dependency, the relation hasn’t been fully normalized to 2NF.
What does BCNF stand for?
BCNF stands for Boyce-Codd Normal Form. It’s a higher level of database normalization that removes certain types of redundancy, ensuring that every determinant in the table is a candidate key.
What is PJNF in database normalization?
PJNF stands for Projection-Join Normal Form. It’s a special form of normalization where a relation is split into smaller relations, ensuring that the data can be reconstructed by joining the tables back together without losing any information. It’s a higher-level normal form than BCNF.
Please Write Your Comments