Normalization is the process or rules for minimizing data redundancy (duplicates or repetition) in relation or set of relations.
It divides larger tables into smaller tables and links or connects them by relationships.
Redundancy creates the cause of insertion, deletion, and update anomalies.
• Insert anomalies − Insert data in a record that does not exist at all.
• Deletion anomalies − Delete a record, but parts of it were undeleted because the data is also saved somewhere else.
• Update anomalies – due to scattered copies of data in the database, when updating a data item in the database but its old values are present at several places in the database, so old and new values find at a time in a database which is wrong and database in an inconsistent state.
Normalization is a method to remove all these anomalies and bring the database to a consistent state.
This process divides into various Normal forms.
Types of normal forms:
I-NF
II- NF
III- NF
BCNF
IV-NF
V- NF
First Normal Form (1NF)
o A relation must contain an atomic value.
o Table attribute cannot hold multiple values and must hold only a single-valued attribute.
o First normal form does not allow the multi-valued attribute, composite attribute, and their combinations.
Example: Relation EMPLOYEE is unnormalized form
EMPLOYEE table:
EMP_ID |
EMP_NAME |
EMP_PHONE |
EMP_STATE |
14 |
John |
7272826385, |
UP |
20 |
Harry |
8574783832 |
Bihar |
12 |
Sam |
7390372389, |
Punjab |
Normalization form of EMPLOYEE into 1NF:
EMP_ID |
EMP_NAME |
EMP_PHONE |
EMP_STATE |
14 |
John |
7272826385 |
UP |
14 |
John |
9064738238 |
UP |
20 |
Harry |
8574783832 |
Bihar |
12 |
Sam |
7390372389 |
Punjab |
12 |
Sam |
8589830302 |
Punjab |
============================================================
Second Normal Form (2NF)
- Relational must be in 1NF.
- All non-key attributes are fully functional and dependent on the primary key.
- Prime attribute −
An attribute, which is a part of the candidate-key, is known as a prime
attribute.
- Non-prime attribute −
An attribute, which is not a part of the prime key, is known as non-prime
attribute.
Example:- Student_Project
(stu_id, proj_id, stu_name, proj_name)
There are two prime key attributes Stu_ID and Proj_ID and non-key
attributes, i.e. Stu_Name and Proj_Name must be dependent upon both. Stu_Name
can be identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not
allowed in the Second Normal Form.
Student(stu_id, stu_name, proj_id)
Project (proj_id, proj_name)
According to II-NF, break this the relation into two 1. Student and 2.
Project.
These relations remove the partial dependency.
o No Partial Dependency, i.e., no
non-prime attribute (attributes which are not part of any candidate key) is
dependent on any proper subset of any candidate key of the table.
o Partial Dependency – If the proper subset of
candidate key determines a non-prime attribute, it is called partial dependency
Example 1 – Consider
table-3 as following below.
STUD_NO |
COURSE_NO |
COURSE_FEE |
1 |
C1 |
1000 |
2 |
C2 |
1500 |
1 |
C4 |
2000 |
4 |
C3 |
1000 |
4 |
C1 |
1000 |
2 |
C5 |
2000 |
COURSE_FEE is a non-prime
attribute, not belong to the one only candidate key {STUD_NO, COURSE_NO}. But,
COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, it is a
Non-prime attribute COURSE_FEE.
Example: TEACHER table
TEACHER_ID |
SUBJECT |
TEACHER_AGE |
25 |
Chemistry |
30 |
25 |
Biology |
30 |
47 |
English |
35 |
83 |
Math |
38 |
83 |
Computer |
38 |
TEACHER_AGE is the non-prime attribute, dependent
on TEACHER_ID .
To convert the given table into 2NF,
decompose into two tables:
TEACHER_DETAIL table:
TEACHER_ID |
TEACHER_AGE |
25 |
30 |
47 |
35 |
83 |
38 |
TEACHER_SUBJECT table:
TEACHER_ID |
SUBJECT |
25 |
Chemistry |
25 |
Biology |
47 |
English |
83 |
Math |
83 |
Computer |
======================================================================
Third Normal Form (3NF)
- A relation is in 2NF and does not contain any
transitive partial dependency of non-prime attributes.
- 3NF reduces data duplication. It maintains data integrity.
- Relation has only single
candidate keys( i.e. every candidate key consists of only 1 attribute)
- Holds at least one of the following conditions
for every non-trivial function dependency X → Y.
- X is a super key.
- Y is a prime attribute, i.e., each element of
Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID |
EMP_NAME |
EMP_ZIP |
EMP_STATE |
EMP_CITY |
222 |
Harry |
201010 |
UP |
Noida |
333 |
Stephan |
02228 |
US |
Boston |
444 |
Lan |
60007 |
US |
Chicago |
555 |
Katharine |
06389 |
UK |
Norwich |
666 |
John |
462007 |
MP |
Bhopal |
Super key in the table above:
- {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes
except EMP_ID are non-prime.
EMP_STATE & EMP_CITY dependent on EMP_ZIP .
EMP_ZIP dependent on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key(EMP_ID). It violates the rule of the third normal form, so split this
relation into two sub relations EMPLOYEE table, EMPLOYEE_ZIP table.
EMPLOYEE table:
EMP_ID |
EMP_NAME |
EMP_ZIP |
222 |
Harry |
201010 |
333 |
Stephan |
02228 |
444 |
Lan |
60007 |
555 |
Katharine |
06389 |
666 |
John |
462007 |
EMPLOYEE_ZIP table:
EMP_ZIP |
EMP_STATE |
EMP_CITY |
201010 |
UP |
Noida |
02228 |
US |
Boston |
60007 |
US |
Chicago |
06389 |
UK |
Norwich |
462007 |
MP |
Bhopal |
======================================================================
Boyce Codd normal form (BCNF)
- The advance version of 3NF.
- It is stricter than 3NF.
- Every functional dependency X → Y, X is the
super key of the table.
Example: EMPLOYEE table:
EMP_ID |
EMP_COUNTRY |
EMP_DEPT |
DEPT_TYPE |
EMP_DEPT_NO |
264 |
India |
Designing |
D394 |
283 |
264 |
India |
Testing |
D394 |
300 |
364 |
UK |
Stores |
D283 |
232 |
364 |
UK |
Developing |
D283 |
549 |
In the above table Functional dependencies
are as follows:
- EMP_ID → EMP_COUNTRY
- EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
To convert the given table into BCNF, decompose
it into three tables: EMP_COUNTRY, EMP_DEPT, EMP_DEPT_MAPPING.
EMP_COUNTRY table:
EMP_ID |
EMP_COUNTRY |
264 |
India |
264 |
India |
EMP_DEPT table:
EMP_DEPT |
DEPT_TYPE |
EMP_DEPT_NO |
Designing |
D394 |
283 |
Testing |
D394 |
300 |
Stores |
D283 |
232 |
Developing |
D283 |
549 |
EMP_DEPT_MAPPING table:
EMP_ID |
EMP_DEPT |
D394 |
283 |
D394 |
300 |
D283 |
232 |
D283 |
549 |
Functional dependencies:
- EMP_ID → EMP_COUNTRY
- EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
1. first table: EMP_ID
2. second table: EMP_DEPT
3. third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side part
of both the functional dependencies is a key.
1.
BCNF
is free from redundancy.
2.
If
a relation is in BCNF, then 3NF is also satisfied.
3.
If
all attributes of relation are prime attributes, then the relation is always in
3NF.
4.
Every
Binary Relation ( only 2 attributes ) is always in BCNF.
================================================================
Summary
Normal Form |
Description |
1NF |
A relation contains an atomic value. |
2NF |
In this relation, all non-key attributes are
fully functional and dependent on the primary key. (full functional dependency) |
3NF |
A relation is not in transition dependency.
(remove transition dependency) |
BCNF |
In this relation, all attributes
of the relation are prime attribute. (no candidate key) |
===========================================================
0 Comments