• Arun Kumar

Database normalization

Database normalization is the process of refining the data in accordance with a series of normal forms. This is done to reduce data redundancy and improve data integrity. This process divides large tables into small tables and links them using relationships. 


The concept of normalization was invented by Edgar Codd and he introduced First Normal form before moving ahead with Second and Third Normal forms.


Normalization forms

  • 1st Normal Form

  • 2nd Normal Form

  • 3rd Normal Form

  • Boyce-Codd Normal Form

  • 4th Normal Form

  • 5th Normal Form

  • 6th Normal Form

There are further enhancements to theory of normalization and it is still being developed. There is even 6th normal form but in any practical scenario, normalization achieves its best shape in 3rd Normal form.


Key terms

  • Column – Attribute

  • Row – Tuple

  • Table – Relation

  • Entity – Any real world object that makes sense


Normalization example


Let us look at a library table that maintains all the books they rent out in one single table:

Now let us push this data from various normal forms and see how we can refine the data.


1NF (First Normal Form)


The rules of the first normal form are:

  • Each table cell should contain a single/atomic value

  • Every record in the table must be unique

Let us first convert the Books_Main_Table into 1NF

As per the 1NF rules, our Books Main Table looks good. Before we proceed with 2NF and 3NF, we need to understand key columns.


Key / non-key columns


Any column (or group of columns) in a table which can uniquely identify each row is known as key column. For example:

  • Phone number

  • Email id

  • Student roll number

  • Employee id

These are some columns which will always remain unique to every record inside the table.


Such columns are known as key columns inside the table. Any column apart from key columns is known as non-key column.


Primary key


A primary key is a single column value which uniquely identifies each record in a table. In RDBMS, primary key must satisfy below:

  • Primary key must be unique

  • Primary key cannot be null

  • Every record will have primary key value


Composite Key


Sometimes its hard to define unique records with one single column. In such cases, we can have two or more columns that uniquely identify each record in a table. Such columns are known as composite key. For example:

  • Name + Address

  • First Name + DOB + Father Name

Now that we know about key / non-key columns, let us move to 2NF.


2NF (Second Normal Form)


The rules of the second normal form are:

  • Table must be in 1NF

  • Every non-key attribute must be fully dependent on key attributes

We see that our Books_Main_Table does not have any primary key, in such cases, we will have to introduce a new key column like Membership ID.


To make Books_Main_Table into 2NF, we need to see how columns are closely related:

  • Membership ID has a salutation, name and address

  • Membership ID has books issued on their name

With this logic in mind, we will have to divide our Books_Main_Table into two table:

If you see the above tables, we have Membership ID in both tables but in Membership_Details_table, it is a primary key column and in Books_Issued_table, it is a non-key column.


Foreign Key


Till now we have seen Primary key and composite key. A foreign key refers to a primary key column of another table. This helps in connecting two tables (and defines a relation between two tables). A foreign key must satisfy below:

  • Foreign key column name can be different than primary key column name

  • Unlike primary key, then need not be unique (see Books_Issued_Table above)

  • Foreign key column can be null even though primary key column cannot


Reason for Foreign key


When a user tries to insert a record into Books_Issued_Table and if there is no membership ID exists in Membership_Details_Table, it will be rejected. This way, we maintain data integrity in RDBMS.


If there is no record with Membership ID in the parent table, it will be rejected and database will throw an error.


3NF (Third Normal Form)


The rules of the third normal form are:

  • Data must be in 2NF

  • No transitive functional dependencies


What is transitive dependency?


In simple terms, if changing a non-key column causes any other non-key column to change, then its called as transitive dependency.


In our example, if we change Full Name of the customer, it might change Salutation


Final 3NF Tables


To move the Membership_Details_Table into 3NF, we need to further divide the table into below:

We have divided the Membership_Details_Table into a new Salutation_table.


Assignment


If you see the Books_Issued_Table, it still does not have a key column. What do you think should be the key column for the Books_Issued_Table? Or do we need to introduce a new column?


Recent Posts

See All

How to plan database creation

Database creation process involves several operating system files to work together. You create database only once and later on keep on adding more data files. Before you can simply go ahead and start

Oracle Transportable Tablespace

Assume that you have a schema (IQS) on production server and you want to import the schema into test server. The ideal approach would be to perform entire schema export and import using data pump. Rig

Grant Select on all tables in a schema

I encountered this situation where I wanted to grant SELECT on all the tables owned by one user to another user. There are two simple ways to achieve this: Generate SQL for each table In this method,

  • Facebook
  • Twitter
  • YouTube
  • Instagram

© 2020 by DBA Genesis

Feedback