LibraryNormalization Forms

Normalization Forms

Learn about Normalization Forms as part of PostgreSQL Database Design and Optimization

Understanding Normalization Forms in PostgreSQL

Normalization is a database design technique used to organize data in a database. Its primary goals are to eliminate redundant data and improve data integrity. This process involves dividing larger tables into smaller, more manageable tables and defining relationships between them. We will explore the most common normalization forms: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF), and how they apply to PostgreSQL.

First Normal Form (1NF)

The first normal form (1NF) is the most basic level of normalization. It ensures that each column in a table contains atomic (indivisible) values and that there are no repeating groups of columns. This means each cell should contain only one piece of data, and each record should be unique.

1NF: Each column has atomic values, no repeating groups.

To achieve 1NF, ensure each attribute (column) holds a single value and that there are no duplicate sets of columns within a table. This prevents multi-valued attributes.

A table is in 1NF if it meets the following conditions:

  1. Atomicity: Each attribute (column) contains only single, atomic values. This means a single cell should not contain multiple values (e.g., a comma-separated list of items).
  2. No Repeating Groups: There are no repeating groups of columns. For instance, having columns like phone1, phone2, phone3 for multiple phone numbers is not in 1NF. Instead, a separate table for phone numbers would be more appropriate.
  3. Unique Records: Each record (row) in the table must be uniquely identifiable, typically through a primary key.
What are the two main conditions for a table to be in First Normal Form (1NF)?
  1. Each column contains atomic values. 2. There are no repeating groups of columns.

Second Normal Form (2NF)

Second Normal Form (2NF) builds upon 1NF. It requires that all non-key attributes (columns that are not part of the primary key) must be fully functionally dependent on the entire primary key. This is particularly relevant for tables with composite primary keys (keys made up of two or more columns).

2NF: Non-key attributes depend on the *entire* primary key.

If a table has a composite primary key, 2NF ensures that no non-key attribute depends on only part of that composite key. This avoids data redundancy by moving partial dependencies into separate tables.

A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. This means:

  • If the primary key is a single column, the table is automatically in 2NF (as there are no partial dependencies possible).
  • If the primary key is composite (e.g., (OrderID, ProductID)), then any non-key attribute that depends only on OrderID (but not ProductID) should be moved to a separate table. For example, if OrderDate depends only on OrderID, it should not be in the OrderDetails table if (OrderID, ProductID) is the primary key.
When is a table in 1NF NOT automatically in 2NF?

When the table has a composite primary key, and a non-key attribute depends on only a part of that composite key.

Third Normal Form (3NF)

Third Normal Form (3NF) further refines the database structure by eliminating transitive dependencies. A transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.

3NF: Eliminate transitive dependencies (non-key attributes depending on other non-key attributes).

To achieve 3NF, if a non-key attribute depends on another non-key attribute, that relationship should be moved to a separate table. This ensures that all non-key attributes are directly dependent on the primary key.

A table is in 3NF if it is in 2NF and there are no transitive dependencies. A transitive dependency exists when:

  1. Attribute A determines Attribute B (A -> B).
  2. Attribute B determines Attribute C (B -> C).
  3. Attribute A is the primary key.
  4. Attribute B is not part of any candidate key.
  5. Attribute C is not a prime attribute (not part of any candidate key).

In simpler terms, if a non-key column can be determined from another non-key column, that relationship should be moved to a new table. For example, if EmployeeID determines DepartmentID, and DepartmentID determines DepartmentName, then DepartmentName has a transitive dependency on EmployeeID through DepartmentID. To achieve 3NF, DepartmentID and DepartmentName should be in a separate Departments table.

What is a transitive dependency in the context of 3NF?

A transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.

Benefits of Normalization in PostgreSQL

Applying normalization forms like 1NF, 2NF, and 3NF in PostgreSQL offers significant advantages:

  • Reduced Data Redundancy: Minimizes storage space and prevents inconsistencies.
  • Improved Data Integrity: Ensures data accuracy and consistency by reducing anomalies like insertion, update, and deletion anomalies.
  • Easier Maintenance: Simplifies database modifications and updates.
  • More Flexible Design: Allows for easier expansion and modification of the database schema.

While higher normal forms (like BCNF, 4NF, 5NF) exist, 3NF is often considered the sweet spot for most relational databases, balancing data integrity with query performance.

Example: Normalizing an Order Table

Consider an initial table storing order information:

code
Orders
table (Unnormalized): | OrderID | CustomerID | CustomerName | CustomerAddress | OrderDate | ProductID | ProductName | ProductPrice | Quantity | |---|---|---|---|---|---|---|---|---| | 101 | C1 | Alice Smith | 123 Main St | 2023-10-26 | P1 | Laptop | 1200 | 1 | | 101 | C1 | Alice Smith | 123 Main St | 2023-10-26 | P2 | Mouse | 25 | 2 | | 102 | C2 | Bob Johnson | 456 Oak Ave | 2023-10-26 | P1 | Laptop | 1200 | 1 |

The unnormalized table above suffers from redundancy. Customer information (Name, Address) is repeated for every order they place. Product information (Name, Price) is repeated for every order that includes that product. This violates 1NF (if we consider multiple products per order as a repeating group) and 2NF/3NF due to partial and transitive dependencies.

To normalize this, we can create separate tables:

  1. Customers Table: | CustomerID (PK) | CustomerName | CustomerAddress | |---|---|---| | C1 | Alice Smith | 123 Main St | | C2 | Bob Johnson | 456 Oak Ave |

  2. Products Table: | ProductID (PK) | ProductName | ProductPrice | |---|---|---| | P1 | Laptop | 1200 | | P2 | Mouse | 25 |

  3. Orders Table: | OrderID (PK) | CustomerID (FK) | OrderDate | |---|---|---| | 101 | C1 | 2023-10-26 | | 102 | C2 | 2023-10-26 |

  4. OrderDetails Table: (Links Orders and Products) | OrderDetailID (PK) | OrderID (FK) | ProductID (FK) | Quantity | |---|---|---|---| | 1 | 101 | P1 | 1 | | 2 | 101 | P2 | 2 | | 3 | 102 | P1 | 1 |

This normalized structure is in 3NF, significantly reducing redundancy and improving data integrity.

📚

Text-based content

Library pages focus on text content

In the normalized example, which table holds the relationship between orders and products, and what columns does it contain?

The OrderDetails table holds the relationship, containing OrderID (FK), ProductID (FK), and Quantity.

Learning Resources

Database Normalization Explained(tutorial)

A comprehensive tutorial covering the basics of database normalization, including 1NF, 2NF, and 3NF, with practical examples relevant to PostgreSQL.

SQL Normalization - GeeksforGeeks(blog)

This article provides a clear explanation of normalization concepts and the different normal forms, making it accessible for beginners.

Understanding Database Normalization(tutorial)

A detailed guide that breaks down normalization forms with illustrative examples, helping to solidify understanding of the principles.

Database Normalization - Wikipedia(wikipedia)

The Wikipedia page offers a broad overview of database normalization, its history, and the various normal forms, including BCNF and beyond.

Normalization Forms in DBMS(tutorial)

This resource explains normalization forms in the context of Database Management Systems (DBMS), providing definitions and examples for each form.

SQL Normalization: A Step-by-Step Guide(blog)

A practical, step-by-step approach to understanding and applying normalization techniques in SQL database design.

Normalization in Database Design(tutorial)

TutorialsPoint offers a concise explanation of normalization, focusing on the benefits and the process of achieving different normal forms.

The Importance of Database Normalization(blog)

This blog post discusses the practical importance and benefits of proper database normalization for application development and maintenance.

Database Normalization: 1NF, 2NF, 3NF, BCNF(video)

A video tutorial that visually explains the concepts of 1NF, 2NF, 3NF, and BCNF with clear examples, aiding comprehension.

PostgreSQL Documentation - Data Types(documentation)

Understanding PostgreSQL's data types is crucial for ensuring atomicity, a key principle of 1NF. This official documentation provides details on available types.