Database Design: Normalization 1NF, 2NF, 3NF & BCNF
This article is a definition of terms for normalization in databases – including practical examples and exam questions.
In a Nutshell
Normalization is the process of structuring relational databases to eliminate redundancy and avoid data anomalies. The goal is to ensure data integrity and consistency.
Compact Technical Description
Normalization is a formal approach to eliminating redundancy and avoiding anomalies in relational databases. By stepwise application of normal forms, the data structure is improved.
Anomalies without normalization:
- Insert anomalies: Data cannot be inserted without other information
- Update anomalies: Changes require updates at multiple locations
- Delete anomalies: Deleting data unintentionally removes other information
Normal forms:
- 1NF: Atomic values, no repeating groups
- 2NF: 1NF + no partial dependencies
- 3NF: 2NF + no transitive dependencies
- BCNF: Stronger form of 3NF with stricter rules
Functional dependencies form the mathematical foundation: X → Y means that the value of X uniquely determines the value of Y.
Exam-Relevant Key Points
- 1NF: Atomic values, no repeating groups, unique primary keys
- 2NF: 1NF satisfied, no partial dependencies, complete dependency on primary key
- 3NF: 2NF satisfied, no transitive dependencies, non-key attributes depend only on primary key
- BCNF: Every determinant is a candidate key
- Functional dependencies: Mathematical foundation of normalization
- Anomalies: Insert, update, delete problems with denormalized data
- IHK-relevant: Important for data modeling and database design
- Practice: Trade-off between normalization and performance
Core Components
- Functional dependency: X → Y (X uniquely determines Y)
- Primary key: Unique identification of records
- Candidate key: Possible primary keys
- Partial dependency: Dependency on part of composite key
- Transitive dependency: Indirect dependency through non-key attributes
- Determinant: Attribute that determines other attributes
- Normalization process: Stepwise application of normal forms
- Denormalization: Intentional redundancy for performance optimization
Practical Examples
Unnormalized Table (0NF)
-- Problematic structure with redundancy and anomalies
CREATE TABLE Orders (
order_nr INT,
orderDate DATE,
customerName VARCHAR(100),
customerAddress VARCHAR(200),
article_nr INT,
articleName VARCHAR(100),
price DECIMAL(10,2),
quantity INT,
totalPrice DECIMAL(10,2)
);
-- Data with problems
INSERT INTO Orders VALUES
(1, '2024-01-15', 'Meier', 'Hauptstraße 1', 101, 'Laptop', 999.99, 2, 1999.98),
(1, '2024-01-15', 'Meier', 'Hauptstraße 1', 102, 'Maus', 29.99, 1, 29.99),
(2, '2024-01-16', 'Schmidt', 'Nebenstraße 2', 101, 'Laptop', 999.99, 1, 999.99);
First Normal Form (1NF)
-- Atomic values, no repeating groups
CREATE TABLE Orders_1NF (
order_nr INT,
orderDate DATE,
customerName VARCHAR(100),
customerAddress VARCHAR(200),
article_nr INT,
articleName VARCHAR(100),
price DECIMAL(10,2),
quantity INT,
totalPrice DECIMAL(10,2),
PRIMARY KEY (order_nr, article_nr)
);
-- Functional dependencies:
-- order_nr, article_nr → quantity, totalPrice
-- order_nr → orderDate, customerName, customerAddress
-- article_nr → articleName, price
Second Normal Form (2NF)
-- Elimination of partial dependencies
CREATE TABLE Orders_2NF (
order_nr INT PRIMARY KEY,
orderDate DATE,
customerName VARCHAR(100),
customerAddress VARCHAR(200)
);
CREATE TABLE OrderPositions (
order_nr INT,
article_nr INT,
quantity INT,
totalPrice DECIMAL(10,2),
PRIMARY KEY (order_nr, article_nr),
FOREIGN KEY (order_nr) REFERENCES Orders_2NF(order_nr)
);
CREATE TABLE Articles (
article_nr INT PRIMARY KEY,
articleName VARCHAR(100),
price DECIMAL(10,2)
);
Third Normal Form (3NF)
-- Elimination of transitive dependencies
CREATE TABLE Orders_3NF (
order_nr INT PRIMARY KEY,
orderDate DATE,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
customerName VARCHAR(100),
customerAddress VARCHAR(200)
);
CREATE TABLE OrderPositions (
order_nr INT,
article_nr INT,
quantity INT,
PRIMARY KEY (order_nr, article_nr),
FOREIGN KEY (order_nr) REFERENCES Orders_3NF(order_nr),
FOREIGN KEY (article_nr) REFERENCES Articles(article_nr)
);
CREATE TABLE Articles (
article_nr INT PRIMARY KEY,
articleName VARCHAR(100),
price DECIMAL(10,2)
);
Anomalies and Their Solutions
Insert Anomaly (without normalization)
-- Problem: Customer cannot be created without order
-- Solution: Separate customer table
INSERT INTO Customers (customer_id, customerName, customerAddress)
VALUES (3, 'Mueller', 'Dritter Weg 3');
Update Anomaly (without normalization)
-- Problem: Customer address must be changed at multiple locations
-- Solution: Central customer table
UPDATE Customers
SET customerAddress = 'Hauptstraße 1a'
WHERE customer_id = 1;
Delete Anomaly (without normalization)
-- Problem: Deleting last order removes customer data
-- Solution: Separate tables avoid unintended deletion
DELETE FROM OrderPositions WHERE order_nr = 1;
-- Customer data remains in customer table
BCNF (Boyce-Codd Normal Form)
BCNF Rule
Every determinant must be a candidate key.
-- Example that satisfies 3NF but not BCNF
CREATE TABLE ProjectEmployees (
project_id INT,
employee_id INT,
role VARCHAR(50),
PRIMARY KEY (project_id, employee_id)
);
-- Functional dependencies:
-- project_id, employee_id → role
-- project_id, role → employee_id (Determinant is not a candidate key!)
-- BCNF solution: Decomposition
CREATE TABLE ProjectRoles (
project_id INT,
role VARCHAR(50),
employee_id INT,
PRIMARY KEY (project_id, role)
);
CREATE TABLE EmployeeProjects (
project_id INT,
employee_id INT,
PRIMARY KEY (project_id, employee_id)
);
Advantages and Disadvantages
Advantages of Normalization
- Data integrity: Avoiding redundancy and inconsistencies
- Maintainability: Changes required at only one location
- Storage efficiency: Reduction of redundancy
- Consistency: Unified data representation
Disadvantages
- Performance: More joins required
- Complexity: More complex data structure
- Write performance: Distributed updates across multiple tables
- Learning curve: Requires understanding of functional dependencies
Common Exam Questions
-
What is the difference between 2NF and 3NF? 2NF eliminates partial dependencies, 3NF eliminates transitive dependencies.
-
When is a table in BCNF? When every determinant is a candidate key (stricter than 3NF).
-
Explain functional dependencies! X → Y means that the value of X uniquely determines the value of Y.
-
What are anomalies and how are they avoided? Insert, update, delete anomalies are avoided through normalization.
Most Important Sources
- https://de.wikipedia.org/wiki/Normalisierung_(Datenbank)
- https://www.gatech.edu/coe/cse/normalization
- https://www.sql-tutorial.ru/sql-normalization.html
Recommended Literature: Databases
Keine Bücher für Kategorie "datenbanken" gefunden.