Stage 1. Relational Databases and the Relational Model
A relational database is a database that stores structured data in tables and lets you connect records that belong together. The model is called relational because data is not stored as one huge object or one long file. Instead, related facts are split into tables, and relationships between rows are expressed with keys.
The relational model is a way to organize data so the database stays clear and reliable as the project grows. A table has columns, which describe what fields are stored, and rows, which contain actual records. The main value of the model is that you define structure, relationships, and constraints upfront, and the database helps prevent data chaos.

Core terms without overload
| Term | Practical meaning | Example |
|---|---|---|
| Table | Collection of similar entities | users, orders |
| Column | One typed data field | email, created_at |
| Row | One concrete record | One order |
| Primary key | Unique row identifier | orders.id |
| Foreign key | Reference to row in another table | orders.user_id -> users.id |
| RDBMS | Software that manages a relational database | PostgreSQL, MySQL, Oracle Database |
| Constraint | Rule enforced by the database | NOT NULL, UNIQUE |
Key idea: one table should represent one entity type. User data belongs in users, order data in orders, payment data in payments. If user data is duplicated in each order row, inconsistencies will appear.
An RDBMS, or relational database management system, is the software layer that stores tables, checks rules, executes SQL queries, controls access, and keeps data durable. When developers say “we use PostgreSQL” or “the application writes to MySQL,” they usually mean the application talks to an RDBMS, and the RDBMS manages the actual relational database files and operations.
Why tables are split instead of duplicated
Imagine a small shop with customers and orders. A simple but weak design would store the customer's name, phone, address, and every order detail in one big orders table. It looks convenient at first, but it breaks down quickly. If the customer changes an address, you must update many rows. If one row is missed, the database now contains conflicting versions of the same customer.
The relational approach stores customer data once in users and order data in orders. The orders.user_id column points to the correct row in users. This shared key lets the application find the customer's shipping data when it processes an order without copying that data into every order row. This is the foundation for later topics such as joins, normalization, and foreign key constraints.
Simple starter schema
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
amount NUMERIC(10,2) NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
What this schema guarantees:
PRIMARY KEYensures row uniqueness;FOREIGN KEYblocks orders without an existing user;NOT NULLprotects required fields;UNIQUEprevents duplicate email values.
These guarantees are examples of data integrity. Integrity means the database is not just storing bytes; it is enforcing rules that keep the data meaningful. A table can reject missing required values, duplicate identifiers, references to nonexistent rows, or values with the wrong type. This matters because application code is not the only writer in many real systems. Imports, admin tools, background jobs, and several services may all write to the same database. Database-level rules protect the data even when one caller has a bug.
Logical model and physical storage
A relational database separates the logical structure from the physical storage. The logical structure is what developers work with: tables, columns, views, indexes, keys, and constraints. The physical storage is how the RDBMS places data on disk or in memory, splits files, caches pages, and finds rows efficiently.
This separation is important. A developer can write SELECT * FROM orders without knowing which data file contains the rows or how many disk pages the RDBMS reads. A database administrator can move files, tune storage, add indexes, or change internal layout without renaming the orders table in the application. SQL describes what data is needed; the RDBMS decides how to access it.
Query walkthrough (line by line)
SELECT id, user_id, amount, status, created_at
FROM orders
WHERE status = 'PAID'
ORDER BY created_at DESC
LIMIT 5;
Line-by-line meaning:
SELECT id, user_id, amount, status, created_at: Choose exactly which columns to return.FROM orders: Read data from theorderstable.WHERE status = 'PAID': Keep only paid orders.ORDER BY created_at DESC: Sort from newest to oldest.LIMIT 5: Return only 5 rows after filtering and sorting.
Full query meaning: “Show 5 latest paid orders.”
SQL, transactions, and consistency
SQL is the standard language used to define, read, and change data in relational databases. CREATE TABLE defines structure, INSERT adds rows, SELECT reads rows, and later lessons will introduce UPDATE, DELETE, joins, grouping, indexes, and transactions. SQL is declarative: you normally describe the result you want, and the RDBMS builds an execution plan for getting it.
Relational databases are often chosen when correctness is critical. For example, if a bank account deposit is saved, the updated balance must be visible consistently. If an online shop reserves several items that must be shipped together, either all required inventory changes should be committed, or none of them should be committed. This “all or nothing” behavior is called atomicity.
Atomicity is part of ACID, a set of transaction properties that relational databases are known for:
| Property | Meaning in practice |
|---|---|
| Atomicity | A transaction is completed fully or not applied at all |
| Consistency | Data must remain valid according to database rules |
| Isolation | Uncommitted work should not confuse other concurrent operations |
| Durability | Committed changes must survive failures such as a restart |
You do not need to master transactions in this first lesson, but you should understand why the relational model is not only about tables. It also gives the database a rules-based way to keep data accurate while many users and programs work with it.
Concurrent access and locks
Real databases are shared. Several users, API requests, background workers, and reports can read or change data at the same time. Concurrency is the database's ability to handle that simultaneous activity while still protecting correctness.
When two operations try to change the same row, the RDBMS may use locks or versioning techniques so the final state stays valid. A lock temporarily protects data that is being changed. Good relational databases try to lock only what is necessary, often a row rather than a whole table, because broad locks reduce performance. You will study isolation levels later, but the basic idea is simple: the database must balance speed with safe access to shared data.
What should be clear before the next topic
The relational model is not abstract theory. It is a practical framework for storing connected facts with clear rules. If tables are split by entity, keys are defined, relationships are correct, and constraints protect important fields, SQL queries become predictable and easier to maintain.
When choosing or using an RDBMS, teams usually care about data accuracy, scalability, concurrency, performance, reliability, and operational support. Different products have different strengths, but the core relational ideas stay the same: tables represent structured facts, keys connect those facts, constraints protect them, and SQL gives applications a common way to work with them.
Practice Before the Next Lesson
Copy and run these commands step by step.
Step 1. Insert users:
INSERT INTO users(name, email) VALUES
('Anna', '[email protected]'),
('Ivan', '[email protected]'),
('Olga', '[email protected]');
Step 2. Insert orders:
INSERT INTO orders(user_id, amount, status, created_at) VALUES
(1, 1200.00, 'PAID', '2026-05-10 12:00:00'),
(1, 300.00, 'NEW', '2026-05-10 11:00:00'),
(2, 700.00, 'PAID', '2026-05-10 10:00:00'),
(3, 450.00, 'CANCELLED', '2026-05-10 09:00:00'),
(2, 900.00, 'PAID', '2026-05-10 08:00:00'),
(1, 150.00, 'PAID', '2026-05-10 07:00:00'),
(3, 510.00, 'NEW', '2026-05-10 06:00:00'),
(2, 810.00, 'PAID', '2026-05-10 05:00:00');
Step 3. Run query:
SELECT id, user_id, amount, status, created_at
FROM orders
WHERE status = 'PAID'
ORDER BY created_at DESC
LIMIT 5;
Step 4. Validate result:
- only
PAIDrows remain; - first row has the newest
created_at; - row count is at most 5.
If output differs, the issue is in WHERE, ORDER BY, or LIMIT.