What is a Database Index?
A database index is a data structure that improves the speed of data retrieval operations. Think of it like a book's index—instead of reading every page to find a topic, you look it up in the index and go directly to the right page.
Without an index, the database must scan every row in a table to find matching records (a "full table scan"). With proper indexing, queries can find data in logarithmic time rather than linear time.
How Indexes Work
Most databases use B-tree (balanced tree) indexes. The index maintains a sorted structure that points to actual table rows:
Table: users
+----+-------+----------------------+
| id | name | email |
+----+-------+----------------------+
| 1 | Alice | alice@example.com |
| 2 | Bob | bob@example.com |
| 3 | Carol | carol@example.com |
| ...| ... | ... |
+----+-------+----------------------+
Index on email (B-tree):
[bob@...]
/ \
[alice@...] [carol@...]
Each node points to the row in the table.
When you query WHERE email = 'bob@example.com', the database traverses the tree (O(log n)) instead of scanning all rows (O(n)).
Types of Indexes
General-purpose index. Good for equality (=) and range queries (<, >, BETWEEN). The default in most databases.
Very fast for equality comparisons only. Can't use for range queries or sorting. Used in memory tables.
Specialized for text search. Supports word matching, relevance ranking, and natural language queries.
For geographic data. Optimizes queries like "find all locations within 10 miles."
Index on multiple columns. Order matters—an index on (a, b, c) can help queries on (a), (a, b), or (a, b, c), but not (b) or (c) alone.
Creating Indexes
Basic Index
-- Single column CREATE INDEX idx_users_email ON users(email); -- Composite index CREATE INDEX idx_orders_user_date ON orders(user_id, created_at); -- Unique index (enforces uniqueness) CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
Partial Index (PostgreSQL)
-- Only index active users CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Expression Index
-- Index on lowercase email for case-insensitive search CREATE INDEX idx_users_email_lower ON users(LOWER(email));
When to Create Indexes
Good Candidates for Indexing
- Primary keys: Usually auto-indexed
- Foreign keys: Used in JOINs frequently
- Columns in WHERE clauses: Especially with high selectivity
- Columns in ORDER BY: Can avoid sorting
- Columns in GROUP BY: Speeds up aggregations
Poor Candidates
- Small tables: Full scan might be faster
- Frequently updated columns: Index maintenance overhead
- Low cardinality columns: Few distinct values (like boolean or status)
- Columns rarely used in queries: Just wastes space
The Index Trade-off
Indexes aren't free. They have costs:
- Faster SELECT queries
- Faster JOINs
- Faster sorting
- Faster uniqueness checks
- Slower INSERT/UPDATE/DELETE
- Additional disk space
- Memory usage for caching
- Maintenance overhead
A table with many indexes will have fast reads but slower writes. Find the right balance for your workload.
Analyzing Query Performance
EXPLAIN
Use EXPLAIN to see how the database executes a query:
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com'; -- MySQL output example +----+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | users | ref | idx_email | idx_email| 767 | const | 1 | | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------+
Key Fields to Check
- type: ALL (full scan) is bad; ref, eq_ref, const are good
- key: Which index is being used (NULL means no index)
- rows: Estimated rows to examine (lower is better)
- Extra: "Using index" is good; "Using filesort" may indicate issues
Composite Index Ordering
Column order in composite indexes matters. The index can be used for queries that filter on:
- The first column
- The first two columns
- All columns
CREATE INDEX idx_orders ON orders(user_id, status, created_at); -- Can use index âś“ WHERE user_id = 123 WHERE user_id = 123 AND status = 'pending' WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01' -- Cannot use index effectively âś— WHERE status = 'pending' -- skips user_id WHERE created_at > '2024-01-01' -- skips user_id, status WHERE user_id = 123 AND created_at > '...' -- skips status (partial use)
Ordering Guidelines
- Equality conditions first (
user_id = ?) - Range conditions last (
created_at > ?) - Most selective columns first (if similar query patterns)
Covering Indexes
A covering index contains all columns needed by a query, so the database can return results directly from the index without accessing the table:
-- Query SELECT email, name FROM users WHERE email = 'john@example.com'; -- Covering index CREATE INDEX idx_users_email_name ON users(email, name); -- EXPLAIN shows "Using index" - no table access needed
Covering indexes can dramatically speed up queries but increase index size.
Common Indexing Mistakes
Slows down writes significantly. Only index what you query.
JOINs and cascading deletes become slow.
WHERE YEAR(created_at) = 2024 can't use an index on created_at. Use WHERE created_at >= '2024-01-01'.
Index (a, b, c) doesn't help queries on (b) or (c) alone.
Having both (user_id) and (user_id, status)—the second covers the first.
Best Practices
- Always index foreign keys
- Use EXPLAIN to verify index usage
- Monitor slow query logs
- Consider your read/write ratio when adding indexes
- Remove unused indexes (they still cost write performance)
- Keep composite indexes as small as possible
- Use partial indexes when filtering on specific values frequently
- Test index changes on production-like data volumes
Format Your SQL
Use our SQL formatter to beautify your database queries and schema definitions.
Open SQL Formatter →