Indexes

Clustered vs Secondary Indexes

Clustered Index

In InnoDB, every table has a clustered index that defines the physical order of rows on disk.

  1. If there’s a primary key, it is the clustered index.
  2. If there’s no primary key, InnoDB chooses a unique, non-null index as the clustered index.
  3. If neither exists, InnoDB creates an implicit auto-increment column as the clustered index.

B+ Tree

By default, InnoDB stores both clustered and secondary indexes as B+ Trees.

Alt text Only leaf nodes hold data; leaves are sorted by primary key and adjacent leaves are linked by a doubly linked list. Internal nodes hold keys and child pointers.

Indexes and data reside on disk; accessing any node is a disk I/O. Even at tens of millions of rows, a B+ Tree typically has 3–4 levels, i.e., 3–4 I/Os per point lookup.

Advantages over B-Trees:

  1. Internal nodes can store more keys/pointers, reducing height and I/O.
  2. Range queries only need to traverse leaf chains, not the entire tree.
  3. Efficient sequential access.

Advantages over binary trees: A node can have up to m children (hundreds to thousands). Time complexity is O(log_m N).

Secondary Index

Clustered index leaf nodes store actual rows; secondary index leaves store primary keys.

If you query by a secondary index and need more than the primary key, you need a back-to-table lookup (two B+ Tree traversals). If only the primary key is needed, no back lookup is required.

Primary, Unique, and Prefix Indexes

1
CREATE UNIQUE INDEX xxx ON xxxtable(col1, col2)

A table can have multiple unique indexes, each on UNIQUE columns.

Normal index: no PRIMARY or UNIQUE requirement.

Prefix index: build on the first N characters of a string column (char, varchar, binary, varbinary) to reduce index size.

binary stores byte strings; can store images, videos, etc.

Single-Column vs Composite Indexes

Composite (multi-column) indexes are created on multiple columns. Put high-cardinality columns first.

Leftmost Prefix Rule

For a composite index (col1, col2, col3), the B+ Tree is sorted by col1 first, then col2, then col3.

Usable conditions:

  • col1 = ?
  • col2 = ? AND col1 = ? (optimizer can reorder)
  • col1 = ? AND col2 = ? AND col3 = ?

Not usable:

  • col2 = ? (doesn’t start from col1)
  • col3 = ? (doesn’t start from col1)

Can a composite index support range queries? Yes, but columns after a range condition cannot be used. For (col1, col2, col3) and WHERE col1 = 1 AND col2 > 2 AND col3 = 3, col3 cannot use the index because the order on col3 is undefined within col2 > 2.

How to tell if a composite index is used?

1
EXPLAIN SELECT * FROM users WHERE age > 25 and salary=50000;

If key_len shows 4 (one int), salary likely isn’t used by the index.

What about WHERE age >= 25 and salary = 50000? Both can be used. For age = 25, salary remains ordered, so the index can still narrow the scan.

What about WHERE age BETWEEN 20 AND 30 and salary=50000? BETWEEN means 20 <= age <= 30, so both columns can participate.

What about WHERE name like ‘j%’ and salary=50000 for index (name, salary)? Both can be used. name like ‘j%’ selects a prefix range; within name = ‘j’, salary remains ordered.

Index Condition Pushdown (ICP)

1
2
SELECT * FROM employees 
WHERE department_id = 10 AND hire_date BETWEEN '2024-01-01' AND '2024-12-31';

Without ICP: scan all department_id=10 entries via the index, then back to table for full rows and filter hire_date.

With ICP: during index scan, apply the hire_date BETWEEN condition so mismatching rows are rejected before back-to-table, reducing I/O and memory.

Index Selectivity

For composite indexes, put the most selective (highest cardinality) column first. Selectivity = #distinct / total rows.

When Indexes Fail

  1. Left/leading/bi-directional wildcard LIKE patterns. Only right-anchored patterns can leverage ordered lookups.
  2. Functions on indexed columns: e.g., where length(name)=6. You can index the expression result instead (functional index in newer MySQL/MariaDB).
  3. Expressions on indexed columns: e.g., where age+1=30 won’t use index; where age=30-1 will.
  4. Implicit type conversion. MySQL converts strings to numbers. If the column is INT and the predicate literal is VARCHAR, MySQL casts the literal (index still usable). If the column is VARCHAR and the predicate is INT, MySQL effectively casts the column (index not used).
  5. Composite index not following the leftmost rule.
  6. OR conditions where only one side is indexed.

Slow Queries

Symptoms: slow page loads or API latency > 1s.

  1. Use APM (e.g., SkyWalking) to find slow endpoints and SQLs.
  2. Use MySQL’s slow query log.
1
2
slow_query_log=1
long_query_time=2

EXPLAIN Clues

  1. select_type:
  • SIMPLE: simple SELECT without subqueries/UNION
  1. type:
  • system: table has only one row
  • const: using primary/unique index to fetch a single row
  • eq_ref: join using primary/unique key
  • ref: equality on a non-unique index
  1. extra:
  • using index: covering index
  • using index condition: ICP (engine-level filtering)
  • using where: needs back-to-table
  • using temporary: extra temporary table (e.g., GROUP BY/ORDER BY on non-indexed cols)
  • using filesort: extra sorting (ORDER BY not covered by same index as WHERE, or sorting on non-indexed col, or non-driving table)

Connections Too Low

Increase server max_connections (default 100) and application pool size.

Buffer Pool Too Small

Check Buffer Pool hit ratio; typically above 99%.

Tablespaces

.frm stores table schema; .ibd stores table data. Alt text

Page Splits

Non-auto-increment primary keys can cause page splits on random inserts.

Sharding