Indexes
Clustered Index, Secondary Index
Clustered Index
In InnoDB, each table has a clustered index that determines the physical order of data storage on disk.
- If there is a primary key, it serves as the clustered index
- If there is no primary key, choose a unique non-null index as the clustered index
- If there still isn’t one, InnoDB implicitly creates an auto-increment column as the clustered index
B+ Tree
By default, InnoDB uses B+ Tree storage for both clustered and secondary indexes.
B+ trees only store data in leaf nodes, with data in each leaf node sorted by primary key order, and adjacent leaf nodes connected by bidirectional pointers. Non-leaf nodes store indexes and pointers to child nodes.
Both indexes and data are stored on hard disk, so accessing each node is one disk I/O operation. B+ trees storing tens of millions of data only need 3-4 levels, which means 3-4 disk I/O operations.
Advantages of B+ trees over B trees:
- Non-leaf nodes of B+ trees can store more key values and pointers, reducing tree height and disk I/O operations.
- For range queries, B trees need to traverse the entire tree; B+ trees only need to traverse the linked list of leaf nodes.
- Sequential access is convenient.
Advantages of B+ trees over binary trees: In B+ trees, a node can have at most m children, where m is usually hundreds to thousands. Search time complexity is O(logN) with base m.
Secondary Index
Leaf nodes of clustered indexes store actual data, while leaf nodes of secondary indexes store primary key values.
If querying based on secondary index and the required data is more than just the primary key, then table lookup is needed, meaning querying the B+ tree twice. If only the primary key is queried, table lookup is not needed.
Primary Key Index, Unique Index, Prefix Index
| |
A table can have multiple unique indexes, which must be built on UNIQUE fields.
Regular Index: Does not require fields to be PRIMARY or UNIQUE
Prefix Index: Creates an index on the first few characters of character type fields, aimed at reducing index space usage. Character types: char, varchar, binary, varbinary.
binary type is used to store byte strings and can store binary data like images and videos.
Single Column Index, Composite Index
Composite index, also called compound index, refers to creating an index on multiple fields. Columns with more unique values are usually placed first.
Leftmost Matching
Leftmost Matching: Create a compound index on (col1, col2, col3). In B+ tree storage, it first sorts by col1, then by col2 if col1 is the same, and so on.
Usable query conditions:
col1 = ?
col2 = ? AND col1 = ? (optimizer will optimize)
col1 = ? AND col2 = ? AND col3 = ?
Unusable query conditions:
col2 = ? (doesn’t start from leftmost column col1)
col3 = ? (doesn’t start from leftmost column col1)
Can compound indexes be used for range queries?
Yes, but columns after the range query column cannot use the index. For example, with index (col1, col2, col3), query WHERE col1 = 1 AND col2 > 2 AND col3 = 3, col3 cannot use the index.
This is because in records found by col1 = 1 AND col2 > 2, col3 is unordered. For example, records col1=1, col2=3, col3=5 and col1=1, col2=4, col3=2.
How to determine if a compound index is effective?
| |
If key_len is 4, representing one int, it means the salary index is not effective
In SELECT * FROM users WHERE age >= 25 and salary=50000;, which fields use the compound index?
Both are used. Although in the age>25 part, salary is unordered. But when age=25, the found salary is ordered, so it can reduce scan range and the compound index is effective.
In SELECT * FROM users WHERE age BETWEEN 20 AND 30 and salary=50000;, which fields use the compound index?
In MySQL, BETWEEN means 20<=age<=30, so both fields use the compound index.
In SELECT * FROM users WHERE name like ‘j%’ and salary=50000;, which fields of compound index (name, salary) use the compound index?
Both are used. name like ‘j%’ matches all strings starting with j. Among all records where name starts with j, salary is ordered.
Index Condition Pushdown
| |
Without index condition pushdown, it first finds all employee records in department 10 based on department_id, then performs table lookup to get complete row data, then filters records with matching hire_date.
With index condition pushdown, during the index dept_hire_date_idx scan, it directly applies the condition hire_date BETWEEN ‘2024-01-01’ AND ‘2024-12-31’. This means for each index item satisfying department_id = 10, if its hire_date is not in the specified range, no table lookup operation is performed, reducing unnecessary disk I/O and memory consumption.
Index Selectivity
When using compound indexes, put fields with high index selectivity first. Index selectivity = number of unique columns / total number of columns
Index Failure
- Left/left-right fuzzy matching: Because index B+ trees are sorted by index values, only right fuzzy matching can use indexes to locate a range.
- Using functions on indexes, like where length(name)=6. But you can create indexes on function calculation results.
- Performing expression calculations on indexes. For example, where age+1=30 doesn’t work, but where age=30-1 works. This is because InnoDB doesn’t create an index for age+1.
- Implicit type conversion. MySQL automatically converts strings to integers. If stored as int and where input is varchar, cast function is called on input value and index doesn’t fail; if stored as varchar and where input is int, it’s equivalent to calling function on index, resulting in full table scan.
- Compound index doesn’t follow leftmost matching.
- Where condition 1 Or condition 2, but only one field is in the index.
Slow Queries
Page loading too slow, interface stress test response time exceeds 1s
- SkyWalking can view response time for each interface and locate which specific SQL statement executes too slowly
- MySQL built-in slow query log
| |
Index Reasons
Use EXPLAIN command to understand the reasons.
- select_type (query type):
- SIMPLE: Simple SELECT query (no subqueries or UNION)
- type
- system: Table has only one record
- const: Primary key/unique index, finds exactly one record
- eq_ref: Multi-table join using primary key or unique index for association
- ref: Using regular index for equality query
- extra
- using index: Covering index
- using index condition: Index condition pushdown, filtering at engine layer.
- using where: Needs table lookup
- using temporary: Needs extra table. (GROUP BY and ORDER BY columns are different/GROUP BY uses unindexed columns)
- using filesort: Extra sorting operation. (ORDER BY uses unindexed columns/ORDER BY and WHERE use different indexes/sorting fields not in driving table during multi-table join queries)
Too Few Connections
Too few connections between client and server, causing some SQL statements to block.
MySQL server’s default maximum connections is 100, can be changed to 500;
Client applications have a connection pool, reusing connections from it each time. Just modify the ORM’s connection pool maximum count.
Buffer Pool Too Small
Check buffer pool cache hit rate, generally should be above 99%.
Tablespace
.frm files store table structure definitions;
.ibd files store table data

Page Split
Why is page splitting possible when primary key ID is non-auto-increment?