Indexes

Publish on: 2025-02-28 Classify at: MySQL

Words: 1262 Read:≈ 6mins Views:

Clustered Index, Secondary Index

Clustered Index

In InnoDB, each table has a clustered index that determines the physical order of data storage on disk.

If there is a primary key, it serves as the clustered index
If there is no primary key, choose a unique non-null index as the clustered index
If there still isn’t one, InnoDB implicitly creates an auto-increment column as the clustered index

B+ Tree

By default, InnoDB uses B+ Tree storage for both clustered and secondary indexes.

B+ trees only store data in leaf nodes, with data in each leaf node sorted by primary key order, and adjacent leaf nodes connected by bidirectional pointers. Non-leaf nodes store indexes and pointers to child nodes.

Both indexes and data are stored on hard disk, so accessing each node is one disk I/O operation. B+ trees storing tens of millions of data only need 3-4 levels, which means 3-4 disk I/O operations.

Advantages of B+ trees over B trees:

Non-leaf nodes of B+ trees can store more key values and pointers, reducing tree height and disk I/O operations.
For range queries, B trees need to traverse the entire tree; B+ trees only need to traverse the linked list of leaf nodes.
Sequential access is convenient.

Advantages of B+ trees over binary trees: In B+ trees, a node can have at most m children, where m is usually hundreds to thousands. Search time complexity is O(logN) with base m.

Secondary Index

Leaf nodes of clustered indexes store actual data, while leaf nodes of secondary indexes store primary key values.

If querying based on secondary index and the required data is more than just the primary key, then table lookup is needed, meaning querying the B+ tree twice. If only the primary key is queried, table lookup is not needed.

Primary Key Index, Unique Index, Prefix Index

1
CREATE UNIQUE INDEX xxx ON xxxtable(col1, col2)

A table can have multiple unique indexes, which must be built on UNIQUE fields.

Regular Index: Does not require fields to be PRIMARY or UNIQUE

Prefix Index: Creates an index on the first few characters of character type fields, aimed at reducing index space usage. Character types: char, varchar, binary, varbinary.

binary type is used to store byte strings and can store binary data like images and videos.

Single Column Index, Composite Index

Composite index, also called compound index, refers to creating an index on multiple fields. Columns with more unique values are usually placed first.

Leftmost Matching

Leftmost Matching: Create a compound index on (col1, col2, col3). In B+ tree storage, it first sorts by col1, then by col2 if col1 is the same, and so on.

Usable query conditions:
col1 = ?
col2 = ? AND col1 = ? (optimizer will optimize)
col1 = ? AND col2 = ? AND col3 = ?

Unusable query conditions:
col2 = ? (doesn’t start from leftmost column col1)
col3 = ? (doesn’t start from leftmost column col1)

Can compound indexes be used for range queries?

Yes, but columns after the range query column cannot use the index. For example, with index (col1, col2, col3), query WHERE col1 = 1 AND col2 > 2 AND col3 = 3, col3 cannot use the index.

This is because in records found by col1 = 1 AND col2 > 2, col3 is unordered. For example, records col1=1, col2=3, col3=5 and col1=1, col2=4, col3=2.

How to determine if a compound index is effective?

1
EXPLAIN SELECT * FROM users WHERE age > 25 and salary=50000;

If key_len is 4, representing one int, it means the salary index is not effective

In SELECT * FROM users WHERE age >= 25 and salary=50000;, which fields use the compound index?

Both are used. Although in the age>25 part, salary is unordered. But when age=25, the found salary is ordered, so it can reduce scan range and the compound index is effective.

In SELECT * FROM users WHERE age BETWEEN 20 AND 30 and salary=50000;, which fields use the compound index?

In MySQL, BETWEEN means 20<=age<=30, so both fields use the compound index.

In SELECT * FROM users WHERE name like ‘j%’ and salary=50000;, which fields of compound index (name, salary) use the compound index?

Both are used. name like ‘j%’ matches all strings starting with j. Among all records where name starts with j, salary is ordered.

Index Condition Pushdown

1
2
SELECT * FROM employees 
WHERE department_id = 10 AND hire_date BETWEEN '2024-01-01' AND '2024-12-31';

Without index condition pushdown, it first finds all employee records in department 10 based on department_id, then performs table lookup to get complete row data, then filters records with matching hire_date.

With index condition pushdown, during the index dept_hire_date_idx scan, it directly applies the condition hire_date BETWEEN ‘2024-01-01’ AND ‘2024-12-31’. This means for each index item satisfying department_id = 10, if its hire_date is not in the specified range, no table lookup operation is performed, reducing unnecessary disk I/O and memory consumption.

Index Selectivity

When using compound indexes, put fields with high index selectivity first. Index selectivity = number of unique columns / total number of columns

Index Failure

Left/left-right fuzzy matching: Because index B+ trees are sorted by index values, only right fuzzy matching can use indexes to locate a range.
Using functions on indexes, like where length(name)=6. But you can create indexes on function calculation results.
Performing expression calculations on indexes. For example, where age+1=30 doesn’t work, but where age=30-1 works. This is because InnoDB doesn’t create an index for age+1.
Implicit type conversion. MySQL automatically converts strings to integers. If stored as int and where input is varchar, cast function is called on input value and index doesn’t fail; if stored as varchar and where input is int, it’s equivalent to calling function on index, resulting in full table scan.
Compound index doesn’t follow leftmost matching.
Where condition 1 Or condition 2, but only one field is in the index.

Slow Queries

Page loading too slow, interface stress test response time exceeds 1s

SkyWalking can view response time for each interface and locate which specific SQL statement executes too slowly
MySQL built-in slow query log

1
2
3
4
#Enable MySQL slow query log switch
slow_query_log=1
#Set slow log time to 2 seconds, SQL statements taking more than 2 seconds are considered slow queries and recorded in slow query log
long_query_time=2

Index Reasons

Use EXPLAIN command to understand the reasons.

select_type (query type):

SIMPLE: Simple SELECT query (no subqueries or UNION)

type

system: Table has only one record
const: Primary key/unique index, finds exactly one record
eq_ref: Multi-table join using primary key or unique index for association
ref: Using regular index for equality query

extra

using index: Covering index
using index condition: Index condition pushdown, filtering at engine layer.
using where: Needs table lookup
using temporary: Needs extra table. (GROUP BY and ORDER BY columns are different/GROUP BY uses unindexed columns)
using filesort: Extra sorting operation. (ORDER BY uses unindexed columns/ORDER BY and WHERE use different indexes/sorting fields not in driving table during multi-table join queries)

Too Few Connections

Too few connections between client and server, causing some SQL statements to block.

MySQL server’s default maximum connections is 100, can be changed to 500;

Client applications have a connection pool, reusing connections from it each time. Just modify the ORM’s connection pool maximum count.

Buffer Pool Too Small

Check buffer pool cache hit rate, generally should be above 99%.

Tablespace

.frm files store table structure definitions; .ibd files store table data

Page Split

Why is page splitting possible when primary key ID is non-auto-increment?

Ruoke Zhang