数据库系列 | B-Tree 索引类型详解

开源技术小栈 2021-08-09

1502

B-Tree 索引类型详解

索引有很多种类型，可以为不同的应用场景提供更好的性能。在 MySQL 中，索引是在存储引擎层实现的。接下来重点介绍四种常见的索引类型：B-Tree 索引、哈希索引、空间数据索引（R-Tree）、全文索引。

1. B-Tree 索引

B-Tree 索引是最常见的索引之一，当大家在谈论索引的时候，如果没有特别说明，那多半说的就是 B-Tree 索引。在 MySQL 中，大多数的存储引擎都支持 B-Tree 索引。

1.1 存储结构

B-Tree 对索引列的值是按顺序存储的，并且每一个叶子页到根的距离相同。B-Tree 索引可以加快数据查找的速度，因为存储引擎不需要全表扫描来获取数据，只要从索引的根节点开始搜索即可。

以表 customer 为例，我们来看看索引是如何组织数据的存储的。

CREATE TABLE customer (
 id INT,
 last_name VARCHAR ( 30 ),
 first_name VARCHAR ( 30 ),
 birth_date date,
 gender CHAR ( 1 ),
 KEY idx1_customer ( last_name, first_name, birth_date ) 
);

如图，对于表中的每行数据，索引包含了 last_name、first_name 和 birth_date 的值。

1.2 适合 B-Tree 索引的查询类型

全值匹配

和索引中的所有列进行匹配，如查找姓名为 George Bush、1960-08-08 出生的客户。

> explain select * from customer where first_name='George' and last_name='Bush' and birth_date='1960-08-08'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | ref
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 250
ref           | const,const,const
rows          | 1
filtered      | 100.00
Extra         | None

1 row in set
Time: 0.001s

说明：用到了索引

匹配最左前缀

只使用索引的第一列，如查找所有姓氏为 Bush 的客户

mysql www@localhost:ucenter> explain select * from customer where last_name='Bush'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | ref
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 123
ref           | const
rows          | 1
filtered      | 100.00
Extra         | None

1 row in set
Time: 0.001s

说明：用到了索引。连接类型：范围查找

匹配列前缀

只匹配某一列的值的开头部分，如查找所有以 B 开头的姓氏的客户，这里使用了索引的第一列：

mysql> explain select * from customer where last_name like 'B%'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | range
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 123
ref           | None
rows          | 1
filtered      | 100.00
Extra         | Using index condition

1 row in set

说明：用到了索引。连接类型：范围查找

匹配范围值

查找所有姓氏在 Allen 和 Bush 之间的客户，这里使用了索引的第一列：

> explain select * from customer where last_name between 'Allen' and 'Bush'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | range
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 123
ref           | None
rows          | 1
filtered      | 100.00
Extra         | Using index condition

1 row in set
Time: 0.001s

精确匹配某一列，并范围匹配另一列

第一列全匹配，第二列范围匹配，如查找姓氏为 Bush，名字以 G 开头的客户：


只需要访问索引即可获取数据，不需要回表访问数据行，这种查询也叫覆盖索引：
> explain select * from customer where last_name='Bush' and first_name like 'G'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | range
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 246
ref           | None
rows          | 1
filtered      | 100.00
Extra         | Using index condition

1 row in set
Time: 0.001s

只访问索引的查询

只需要访问索引即可获取数据，不需要回表访问数据行，这种查询也叫覆盖索引：

> explain select last_name from customer where last_name='Bush'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | ref
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 123
ref           | const
rows          | 1
filtered      | 100.00
Extra         | Using index

1 row in set
Time: 0.001s

除了上述这些查询类型外，索引还可以用于 order by 排序操作，因为索引中的节点是有序的。如果 B-Tree 可以按照某种方式查找到数据，那么也可以按照这种方式进行排序。

1.3 B-Tree 索引的限制

可能无法使用到索引

（1）如果不是按照索引的最左列开始查找数据，则无法使用索引。如查找名字为 George 的客户：

mysql www@localhost:ucenter> explain select * from customer where first_name='George'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | ALL
possible_keys | None
key           | None
key_len       | None
ref           | None
rows          | 1
filtered      | 100.00
Extra         | Using where

1 row in set
Time: 0.001s

（2）不能跳过索引的列。如查找姓氏为 Bush，生日为 1960-08-08 的客户，这种查询只能使用索引的第一列：

> explain select * from customer where last_name='Bush' and birth_date='1960-08-08'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | ref
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 123
ref           | const
rows          | 1
filtered      | 100.00
Extra         | Using index condition

1 row in set
Time: 0.001

（3）如果查询中有某个列的范围查询，在其右边的列都无法使用索引进行查找数据。如查找姓氏为以 B 开头，名字为 George 的客户。这个查询只能使用第一列，因为 like 是一个范围查询：

> explain select * from customer where last_name like 'B%' and first_name='George'\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | customer
partitions    | None
type          | range
possible_keys | idx1_customer
key           | idx1_customer
key_len       | 246
ref           | None
rows          | 1
filtered      | 100.00
Extra         | Using index condition

1 row in set

其他问题

1. 问：为什么索引结构默认使用B-Tree，而不是hash，二叉树，红黑树？

hash：虽然可以快速定位，但是没有顺序，IO复杂度高。
二叉树：树的高度不均匀，不能自平衡，查找效率跟数据有关（树的高度），并且IO代价高。
红黑树：树的高度随着数据量增加而增加，IO代价高。

2. 问：为什么官方建议使用自增长主键作为索引。

结合B+Tree的特点，自增主键是连续的，在插入过程中尽量减少页分裂，即使要进行页分裂，也只会分裂很少一部分。并且能减少数据的移动，每次插入都是插入到最后。总之就是减少分裂和移动的频率。

数据库

文章转载自开源技术小栈，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。