MySQL之select in 子查询优化的实现

下面的演示基于MySQL5.7.27版本

一、关于MySQL子查询的优化策略介绍:

子查询优化策略

对于不同类型的子查询,优化器会选择不同的策略。

1. 对于 IN、=ANY 子查询,优化器有如下策略选择:

  • semijoin
  • Materialization
  • exists

2. 对于 NOT IN、<>ALL 子查询,优化器有如下策略选择:

  • Materialization
  • exists

3. 对于 derived 派生表,优化器有如下策略选择:
derived_merge,将派生表合并到外部查询中(5.7 引入 );
将派生表物化为内部临时表,再用于外部查询。
注意:update 和 delete 语句中子查询不能使用 semijoin、materialization 优化策略

二、创建数据进行模拟演示

为了方便分析问题先建两张表并插入模拟数据:

CREATE TABLE `test02` (
 `id` int(11) NOT NULL,
 `a` int(11) DEFAULT NULL,
 `b` int(11) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `a` (`a`)
) ENGINE=InnoDB;

drop procedure idata;
delimiter ;;
create procedure idata()
begin
 declare i int;
 set i=1;
 while(i<=10000)do
  insert into test02 values(i, i, i);
  set i=i+1;
 end while;
end;;
delimiter ;
call idata();

create table test01 like test02;
insert into test01 (select * from test02 where id<=1000)

三、举例分析SQL实例

子查询示例:

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

大部分人可定会简单的认为这个 SQL 会这样执行:

SELECT test02.b FROM test02 WHERE id < 10

结果:1,2,3,4,5,6,7,8,9

SELECT * FROM test01 WHERE test01.a IN (1,2,3,4,5,6,7,8,9);

但实际上 MySQL 并不是这样做的。MySQL 会将相关的外层表压到子查询中,优化器认为这样效率更高。也就是说,优化器会将上面的 SQL 改写成这样:

select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);

提示: 针对mysql5.5以及之前的版本

查看执行计划如下,发现这条SQL对表test01进行了全表扫描1000,效率低下:

root@localhost [dbtest01]>desc select * from test01 where exists(select b from test02 where id < 10 and test01.a=test02.b);
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type    | table | partitions | type | possible_keys | key   | key_len | ref | rows  | filtered | Extra    |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| 1 | PRIMARY      | test01 | NULL    | ALL  | NULL     | NULL  | NULL  | NULL | 1000  |  100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | test02 | NULL    | range | PRIMARY    | PRIMARY | 4    | NULL |   9 |  10.00 | Using where |
+----+--------------------+--------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

但是此时实际执行下面的SQL,发现也不慢啊,这不是自相矛盾嘛,别急,咱们继续往下分析:

SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10)

查看此条SQL的执行计划如下:

root@localhost [dbtest01]>desc SELECT * FROM test01 WHERE test01.a IN (SELECT test02.b FROM test02 WHERE id < 10);
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key   | key_len | ref      | rows | filtered | Extra    |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| 1 | SIMPLE    | <subquery2> | NULL    | ALL  | NULL     | NULL  | NULL  | NULL     | NULL |  100.00 | Using where |
| 1 | SIMPLE    | test01   | NULL    | ref  | a       | a    | 5    | <subquery2>.b |  1 |  100.00 | NULL    |
| 2 | MATERIALIZED | test02   | NULL    | range | PRIMARY    | PRIMARY | 4    | NULL     |  9 |  100.00 | Using where |
+----+--------------+-------------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

发现优化器使用到了策略MATERIALIZED。于是对此策略进行了资料查询和学习。

MySQL之select in 子查询优化的实现

扫一扫手机访问