Essential for back-end programmers: Conquer order by


<> Preface

In daily development we use order by all the time. But do you know how order by actually works? What are the ideas behind optimizing it? What should we watch out for when using it? This article walks through these questions so that we can conquer order by together.

<> 1. A simple example of using order by

Suppose we have an employee table with the following structure:
CREATE TABLE `staff` (
  `id` BIGINT(11) AUTO_INCREMENT COMMENT 'Primary key id',
  `id_card` VARCHAR(20) NOT NULL COMMENT 'ID number',
  `name` VARCHAR(64) NOT NULL COMMENT 'Full name',
  `age` INT(4) NOT NULL COMMENT 'Age',
  `city` VARCHAR(64) NOT NULL COMMENT 'City',
  PRIMARY KEY (`id`),
  INDEX idx_city (`city`)
) ENGINE = INNODB COMMENT 'Employee table';
The table data is as follows:

Now we have this requirement: query the name, age, and city of the first 10 employees from Shenzhen, sorted by age. The corresponding SQL statement can be written like this:
select name,age,city from staff where city = 'Shenzhen' order by age limit 10;
The logic of this statement is clear, but how is it executed under the hood?

<> How order by works

<> explain execution plan

First, let's use the explain keyword to check the execution plan.

* The key field of the execution plan shows that the index idx_city is used.
* Using index condition in the Extra field indicates that index condition pushdown is used.
* Using filesort in the Extra field indicates that a sort is performed.
So this SQL statement uses an index and also performs a sort. How is the sort carried out?

<> Full field sort

MySQL allocates a small block of memory to each query thread for sorting, called sort_buffer. When are fields placed into it? Only after the matching data has been found through the idx_city index.

Let's first review how an index finds matching data. The idx_city index tree looks like this:

In the idx_city index tree, the leaf nodes store the primary key id. There is also the clustered index tree on the primary key id, so let's draw that one as well:

How does our query find matching data? It first searches the idx_city index tree to find the matching primary key id, then uses that id to search the primary key index tree and locate the corresponding row.

After adding order by, the overall execution process is:

* MySQL initializes sort_buffer for the thread, reserving space for the name, age, and city fields required by the query;
* From the idx_city index tree, find the primary key id of the first record that satisfies city='Shenzhen', which is id=9 in the figure;
* Go to the primary key index tree, fetch the row with id=9, take the values of the name, age, and city fields, and store them in sort_buffer;
* From the idx_city index tree, fetch the primary key id of the next record, which is id=13 in the figure;
* Repeat steps 3 and 4 until the value of city is no longer Shenzhen;
* The previous 5 steps have found all rows whose city is Shenzhen; now sort all the data in sort_buffer by age;
* Take the first 10 rows of the sorted result and return them to the client.
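The steps above can be sketched in Python. This is only an illustrative model, not MySQL's actual implementation: the dictionaries standing in for the index trees, the sample rows, and the function name full_field_sort are all assumptions made for this example.

```python
from operator import itemgetter

# Hypothetical stand-in for the clustered (primary key) index: id -> full row.
PRIMARY_INDEX = {
    9: {"name": "Ann", "age": 31, "city": "Shenzhen"},
    13: {"name": "Bob", "age": 25, "city": "Shenzhen"},
    21: {"name": "Cat", "age": 40, "city": "Shanghai"},
    27: {"name": "Dan", "age": 28, "city": "Shenzhen"},
}
# Hypothetical stand-in for idx_city: (city, id) entries kept in sorted order.
IDX_CITY = sorted((row["city"], pk) for pk, row in PRIMARY_INDEX.items())


def full_field_sort(city, limit):
    """Model of full field sorting: every selected column goes into sort_buffer."""
    sort_buffer = []
    for idx_city_value, pk in IDX_CITY:
        if idx_city_value != city:
            continue  # a real B+ tree would seek directly to the first match
        row = PRIMARY_INDEX[pk]  # look the row up in the primary key index
        sort_buffer.append((row["name"], row["age"], row["city"]))
    sort_buffer.sort(key=itemgetter(1))  # sort by age
    return sort_buffer[:limit]


result = full_field_sort("Shenzhen", 10)
```

Because name, age, and city are all in sort_buffer already, the sorted rows can be returned without touching the table again.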
The execution diagram is as follows:

Reading all the fields required by the query into sort_buffer is called full field sorting. At this point you may have a question: all queried fields are placed in sort_buffer, and sort_buffer is just a block of memory. What happens if there is too much data to fit in sort_buffer?

<> Sorting with temporary disk files

In fact, the size of sort_buffer is controlled by a parameter: sort_buffer_size. If the data to be sorted is smaller than sort_buffer_size, the sort is completed in memory inside sort_buffer. If the data to be sorted is larger than sort_buffer_size, the sort is done with the help of temporary disk files.

How can we tell whether disk files were used for sorting? Use the following commands:
## Enable optimizer_trace to collect statistics
set optimizer_trace = "enabled=on";
## Execute the SQL statement
select name,age,city from staff where city = 'Shenzhen' order by age limit 10;
## Query the trace output
select * from information_schema.optimizer_trace;
The number_of_tmp_files field in the output shows whether temporary files were used.

number_of_tmp_files is the number of temporary disk files used by the sort. If number_of_tmp_files>0, disk files were used for sorting.

When temporary disk files are used, what does the whole sorting process look like?

* Fetch the required data from the primary key id index tree and put it into the sort_buffer memory block. When sort_buffer is almost full, sort the data in sort_buffer and then write the sorted data to a small temporary file on disk.
* Go back to the primary key id index tree to fetch more data, put it into sort_buffer, sort it, and write it to another small temporary file on disk.
* Repeat until all rows that meet the condition have been fetched. Finally, merge the sorted small temporary files into one large ordered file.
Tip: sorting with small temporary disk files is essentially a merge sort.
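The idea of sorted runs plus a final merge can be sketched as follows. This is a minimal illustration in Python under stated assumptions: buffer_capacity counts rows rather than bytes, the function name external_sort is hypothetical, and the returned file count merely mimics what number_of_tmp_files reports. It is not how MySQL stores its temporary files.

```python
import heapq
import os
import tempfile


def external_sort(values, buffer_capacity):
    """Sort integers using sorted runs on disk followed by a k-way merge."""
    run_paths = []
    buffer = []

    def flush():
        # Sort what is in the buffer and write it out as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= buffer_capacity:  # the "sort_buffer" is full
            flush()
    if buffer:
        flush()

    files = [open(p) for p in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        merged = list(heapq.merge(*runs))  # merge the sorted runs
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
    return merged, len(run_paths)


ages = [31, 25, 40, 28, 22, 35, 29]
sorted_ages, number_of_tmp_files = external_sort(ages, buffer_capacity=3)
```

With a larger buffer_capacity fewer runs are written, which is the same effect the article attributes to a larger sort_buffer_size.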

You may have another question: since the data does not fit in sort_buffer and temporary disk files are needed, sorting becomes slower. So why put fields that are irrelevant to the sort (name, city) into sort_buffer at all? Wouldn't it be better to put in only the age field that the sort actually needs? That is exactly the idea behind rowid sort.

<> rowid sort

rowid sort means that only the fields needed for sorting and the primary key id are placed into sort_buffer. So how does MySQL decide between full field sorting and rowid sort?

It is controlled by a parameter: max_length_for_sort_data. MySQL uses it as the limit on the length of row data used for sorting. If the length of a single row exceeds this value, MySQL considers the row too large and switches to rowid sort. We can check the value of this parameter with the following command:
show variables like 'max_length_for_sort_data';

The default value of max_length_for_sort_data is 1024 (in MySQL 5.7; MySQL 8.0 defaults to 4096). In this example the length of name, age, and city is 64+4+64 = 132 < 1024, so full field sorting is used. Let's make this parameter smaller:
## Set the maximum single-row length of sort data to 32
set max_length_for_sort_data = 32;
## Execute the query
select name,age,city from staff where city = 'Shenzhen' order by age limit 10;
With rowid sort, what does the whole SQL execution process look like?

* MySQL initializes sort_buffer for the thread, reserving space for the age field to be sorted and the primary key id;
* From the idx_city index tree, find the primary key id of the first record that satisfies city='Shenzhen', which is id=9 in the figure;
* Go to the primary key index tree, fetch the row with id=9, take the values of age and the primary key id, and store them in sort_buffer;
* From the idx_city index tree, fetch the primary key id of the next record, which is id=13 in the figure;
* Repeat steps 3 and 4 until the value of city is no longer Shenzhen;
* The previous 5 steps have found all rows whose city is Shenzhen; now sort all the data in sort_buffer by age;
* Traverse the sorted result, take the first 10 rows, go back to the original table by id, fetch the city, name, and age fields, and return them to the client.
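A rough Python model of these steps is shown below. As before, the dictionaries standing in for the index trees, the sample rows, and the name rowid_sort are assumptions for illustration only. The lookups counter makes the extra trip back to the table visible.

```python
# Hypothetical stand-in for the clustered (primary key) index: id -> full row.
PRIMARY_INDEX = {
    9: {"name": "Ann", "age": 31, "city": "Shenzhen"},
    13: {"name": "Bob", "age": 25, "city": "Shenzhen"},
    21: {"name": "Cat", "age": 40, "city": "Shanghai"},
    27: {"name": "Dan", "age": 28, "city": "Shenzhen"},
}
# Hypothetical stand-in for idx_city: (city, id) entries kept in sorted order.
IDX_CITY = sorted((row["city"], pk) for pk, row in PRIMARY_INDEX.items())


def rowid_sort(city, limit):
    """Model of rowid sort: only (age, id) goes into sort_buffer."""
    lookups = 0
    sort_buffer = []
    for idx_city_value, pk in IDX_CITY:
        if idx_city_value != city:
            continue
        row = PRIMARY_INDEX[pk]  # first lookup: read age for the sort
        lookups += 1
        sort_buffer.append((row["age"], pk))
    sort_buffer.sort()  # sort by age (id breaks ties)

    result = []
    for _age, pk in sort_buffer[:limit]:
        row = PRIMARY_INDEX[pk]  # second lookup: back to the table for output columns
        lookups += 1
        result.append((row["name"], row["age"], row["city"]))
    return result, lookups


rows, lookups = rowid_sort("Shenzhen", 2)
```

In this sketch three rows match Shenzhen, so there are three lookups to fill sort_buffer plus one more per returned row.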
The execution diagram is as follows:

Compared with full field sorting, rowid sort goes back to the table one more time after sorting.

What does going back to the table mean? Taking the primary key and querying the primary key index again is called going back to the table.

Through optimizer_trace we can see whether rowid sort was used. In the output, the sort_mode field indicates the strategy; a value containing rowid means rowid sort was used:
## Enable optimizer_trace to collect statistics
set optimizer_trace = "enabled=on";
## Execute the SQL statement
select name,age,city from staff where city = 'Shenzhen' order by age limit 10;
## Query the trace output
select * from information_schema.optimizer_trace;

<> Full field sorting vs. rowid sort

* Full field sort: if sort_buffer does not have enough memory, temporary disk files are needed, which causes disk access.
* rowid sort: sort_buffer can hold more rows, but the original table must be visited again to fetch the data, one more trip back to the table than full field sorting.
Normally, for the InnoDB storage engine, full field sorting is preferred. Note that max_length_for_sort_data defaults to 1024, which is fairly large. In most cases the fields involved will not exceed this value, which means full field sorting is used.

<> Some ideas for optimizing order by

How can we optimize an order by statement?

* Sorting is needed because the data is unordered. If the data is already ordered, there is no need to sort. Index data is ordered by nature, so we can build a joint index to optimize the order by statement.
* We can also tune parameters such as max_length_for_sort_data.
<> Joint index optimization

Let's look again at the query plan of the example SQL:
explain select name,age,city from staff where city = 'Shenzhen' order by age limit 10;

We add a joint index idx_city_age on the query condition city and the sort field age, then check the execution plan again:
alter table staff add index idx_city_age(city,age);
explain select name,age,city from staff where city = 'Shenzhen' order by age limit 10;

With the idx_city_age joint index added, Using filesort is no longer needed. Why? Because the index itself is ordered. The idx_city_age joint index looks like this:

The whole SQL execution process then becomes:

* From the idx_city_age index, find the primary key id of the first record that satisfies city='Shenzhen';
* Go to the primary key id index, fetch the whole row, take the values of the name, city, and age fields, and return them directly as part of the result set;
* From the idx_city_age index, fetch the primary key id of the next record;
* Repeat steps 2 and 3 until the 10th record has been fetched or the condition city='Shenzhen' is no longer satisfied.
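The loop above can be modelled in Python with a sorted list standing in for the leaf level of idx_city_age. This is an assumption-laden sketch: the sample rows, the tuple layout (city, age, id), and the name index_ordered_scan are invented for illustration.

```python
from bisect import bisect_left

# Hypothetical stand-in for the clustered (primary key) index: id -> full row.
PRIMARY_INDEX = {
    9: {"name": "Ann", "age": 31, "city": "Shenzhen"},
    13: {"name": "Bob", "age": 25, "city": "Shenzhen"},
    21: {"name": "Cat", "age": 40, "city": "Shanghai"},
    27: {"name": "Dan", "age": 28, "city": "Shenzhen"},
}
# Hypothetical stand-in for idx_city_age: (city, age, id), ordered by city then age.
IDX_CITY_AGE = sorted(
    (row["city"], row["age"], pk) for pk, row in PRIMARY_INDEX.items()
)


def index_ordered_scan(city, limit):
    """Read rows in index order and stop early; no separate sort step is needed."""
    result = []
    pos = bisect_left(IDX_CITY_AGE, (city,))  # seek to the first entry for this city
    while pos < len(IDX_CITY_AGE) and len(result) < limit:
        idx_city_value, _age, pk = IDX_CITY_AGE[pos]
        if idx_city_value != city:
            break  # left the range of matching entries
        row = PRIMARY_INDEX[pk]  # still one lookup back to the table per row
        result.append((row["name"], row["age"], row["city"]))
        pos += 1
    return result


top_two = index_ordered_scan("Shenzhen", 2)
```

Entries for one city are already ordered by age, so the scan can stop as soon as limit rows have been collected.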
The flow diagram is as follows:

As the diagram shows, there is still a lookup back to the table. Is there a more efficient solution for this example? Yes: a covering index.

Covering index: the queried columns can be obtained directly from the index columns, with no need to go back to the table. In other words, if the index used by your SQL already contains every column in the query result, it is a covering index.

If we build a joint index on city, age, and name (keeping age right after city so that the index order still matches order by age), we get a covering index, and the SQL can be executed without even going back to the table.
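To make the difference concrete, here is a small Python sketch of a covering index scan. It assumes an index laid out as (city, age, name, id); the sample entries and the name covering_index_scan are hypothetical, and the sketch only models the idea rather than InnoDB's storage format.

```python
from bisect import bisect_left

# Hypothetical covering index entries: (city, age, name, id), ordered by city then age.
IDX_CITY_AGE_NAME = sorted(
    [
        ("Shenzhen", 31, "Ann", 9),
        ("Shenzhen", 25, "Bob", 13),
        ("Shanghai", 40, "Cat", 21),
        ("Shenzhen", 28, "Dan", 27),
    ]
)


def covering_index_scan(city, limit):
    """Answer the query from index entries alone; the table is never consulted."""
    result = []
    pos = bisect_left(IDX_CITY_AGE_NAME, (city,))  # seek to the first entry for this city
    for idx_city_value, age, name, _pk in IDX_CITY_AGE_NAME[pos:]:
        if idx_city_value != city or len(result) == limit:
            break
        result.append((name, age, idx_city_value))
    return result


covered = covering_index_scan("Shenzhen", 10)
```

Every selected column comes from the index entry itself, which is why no primary key lookup appears in the function.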

<> Parameter tuning

We can also tune parameters to optimize the execution of order by. For example, we can adjust sort_buffer_size. If sort_buffer is too small and the data volume is large, the sort falls back to temporary disk files. If the MySQL server has plenty of resources, this value can be set somewhat larger.

We can also adjust max_length_for_sort_data. If this value is too small, order by will use rowid sort, which goes back to the table and reduces query performance. So max_length_for_sort_data can be set somewhat larger as well.

Of course, in many cases the default values of these MySQL parameters are good enough.

<> Points to note when using order by

<> Without a where condition, does the order by field need an index?

In daily development we may encounter an order by with no where condition. In that case, does the field after order by need an index? For example, given the following SQL, should create_time be indexed?
select * from A order by create_time;

For a query with no condition, even if there is an index on create_time, it will not be used. The MySQL optimizer considers that walking an ordinary secondary index and going back to the table for every row costs more than scanning the whole table and sorting. So it chooses a full table scan, followed by full field sorting or rowid sort.

Now suppose the query is changed to:
select * from A order by create_time limit m;
For a query with no condition, if m is small, the index can be used. The MySQL optimizer considers that it can read rows from the table in index order and stop as soon as it has m rows, which costs less than a full table scan, so it chooses the secondary index.
<> What if a large limit offset in pagination causes a lot of sorting?

Suppose the SQL is as follows:
select * from A order by a limit 100000,10
* Record the last id of the previous page, and include it in the query condition when fetching the next page, for example: where id > (last id of the previous page) limit 10. This assumes the pages are ordered by id.
* If the business allows it, limit the number of pages that can be requested.
<> How to optimize when the index storage order differs from order by?
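The first suggestion is often called keyset pagination. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so that the example is runnable; the table contents, page size, and last_id value are assumptions. It compares a deep-offset page with the equivalent keyset page when rows are ordered by id.

```python
import sqlite3

# SQLite in-memory database used only as a runnable stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, a INTEGER)")
conn.executemany(
    "INSERT INTO A (id, a) VALUES (?, ?)",
    [(i, (i * 7) % 1000) for i in range(1, 1001)],
)

page_size = 10

# Offset pagination: the engine must step over 500 rows before returning 10.
offset_page = conn.execute(
    "SELECT id, a FROM A ORDER BY id LIMIT ? OFFSET ?", (page_size, 500)
).fetchall()

# Keyset pagination: remember the last id of the previous page and seek past it.
last_id = 500  # last id seen on the previous page (assumed for this example)
keyset_page = conn.execute(
    "SELECT id, a FROM A WHERE id > ? ORDER BY id LIMIT ?", (last_id, page_size)
).fetchall()

conn.close()
```

Both queries return the same rows here, but the keyset form can start from the index position of last_id instead of counting past all earlier rows.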

Suppose there is a joint index idx_age_name, and the requirement changes to this: query the name and age of the first 10 employees, sorted by age, and by name in descending order when ages are equal. The corresponding SQL statement can be written like this:
select name,age from staff order by age ,name desc limit 10;
Looking at the execution plan, we find Using filesort.

That is because in the idx_age_name index tree, entries are sorted by age in ascending order and, when age is equal, by name in ascending order. The order by clause, however, sorts by age in ascending order and, when age is equal, by name in descending order. In other words, the index storage order differs from the order by order.
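The mismatch is easy to see with a few rows in Python. The sample names and ages are made up for this illustration; the point is only that neither a forward nor a backward walk of an (age asc, name asc) ordering produces (age asc, name desc).

```python
# Hypothetical (name, age) rows.
rows = [("Ann", 25), ("Bob", 25), ("Cat", 30), ("Dan", 30)]

# Order in which an ascending index on (age, name) would store the entries.
index_order = sorted(rows, key=lambda r: (r[1], r[0]))

# Order requested by: order by age, name desc.
# Two stable sorts: first by name descending, then by age ascending.
wanted = sorted(sorted(rows, key=lambda r: r[0], reverse=True), key=lambda r: r[1])

# Reading the ascending index backwards gives (age desc, name desc), also not a match.
backward_order = index_order[::-1]

needs_sort = wanted != index_order and wanted != backward_order
```

An index whose second column is stored in descending order would hold the entries in exactly the wanted order, which is what the descending index described next provides.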

How can we optimize it? MySQL 8.0 supports descending indexes, so on that version the index can be defined like this:
CREATE TABLE `staff` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
  `id_card` varchar(20) NOT NULL COMMENT 'ID number',
  `name` varchar(64) NOT NULL COMMENT 'Full name',
  `age` int(4) NOT NULL COMMENT 'Age',
  `city` varchar(64) NOT NULL COMMENT 'City',
  PRIMARY KEY (`id`),
  KEY `idx_age_name` (`age`,`name` desc) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT='Employee table';
<> When an in condition has multiple values, does SQL execution involve sorting?

If we have a joint index idx_city_age on (city, age), executing the following SQL does not involve a sort:
select * from staff where city in ('Shenzhen') order by age limit 10;

However, if the in condition contains multiple values, there will be a sort.
explain select * from staff where city in ('Shenzhen','Shanghai') order by age limit 10;

That is because in now has two values. Among the rows that match Shenzhen, age is ordered, but once the ages of the rows that match Shanghai are added, there is no guarantee that all ages together are in order. Therefore Using filesort is needed.
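A tiny Python model shows why. The index entries below are invented sample data laid out as (city, age, id); reading them in index order for one city yields sorted ages, while reading them for two cities does not.

```python
# Hypothetical idx_city_age entries: (city, age, id), ordered by city then age.
IDX_CITY_AGE = sorted(
    [
        ("Shanghai", 26, 3),
        ("Shanghai", 40, 21),
        ("Shenzhen", 25, 13),
        ("Shenzhen", 31, 9),
    ]
)


def ages_in_index_order(cities):
    """Ages of matching entries, in the order the index stores them."""
    return [age for city, age, _pk in IDX_CITY_AGE if city in cities]


def is_sorted(values):
    return all(a <= b for a, b in zip(values, values[1:]))


single_city = ages_in_index_order({"Shenzhen"})
two_cities = ages_in_index_order({"Shenzhen", "Shanghai"})
```

Each city's range is ordered by age on its own, but concatenating two ranges interleaves nothing, so the combined sequence still has to be sorted.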

<> Finally

If you found this article helpful, remember to like, bookmark, and share it!
