1 Brief introduction

MVCC (Multi-Version Concurrency Control) is a method for controlling concurrent access to a database. In MySQL, MVCC is effective only at the Read Committed and Repeatable Read isolation levels. It is implemented through the version chain in the undo log and the ReadView consistency view. Put simply, MVCC is the process by which, when multiple transactions exist at the same time, a SELECT statement finds a specific version on the version chain and returns the record data stored in that version.

The first thing to know is that MySQL adds three hidden fields to each table by default:

* DB_ROW_ID: the row ID. MySQL's B+ tree index structure requires every table to have a primary key. If none is defined, MySQL uses the first unique index whose columns do not allow NULL as the primary key. If no such index exists either, it generates a unique value in DB_ROW_ID and uses that as the primary key (this column has little to do with MVCC);
* DB_TRX_ID: the transaction ID. It records the ID of the transaction that performed the INSERT or UPDATE on the record (a DELETE statement is treated as a special case of UPDATE, as explained later);
* DB_ROLL_PTR: the rollback pointer. It links the different versions of a record together to form the version chain, much like the next pointer of a linked list.
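
The primary-key fallback described for DB_ROW_ID can be sketched in Python. This is only an illustration of the rule as stated above, not MySQL's actual code: the `Index` class, its fields, and `choose_primary_key` are hypothetical names, and the index list is assumed to be in definition order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Index:
    name: str
    unique: bool
    has_nullable_column: bool  # True if any indexed column allows NULL


def choose_primary_key(declared_pk: Optional[str], indexes: List[Index]) -> str:
    """Return the key the table is organised by, following the rule above."""
    if declared_pk is not None:
        return declared_pk
    for index in indexes:  # assumed to be in definition order
        if index.unique and not index.has_nullable_column:
            return index.name
    return "DB_ROW_ID"  # fall back to the hidden, auto-generated row ID


print(choose_primary_key(None, [Index("uk_email", True, True),
                                Index("uk_code", True, False)]))
# prints: uk_code
```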

(Note that the hidden fields are not a creation time and a deletion time, as many people believe, and that MVCC in MySQL is not implemented through any kind of snapshot. This misconception probably comes from an incorrect conclusion about MVCC in MySQL in the book "High Performance MySQL", which was then repeated from one source to the next. I keep stressing "MVCC in MySQL" because other databases may implement it differently. The source code and the official documentation remain the most authoritative explanation.)

2 ReadView

The ReadView consistency view consists mainly of two parts: an array of the IDs of all uncommitted transactions, and the largest transaction ID that has been created so far (a ReadView actually has other fields, but they do not affect the explanation of MVCC here). For example: [100,200],300. Transactions 100 and 200 are currently uncommitted, and transaction 300 is the largest transaction created so far (already committed). A ReadView is created when a SELECT statement is executed, but the Read Committed and Repeatable Read levels generate it with different strategies:
At the Read Committed level, every SELECT statement generates a new ReadView. At the Repeatable Read level, a ReadView is generated only when the first SELECT statement of the transaction is executed, and subsequent SELECT statements keep using that ReadView (even if update statements are executed afterwards).
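
The two generation strategies can be sketched as follows. This is a simplified model under stated assumptions, not InnoDB's implementation: `ReadView`, `Transaction`, and `view_for_select` are hypothetical names, and the current system state (uncommitted IDs and largest created ID) is passed in by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    active_ids: tuple  # IDs of transactions uncommitted when the view was created
    max_id: int        # largest transaction ID created so far


class Transaction:
    def __init__(self, isolation):
        self.isolation = isolation  # "RC" (Read Committed) or "RR" (Repeatable Read)
        self._view = None

    def view_for_select(self, active_ids, max_id):
        """Return the ReadView a SELECT uses, given the current system state."""
        # RC: a new view for every SELECT. RR: only for the first SELECT.
        if self.isolation == "RC" or self._view is None:
            self._view = ReadView(tuple(sorted(active_ids)), max_id)
        return self._view


rr = Transaction("RR")
rr.view_for_select({100, 200}, 300)
print(rr.view_for_select({200}, 300))  # ReadView(active_ids=(100, 200), max_id=300)

rc = Transaction("RC")
rc.view_for_select({100, 200}, 300)
print(rc.view_for_select({200}, 300))  # ReadView(active_ids=(200,), max_id=300)
```

In the second call of each pair, transaction 100 has committed in the meantime. The Repeatable Read transaction still sees the old view, while the Read Committed transaction gets a fresh one.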

3 Version chain

Each version of a row is stored only once, and the versions are linked together through the rollback pointer, so that a specific version of the data can be found by applying certain rules. Suppose there is an account table with two fields, id and name. The schematic diagram of the version chain is as follows:
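
A minimal Python sketch of such a chain for the account table, treating the rollback pointer as a linked-list reference. The `Version` class and the `update` and `chain` helpers are hypothetical names used for illustration only; real undo records are stored differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    id: int
    name: str
    db_trx_id: int                            # transaction that wrote this version
    db_roll_ptr: Optional["Version"] = None   # previous (older) version


def update(head, trx_id, name):
    """Create a new newest version whose rollback pointer refers to the old one."""
    return Version(head.id, name, trx_id, db_roll_ptr=head)


def chain(head):
    """List (transaction ID, name) pairs from newest to oldest."""
    versions = []
    while head is not None:
        versions.append((head.db_trx_id, head.name))
        head = head.db_roll_ptr
    return versions


head = Version(1, "monkey", db_trx_id=1)
head = update(head, 300, "monkey301")
head = update(head, 100, "monkey101")
print(chain(head))
# prints: [(100, 'monkey101'), (300, 'monkey301'), (1, 'monkey')]
```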

The rules for comparing against the version chain are as follows. Starting with the transaction ID of the first (newest) version at the head of the chain, compare the versions one by one:

(Here min_id is the smallest transaction ID in the ReadView's array of uncommitted transactions, and max_id is the largest created transaction ID recorded in the ReadView.)

* If it falls in the green range (DB_TRX_ID < min_id): the version's transaction ID is smaller than min_id (transaction IDs are generated in increasing order), which means the version was committed before the SELECT, so the data is visible. Alternatively (this is a short-circuit OR: the second condition is checked only when the first is not satisfied), if the version was written by the current transaction itself, that is, the transaction the SELECT statement belongs to, it is also visible;
* If it falls in the red range (DB_TRX_ID > max_id): the version was generated by a transaction that started later and had not yet started when the ReadView was created, so it is invisible;
* If it falls in the yellow range (min_id <= DB_TRX_ID <= max_id): two cases need to be distinguished:
  * If the version's transaction ID is in the ReadView's array of uncommitted transactions, the version was generated by an uncommitted transaction, so it is invisible;
  * If the version's transaction ID is not in the ReadView's array of uncommitted transactions, the version was generated by a committed transaction, so it is visible.

If the current version turns out to be invisible, follow the rollback pointer to the next version in the version chain and repeat the checks above.
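
The comparison rules can be written down as a short Python sketch. The function names and the list-of-tuples representation of the chain are assumptions made for illustration. The text does not say what min_id is when the uncommitted array is empty; the sketch assumes that in this case every version up to max_id counts as committed.

```python
def is_visible(trx_id, active_ids, max_id, own_id=None):
    """Apply the green / red / yellow rules to one version's DB_TRX_ID."""
    # Assumption: with no uncommitted transactions, everything <= max_id is committed.
    min_id = min(active_ids) if active_ids else max_id + 1
    if trx_id < min_id or trx_id == own_id:   # green range, or our own change
        return True
    if trx_id > max_id:                       # red range: started after the view
        return False
    return trx_id not in active_ids           # yellow range: visible only if committed


def first_visible(chain, active_ids, max_id, own_id=None):
    """chain is a list of (trx_id, value) pairs ordered from newest to oldest."""
    for trx_id, value in chain:
        if is_visible(trx_id, active_ids, max_id, own_id):
            return value
    return None  # no version of the row is visible to this ReadView


chain = [(200, "c"), (150, "b"), (50, "a")]
print(first_visible(chain, [100, 200], 300))              # prints: b
print(first_visible(chain, [100, 200], 300, own_id=200))  # prints: c
```

In the first call, version 200 is skipped because it belongs to an uncommitted transaction, and version 150 is visible because it lies in the yellow range but is not in the uncommitted array. In the second call the reader is transaction 200 itself, so it sees its own change.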

4 Demonstration process

The following example walks through how MVCC works in practice (the Repeatable Read isolation level is assumed). The account table already contains one initial row (id=1, name=monkey):

| Time point | Transaction 100 | Transaction 200 | Transaction 300 | No transaction ID (fourth) | No transaction ID (fifth) |
|---|---|---|---|---|---|
| 1 | begin; | begin; | begin; | begin; | begin; |
| 2 | UPDATE test SET a='1' WHERE id = 1; | | | | |
| 3 | | UPDATE test SET a='2' WHERE id = 2; | | | |
| 4 | | | UPDATE account SET name = 'monkey301' WHERE id = 1; | | |
| 5 | | | commit; | | |
| 6 | | | | SELECT name FROM account WHERE id = 1; | |
| 7 | UPDATE account SET name = 'monkey101' WHERE id = 1; | | | | |
| 8 | UPDATE account SET name = 'monkey102' WHERE id = 1; | | | | |
| 9 | | | | SELECT name FROM account WHERE id = 1; | |
| 10 | commit; | UPDATE account SET name = 'monkey201' WHERE id = 1; | | | |
| 11 | | UPDATE account SET name = 'monkey202' WHERE id = 1; | | | |
| 12 | | | | SELECT name FROM account WHERE id = 1; | SELECT name FROM account WHERE id = 1; |
| 13 | | commit; | | | |

The five transactions run from left to right, and the time points run from top to bottom. At time points 2 and 3, transaction 100 and transaction 200 each execute an UPDATE statement (the two IDs differ by 100 only to make them easier to tell apart; normally the next transaction ID is created by adding 1). These two statements have no practical effect and serve only to generate transaction IDs, so their impact is not analyzed in the MVCC walkthrough below; we only study the account table. The last two transactions have no transaction ID.
This is because a transaction ID is generated only when the transaction executes an update operation (insert, delete, or update), which is also why transactions 100 and 200 execute an update statement first. It is not generated when the transaction is started.
The last two transactions execute only SELECT statements, so they have no transaction ID.
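
The lazy assignment of transaction IDs can be modelled with a small Python sketch. `TrxSystem` and `Trx` are hypothetical names, and the counter simply hands out consecutive IDs, unlike the spaced-out IDs 100, 200, and 300 used in the example.

```python
import itertools


class TrxSystem:
    def __init__(self, first_id):
        self._next_id = itertools.count(first_id)

    def allocate(self):
        return next(self._next_id)


class Trx:
    def __init__(self, system):
        self.system = system
        self.trx_id = None  # "begin" alone does not allocate an ID

    def write(self):
        """First insert/update/delete allocates the ID; later writes keep it."""
        if self.trx_id is None:
            self.trx_id = self.system.allocate()
        return self.trx_id

    def select(self):
        return self.trx_id  # reading never allocates an ID


system = TrxSystem(first_id=100)
writer_a, writer_b, reader = Trx(system), Trx(system), Trx(system)
print(writer_a.trx_id)  # prints: None
writer_b.write()
writer_a.write()
writer_a.write()
print(writer_b.trx_id, writer_a.trx_id, reader.select())  # prints: 100 101 None
```

The IDs follow the order of the first write, not the order in which the transactions were started.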

First, let's look at the version chain and the ReadView in the initial state (the ReadView has not been generated yet):

Here, transaction 1 created the initial row in the account table.

Then, at time point 1, the five transactions are started (as stated above, no transaction ID has been generated yet).

At time point 2, the first transaction executes an UPDATE statement, which generates the transaction ID 100.

At time point 3, the second transaction executes an UPDATE statement, which generates the transaction ID 200.

At time point 4, the third transaction executes an UPDATE statement that changes the name of the row with id 1 in the account table to monkey301. This also generates the transaction ID 300.

At time point 5, transaction 300, that is, the transaction above, commits.

At time point 6, the fourth transaction executes a SELECT statement to read the current row with id 1 (as stated above, this transaction does not get a transaction ID). The version chain and ReadView are as follows:

Because transaction 300 committed at time point 5, the ReadView's array of uncommitted transactions does not contain it: the ReadView is [100,200],300. Applying the comparison rules above, take the transaction ID of the first version in the version chain, 300, and compare. First, the current SELECT statement is not executed inside transaction 300. Next, 300 falls in the yellow range, and transaction 300 is not in the ReadView's array of uncommitted transactions, so it is visible. Therefore, at time point 6, the fourth transaction reads monkey301.

At time point 7, transaction 100 executes an UPDATE statement that changes the name of the row with id 1 in the account table to monkey101.

At time point 8, transaction 100 executes another UPDATE statement that changes the name of the row with id 1 in the account table to monkey102.

At time point 9, the fourth transaction executes a SELECT statement to read the current row with id 1. The version chain and ReadView are as follows:

Note that because we are at the Repeatable Read level, the ReadView used here is the one generated at time point 6 (at the Read Committed level, a new ReadView would be generated instead). Applying the comparison rules above, take the transaction ID of the first version in the version chain, 100, and compare. First, the current SELECT statement is not executed inside transaction 100. Next, 100 falls in the yellow range, and transaction 100 is in the ReadView's array of uncommitted transactions, so it is invisible. Following the rollback pointer to the next version, the transaction ID is again 100, which by the same analysis is still invisible. Then take the next version, whose transaction ID is 300. First, the current SELECT statement is not executed inside transaction 300. Next, 300 falls in the yellow range, but transaction 300 is not in the ReadView's array of uncommitted transactions, so it is visible. Therefore, at time point 9, the fourth transaction still reads monkey301 (this is what "repeatable read" means).
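
The reads at time points 6 and 9 can be replayed with a small Python sketch. The version chains are written out by hand from the timeline above as (transaction ID, name) pairs, newest first, and the helper names are hypothetical. The sketch only covers views with a non-empty uncommitted array, which is all this example needs.

```python
def is_visible(trx_id, active_ids, max_id):
    min_id = min(active_ids)          # assumes a non-empty uncommitted array
    if trx_id < min_id:               # green range
        return True
    if trx_id > max_id:               # red range
        return False
    return trx_id not in active_ids   # yellow range


def read(chain, active_ids, max_id):
    """Return the name stored in the first visible version, newest first."""
    for trx_id, name in chain:
        if is_visible(trx_id, active_ids, max_id):
            return name
    return None


view_tp6 = ([100, 200], 300)  # ReadView created by the first SELECT

chain_tp6 = [(300, "monkey301"), (1, "monkey")]
chain_tp9 = [(100, "monkey102"), (100, "monkey101"),
             (300, "monkey301"), (1, "monkey")]

print(read(chain_tp6, *view_tp6))  # prints: monkey301
print(read(chain_tp9, *view_tp6))  # prints: monkey301 (same view reused)
```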

At time point 10, transaction 100 commits. At the same time, transaction 200 executes an UPDATE statement that changes the name of the row with id 1 in the account table to monkey201.

At time point 11, transaction 200 executes another UPDATE statement that changes the name of the row with id 1 in the account table to monkey202.

At time point 12, the fourth transaction executes a SELECT statement to read the current row with id 1. The version chain and ReadView are as follows:

As at time point 9, at the Repeatable Read level the ReadView used is the one generated at time point 6. Applying the comparison rules above, take the transaction ID of the first version in the version chain, 200, and compare. First, the current SELECT statement is not executed inside transaction 200. Next, 200 falls in the yellow range, and transaction 200 is in the ReadView's array of uncommitted transactions, so it is invisible. Following the rollback pointer to the next version, the transaction ID is again 200, which by the same analysis is still invisible. Then take the next version, whose transaction ID is 100. First, the current SELECT statement is not executed inside transaction 100. Next, 100 falls in the yellow range and is in the ReadView's array of uncommitted transactions (the view was generated at time point 6, before transaction 100 committed), so it is still invisible. The next version also has transaction ID 100 and is likewise invisible. Then take the next version, whose transaction ID is 300. First, the current SELECT statement is not executed inside transaction 300. Next, 300 falls in the yellow range, but transaction 300 is not in the ReadView's array of uncommitted transactions, so it is visible. Therefore, at time point 12, the fourth transaction still reads monkey301.

Also at time point 12, the fifth transaction executes a SELECT statement to read the current row with id 1. The version chain and ReadView are as follows:

Note that this is the first SELECT statement in the fifth transaction, so a ReadView reflecting the current state is generated, as shown in the figure above. It differs from the fourth transaction's ReadView, because transaction 100 committed at time point 10: the new ReadView is [200],300, so min_id is 200. Applying the comparison rules above, take the transaction ID of the first version in the version chain, 200, and compare. First, the current SELECT statement is not executed inside transaction 200. Next, 200 falls in the yellow range, and transaction 200 is in the ReadView's array of uncommitted transactions, so it is invisible. Following the rollback pointer to the next version, the transaction ID is again 200, which by the same analysis is still invisible. Then take the next version, whose transaction ID is 100: it falls in the green range, so it is visible. Therefore, at time point 12, the fifth transaction reads monkey102 (as you can see, even the same SELECT statement can return different results in different transactions, because their ReadViews differ).
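
The two reads at time point 12 can be compared side by side in a short Python sketch. The chain is written out by hand from the timeline as (transaction ID, name) pairs, newest first; the helper names are hypothetical, and the sketch only handles views with a non-empty uncommitted array.

```python
def is_visible(trx_id, active_ids, max_id):
    min_id = min(active_ids)          # assumes a non-empty uncommitted array
    if trx_id < min_id:               # green range
        return True
    if trx_id > max_id:               # red range
        return False
    return trx_id not in active_ids   # yellow range


def read(chain, active_ids, max_id):
    """Return the name stored in the first visible version, newest first."""
    for trx_id, name in chain:
        if is_visible(trx_id, active_ids, max_id):
            return name
    return None


chain_tp12 = [(200, "monkey202"), (200, "monkey201"),
              (100, "monkey102"), (100, "monkey101"),
              (300, "monkey301"), (1, "monkey")]

fourth_view = ([100, 200], 300)  # generated at time point 6, reused since
fifth_view = ([200], 300)        # generated at time point 12, after 100 committed

print(read(chain_tp12, *fourth_view))  # prints: monkey301
print(read(chain_tp12, *fifth_view))   # prints: monkey102
```

The chain is identical in both calls; only the ReadView differs, which is what produces the two different results.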

At time point 13, transaction 200 commits, and the analysis ends.

The demonstration above shows how MVCC works in practice: how the version chain and the ReadView cooperate during lookups when multiple transactions are involved. One case was left out above. A DELETE statement also adds a new version to the version chain, with its transaction ID set to the ID of the transaction performing the deletion. In addition, the deleted_flag bit in the record header is set to true, which marks the record as deleted. So if the version found by the version comparison has deleted_flag set to true, it is not returned, and the search moves on to the next record.
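
The handling of deleted_flag can be sketched as follows. This is a simplified model: each version is a hypothetical (transaction ID, name, deleted_flag) tuple, the helper names are invented for illustration, and an empty uncommitted array is assumed to mean that every version up to max_id is committed.

```python
def is_visible(trx_id, active_ids, max_id):
    # Assumption: with no uncommitted transactions, everything <= max_id is committed.
    min_id = min(active_ids) if active_ids else max_id + 1
    if trx_id < min_id:
        return True
    if trx_id > max_id:
        return False
    return trx_id not in active_ids


def read_row(chain, active_ids, max_id):
    """chain: (trx_id, name, deleted_flag) tuples, newest first."""
    for trx_id, name, deleted_flag in chain:
        if is_visible(trx_id, active_ids, max_id):
            # A visible version marked as deleted means the row is not returned.
            return None if deleted_flag else name
    return None


# Hypothetical transaction 400 has deleted the row written by transaction 1.
chain = [(400, "monkey", True), (1, "monkey", False)]

print(read_row(chain, [400], 400))  # prints: monkey (the delete is not committed yet)
print(read_row(chain, [], 400))     # prints: None (the delete is committed and visible)
```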

In addition, if the current transaction is rolled back, all versions in the version chain that belong to that transaction are deleted.
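
As a rough illustration of the rollback case, the sketch below drops a transaction's versions from a chain modelled as a list of (transaction ID, name) pairs, newest first. The `rollback` function is a hypothetical helper and says nothing about how the undo log is actually applied.

```python
def rollback(chain, trx_id):
    """Return the chain without the versions written by the rolled-back transaction."""
    return [version for version in chain if version[0] != trx_id]


chain = [(200, "monkey202"), (200, "monkey201"),
         (100, "monkey102"), (300, "monkey301")]
print(rollback(chain, 200))
# prints: [(100, 'monkey102'), (300, 'monkey301')]
```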
