<>MMKV: Principle and Implementation (Part 1)

When it comes to lightweight data persistence, the first thing most developers think of is SharedPreferences (hereinafter SP). SP stores its data as an XML file and reads and writes that file directly through I/O streams. This has a drawback: every write or modification replaces the original data, and the entire data set is rewritten to the file. As you can imagine, if an SP file holds a lot of content, later writes will cause jank and may even risk an ANR.
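The write pattern described above can be modeled with a toy store. This is a minimal Python sketch, assuming a store that keeps everything in a dict and serializes all of it to XML on every put. `FullRewriteStore` and `put_string` are hypothetical names, and the sketch only illustrates the full-rewrite pattern. It is not how Android's SharedPreferences is actually implemented.

```python
import os
import tempfile
from xml.sax.saxutils import escape


class FullRewriteStore:
    """Toy SP-like store: every put rewrites the whole XML file."""

    def __init__(self, path):
        self.path = path
        self.data = {}  # the full data set lives in memory

    def put_string(self, key, value):
        self.data[key] = value
        # Serialize *every* entry, not just the one that changed.
        lines = ["<map>"]
        for k, v in self.data.items():
            name = escape(k, {'"': "&quot;"})
            lines.append(f'    <string name="{name}">{escape(v)}</string>')
        lines.append("</map>")
        # Opening with "w" truncates the file, so the old contents are replaced.
        with open(self.path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))


path = os.path.join(tempfile.mkdtemp(), "prefs.xml")
store = FullRewriteStore(path)
store.put_string("a", "1")
store.put_string("b", "2")  # rewrites "a" again along with "b"
```

Each put costs time proportional to the total number of entries rather than the size of the change. With a large file, this is the source of the jank described above.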

<> One, I/O

<>1, First, let's see how SP works

The operating system divides virtual memory into two parts: user space and kernel space. User space is where user programs run, and kernel space is where kernel code runs. For safety, the two are isolated, so even if a user program crashes, the kernel is unaffected.

<>2, The process of writing a file with I/O

1, Call write(), telling the kernel the start address and length of the data to be written.

2, The kernel copies the data into its kernel buffer.

3, The operating system then copies the data from the kernel buffer to disk, and the write is complete.

So writing data to a file requires two copies: one from the user buffer into the kernel buffer, and one from the kernel buffer to disk. If the amount of data is large, this also causes a significant performance cost.
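The same path can be traced from user code. In this minimal Python sketch, `os.write` hands the kernel a user-space buffer, and the kernel copies it into its own buffer. `os.fsync` then asks the OS to copy that buffered data to disk now, rather than whenever the OS would otherwise get to it. The helper name `write_through_kernel` is hypothetical.

```python
import os
import tempfile


def write_through_kernel(path, payload: bytes):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Steps 1-2: write() passes the buffer's address and length to the
        # kernel, which copies the bytes into its own buffer. A single call
        # may accept fewer bytes than requested, so loop until all are handed over.
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Step 3: ask the kernel to copy its buffered data to disk now.
        os.fsync(fd)
    finally:
        os.close(fd)


path = os.path.join(tempfile.mkdtemp(), "data.bin")
write_through_kernel(path, b"hello, kernel")
```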

<> Two, MMKV

<>1, What is MMKV

To solve the problems above, the WeChat team developed MMKV on top of mmap as a replacement for SP.

MMKV is a key-value component built on mmap memory mapping. Its underlying serialization and deserialization are implemented with protobuf, and it offers high performance and strong stability. It has been used in WeChat since mid-2015, and its performance and stability have been proven over time. It has since been ported to Android / macOS / Windows and open-sourced on all of them.

<>2, MMAP (memory mapping)

Linux initializes the contents of a virtual memory area by associating that area with an object on disk. This process is called memory mapping.

When a file is mapped, an address range is allocated in the process's virtual memory and a mapping to the file is established. Once the mapping exists, the process can read and write that memory through a pointer, and the system automatically writes the changes back to the corresponding file on disk.
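Python's standard `mmap` module wraps the same system call, which makes the idea easy to try. In this minimal sketch, the file name and size are arbitrary. The file is modified through an ordinary slice assignment on the mapping, with no write() call, and is then read back through normal file I/O.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mapped.bin")

# A mapping cannot extend past the end of the file, so size the file first.
with open(path, "wb") as f:
    f.truncate(mmap.PAGESIZE)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    mm[0:5] = b"hello"             # a plain memory write, no write() call
    mm.flush()                     # optional: request write-back now
    mm.close()

with open(path, "rb") as f:
    head = f.read(5)
```

The `flush()` call is not what makes the data reach the file. The kernel writes dirty mapped pages back on its own, and `flush()` (msync on Unix) only makes the timing explicit.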

<>3, Advantages of MMAP

* Reading and writing a file through MMAP needs only one copy, from disk to the process's memory. This reduces the number of data copies and makes file reads and writes more efficient.
* MMAP maps the disk file into logical memory, so operating on that memory is equivalent to operating on the file. No extra thread is needed, and operating on MMAP is as fast as operating on memory.
* MMAP provides a block of memory that can be written at any time. The app only has to write its data there. The operating system is responsible for writing the memory back to the file, for example when memory runs low or the process exits, so there is no need to worry that an app crash will cause data loss.
Let's compare how long SP and MMKV each take to store 1,000 entries:

[Figure: time taken by SP and MMKV to store 1,000 entries]

As the comparison shows, MMKV writes much faster than SP. This is exactly the advantage of MMAP.
(For reads, both load the data into an in-memory map when the app initializes, and later reads come from that map, so there is no difference in read speed between the two.)

<> Three, The protobuf protocol

<>1, What is the protobuf protocol

protobuf is an open-source serialization framework from Google, similar to XML and JSON. Its defining feature is that it is binary, so the same content is much shorter than in traditional XML.

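MMKV stores its data using the protobuf protocol and updates incrementally, which means it does not have to rewrite all the data to the file every time a value changes. Instead, each new key-value pair is serialized and appended to the end of the mapped memory. A key may therefore appear several times, and the last copy holds its current value.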
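Here is a minimal sketch of that append-and-replay idea. It assumes an in-memory list stands in for the mapped file. The class `AppendLog` is hypothetical and does not use MMKV's real encoding.

```python
class AppendLog:
    """Toy model of incremental updates: writes append, loads replay."""

    def __init__(self):
        self.records = []  # stands in for the bytes appended to the mapped file

    def put(self, key, value):
        # Only the changed pair is written; older copies stay in the log.
        self.records.append((key, value))

    def load(self):
        # Replaying in order lets later values overwrite earlier ones,
        # so the resulting dict holds the newest value for every key.
        current = {}
        for key, value in self.records:
            current[key] = value
        return current


log = AppendLog()
log.put("user", "alice")
log.put("theme", "dark")
log.put("user", "bob")
state = log.load()
```

The price of this design is that stale copies accumulate in the file. That is the space problem discussed under the shortcomings below.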

<>2, Data structure

protobuf is a binary storage format. The first field records the total length of the keys and values. It is followed in turn by key length -> key, value length -> value, and so on. You can see this layout by opening the file in a binary (hex) viewer.
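Here is a minimal sketch of that layout, built on several assumptions. The leading total length is a 4-byte little-endian integer, each length is a protobuf-style varint, and keys and values are UTF-8 strings. The helper names `pack`, `unpack`, `varint` and `read_varint` are hypothetical. MMKV's real files also store typed values, so this is an illustration of the structure, not a reader for actual MMKV files.

```python
import struct


def varint(n):
    # Protobuf-style varint: 7 data bits per byte, flag bit set if more follow.
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def read_varint(buf, pos):
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7


def pack(pairs):
    body = bytearray()
    for key, value in pairs:
        for field in (key.encode("utf-8"), value.encode("utf-8")):
            body += varint(len(field)) + field  # length first, then the bytes
    return struct.pack("<I", len(body)) + bytes(body)  # total length leads


def unpack(blob):
    (total,) = struct.unpack_from("<I", blob, 0)
    pos, end, pairs = 4, 4 + total, []
    while pos < end:
        fields = []
        for _ in range(2):  # a key, then its value
            length, pos = read_varint(blob, pos)
            fields.append(blob[pos:pos + length].decode("utf-8"))
            pos += length
        pairs.append(tuple(fields))
    return pairs


blob = pack([("name", "mmkv"), ("lang", "c++")])
```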

<>3, How integers are written

To make this easier to understand, here is how an integer is encoded:

Each byte stores 7 bits of data, and its highest bit is a flag bit: 1 means more bytes follow, and 0 means this is the last byte.

If the value to be written is <= 0x7f, it fits in the 7 data bits of a single byte, and encoding is complete.

If the value to be written is > 0x7f, first record its lowest 7 bits with the highest bit set to 1. Then shift the value right by 7 bits and repeat the check.

For reference, 0x7f in hexadecimal is 127 in decimal and 0111 1111 in binary.

This is how protobuf writes integers (the varint encoding).
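The rule above can be written as a short loop. This is a minimal Python sketch. The function name `encode_varint` is hypothetical, and only non-negative integers are handled.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer 7 bits at a time, lowest bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while value > 0x7F:
        # More bytes follow: keep the low 7 bits and set the flag bit.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    # The rest fits in 7 bits: write it with the flag bit clear, and stop.
    out.append(value)
    return bytes(out)
```

For example, `encode_varint(127)` returns `b'\x7f'`, and `encode_varint(128)` returns `b'\x80\x01'`.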

<>4, A worked example

To better explain how encoding and decoding work, let's work through one integer: 318242.

<> Encoding

First, convert 318242 to binary:

* 318242 in binary -》
0100 1101 1011 0010 0010 (step 1)

* This is greater than 0x7f, so take the lowest 7 bits and set the highest bit to 1:
1010 0010 ------------》 write to file

* Shift the step 1 data right by 7 bits:
0000 0000 1001 1011 0110 (step 2)

* This is still greater than 0x7f, so take the lowest 7 bits and set the highest bit to 1:
1011 0110 ------------》 write to file

* Shift the step 2 data right by 7 bits:
0000 0000 0000 0001 0011 (step 3)

* This value is no greater than 0x7f, so its 7 bits are written directly, with the highest bit left as 0:
0001 0011 ------------》 write to file

Because step 3 fit in 7 bits, encoding stops here. protobuf has encoded 318242 in three bytes: 1010 0010, 1011 0110, 0001 0011 (0xA2 0xB6 0x13).

<> Decoding

Take the data written above:

* 1010 0010
* 1011 0110
* 0001 0011

Then splice the bytes together from back to front. (Note: only the last 7 bits of each byte are data, so drop the highest bit, the flag bit, when splicing.)

1) Put the 7 data bits of byte 3 in front of the 7 data bits of byte 2:

001 0011 011 0110

2) Then put the result in front of the 7 data bits of byte 1:

001 0011 011 0110 010 0010

Remove the leading zeros:

1 0011 011 0110 010 0010

Regrouped into groups of four from the right, this is 0100 1101 1011 0010 0010, the original binary code of 318242.
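Decoding can be sketched the same way. In this minimal Python sketch, the function name `decode_varint` is hypothetical. Instead of splicing the bytes back to front, it shifts each byte's 7 data bits into place as it reads forward, which gives the same result. The returned offset shows where the next field would start.

```python
def decode_varint(data: bytes, pos: int = 0):
    """Return (value, next_pos) for the varint starting at data[pos]."""
    value, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        # Keep only the 7 data bits and place them above the bits read so far.
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # flag bit clear: this was the last byte
            return value, pos
        shift += 7
```

Applied to the three bytes from the example, `decode_varint(bytes([0xA2, 0xB6, 0x13]))` returns `(318242, 3)`.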

<> Four, Shortcomings

Everything has two sides, and MMKV has its shortcomings too.

[Figure: a simple simulation of how MMKV stores data]

Linux manages memory in pages. To store data, MMKV has to create a file and give it a fixed size in whole pages. If only a small amount of data is stored, the rest of the file's space is wasted. Conversely, if the stored data outgrows the file, the file has to be expanded dynamically.
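MMKV's published design notes describe a compromise for this. Space is allocated in page-size units and used in append mode. When the end is reached, the file is compacted by dropping duplicate keys, and only if that is still not enough is the file doubled. The toy model below follows that policy under several assumptions: a 4096-byte page, entries whose byte size is known up front, and a hypothetical class `GrowingFile`. No real file is mapped.

```python
PAGE_SIZE = 4096  # assumption: a typical page size


class GrowingFile:
    """Toy model: append until full, then compact, then double."""

    def __init__(self):
        self.capacity = PAGE_SIZE  # the file starts at one page
        self.entries = []          # (key, size_in_bytes), in append order

    def used(self):
        return sum(size for _, size in self.entries)

    def append(self, key, size):
        if self.used() + size > self.capacity:
            self._compact()
            # Still not enough room after compaction: double until it fits.
            while self.used() + size > self.capacity:
                self.capacity *= 2
        self.entries.append((key, size))

    def _compact(self):
        # Keep only the newest copy of each key, in order of last write.
        latest = {}
        for key, size in self.entries:
            latest.pop(key, None)
            latest[key] = size
        self.entries = list(latest.items())


same_key = GrowingFile()
for _ in range(3):
    same_key.append("counter", 1500)  # compaction reclaims the stale copy

distinct = GrowingFile()
for i in range(3):
    distinct.append(f"key{i}", 1500)  # nothing to reclaim, so the file doubles
```

Rewriting one key keeps the file at a single page, because compaction frees the old copies. Three distinct keys of the same size force the capacity to double to 8192 bytes.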

Still, each situation should be judged on its own merits, and in most cases the advantages outweigh the disadvantages!
