One, Preface

Caching can make slow pages open “instantly”. Almost every app you use daily relies on caching in some way.

So caching speeds up data access, but what else can it do?

Also, everything has two sides: how do we get the most out of caching while avoiding its drawbacks?

This article shares how I understand caching and how to use it. I hope it gives you some ideas.

Two, What Can a Cache Do?

As mentioned above, the most common understanding is this: when a page opens very slowly, you think of adding a cache, and then the page opens quickly. But fast and slow are relative. Technically, a cache is fast because it is memory-based, and memory reads and writes are many times faster than disk reads and writes, so using memory instead of disk as the read/write medium greatly speeds up data access.

Roughly, the process works like this: data that has been accessed is kept in memory so that later accesses to it are fast.
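The read path just described is often called “cache-aside” or lazy loading. The sketch below is a minimal illustration, assuming a plain Python dict as the in-memory cache and a hypothetical `load_from_db` function that simulates a slow disk or database read with `time.sleep`; it is not tied to any particular caching product.

```python
import time
from typing import Dict

def load_from_db(key: str) -> str:
    """Hypothetical slow data source standing in for a disk or database read."""
    time.sleep(0.05)  # simulate slow I/O
    return f"value-of-{key}"

# The cache itself: an ordinary in-memory dict.
cache: Dict[str, str] = {}

def get(key: str) -> str:
    """Return the value for key, serving it from memory whenever possible."""
    if key in cache:            # cache hit: a memory read, no I/O
        return cache[key]
    value = load_from_db(key)   # cache miss: go to the slow source once
    cache[key] = value          # keep it in memory for later requests
    return value
```

The first `get("user:1")` pays the full cost of the slow read; every later call for the same key is answered from memory.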

Beyond that, there are two other important ways to use a cache: pre-reading and delayed writing.

Three, Pre-reading

Pre-reading means loading the data that will be needed ahead of time, also called “cache warm-up”: when the system starts, part of the data on disk is loaded into memory first, and only then does the system begin serving external requests.

Why do this? Some systems face tens of thousands of requests as soon as they start. If all of these requests went straight to the database, the pressure on it would very likely surge until it went down and could no longer respond normally.

To relieve this problem, you can use pre-reading.
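A warm-up step can be sketched as a function that runs at startup, before the service accepts traffic. This is a minimal sketch under stated assumptions: the cache is a plain dict, the list of hot keys is known in advance, and `load_from_db` is a hypothetical loader that records each call so you can see how often the “database” is hit.

```python
from typing import Callable, Dict, Iterable, List

def warm_up(cache: Dict[str, str], hot_keys: Iterable[str],
            load: Callable[[str], str]) -> int:
    """Load hot_keys into cache before the service starts taking requests.

    Returns how many keys were actually fetched from the data source.
    """
    fetched = 0
    for key in hot_keys:
        if key not in cache:          # skip keys that are already warm
            cache[key] = load(key)
            fetched += 1
    return fetched

# Hypothetical data source that records every call it receives.
db_calls: List[str] = []

def load_from_db(key: str) -> str:
    db_calls.append(key)
    return key.upper()

cache: Dict[str, str] = {}
warm_up(cache, ["home", "top10", "banner"], load_from_db)  # at startup
# Only now start serving: reads of these keys never reach the database.
```

Which keys count as “hot” is a judgment call for each system; the list here is purely illustrative.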

You might ask: what if even the cache can't hold up? Then it's time to scale out and load balance, which is beyond the scope of this article; I'll share that when I get the chance.

If pre-reading adds a front buffer at the data's “exit”, then delayed writing, discussed next, adds a back buffer at the data's “entrance”.

Four, Delayed Writing

As you may know, a database writes more slowly than it reads, because writes go through a series of mechanisms that guarantee data correctness. So to speed up writes, you either shard the database (split it into multiple databases and tables), or use a cache as a buffer and write to disk in batches.

Because sharding has large side effects on cross-table operations and multi-condition queries, it adds more complexity than a cache does, so the caching approach should be considered first.

Using a cache to speed up writes is called “delayed writing”: data that would be written to disk or the database is first written to memory, success is returned immediately, and the data in memory is written to disk in batches later.

You might wonder: if writing to memory counts as success, what happens if something goes wrong before the data reaches disk, such as a power failure or a shutdown that terminates the program abnormally? Isn't the data lost?

Yes. That's why delayed writing is generally used only where data integrity requirements are less strict, such as like counts or the number of participating users, where it can greatly relieve the pressure of frequent database updates.
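The like-count case can be sketched as a small write-behind buffer: increments only touch memory and return at once, and a separate flush writes the accumulated deltas in one batch. This is an illustrative sketch, assuming a hypothetical `write_batch` callback standing in for a batched database update; in a real system `flush()` would be called by a timer or background thread, and anything not yet flushed is lost if the process dies, exactly the trade-off described above.

```python
import threading
from collections import Counter
from typing import Callable, Dict, List

class LikeBuffer:
    """Write-behind buffer for like counts (a sketch, not production code)."""

    def __init__(self, write_batch: Callable[[Dict[str, int]], None]):
        self._pending: Counter = Counter()
        self._lock = threading.Lock()
        self._write_batch = write_batch   # hypothetical batched DB writer

    def increment(self, post_id: str, n: int = 1) -> None:
        with self._lock:
            self._pending[post_id] += n   # "success" once memory is updated

    def flush(self) -> None:
        with self._lock:                  # swap out pending deltas atomically
            batch, self._pending = dict(self._pending), Counter()
        if batch:
            self._write_batch(batch)      # one write per flush, not per like

# Usage: a Counter stands in for the database table.
db: Counter = Counter()
writes: List[Dict[str, int]] = []

def write_batch(batch: Dict[str, int]) -> None:
    writes.append(batch)
    db.update(batch)                      # Counter.update adds the deltas

buf = LikeBuffer(write_batch)
for _ in range(1000):
    buf.increment("post-42")
buf.increment("post-7")
buf.flush()                               # 1001 likes, a single DB write
```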

In fact, Redis, the well-known distributed cache, follows the same idea in its default persistence mechanism, RDB.

In a mature system, there is more than one place where a cache can be applied. Let's go through where else we can add a “cache”.

Five, Where Can a Cache Be Added?

Before discussing where to add a cache, let's settle one question: what should we cache? In other words, what characteristics make data worth caching? After all, a cache is an extra cost, and it should pay for itself.

Generally, two criteria can be used:

Hot data: accessed at high frequency, e.g. dozens of times per second or more.

Static data: changes rarely; reads far outnumber writes.

Next, we can find suitable places to cache such data.

In essence, caching is a “defensive” mechanism. Data flows between systems in an orderly way, so choosing where to add a cache is like choosing where to put a roadblock on a road: the stretch of road behind the roadblock is protected from being overwhelmed by traffic.

Taking the end user as the starting point and the database used by the system as the end point, the points along this road where a cache can be placed are, in order: the browser, the CDN, the gateway (proxy), the in-process cache, the out-of-process cache, and the database.

Each of these points blocks part of the traffic, and together they form a funnel-shaped interception that protects the systems further back and, ultimately, the database.

Below is a brief description of each point's use cases and the things to watch out for.

Six, Cache Categories

1, Browser cache

This is the cache location closest to the user, and it uses the user's own “resources” (the cached data lives on the user's device), so it offers the best cost-performance: users help you share the load.

When you open the browser's developer tools and see “from cache”, “from memory cache”, or “from disk cache”, it means the data has already been cached on the user's device, which is why some content remains accessible even without a network connection.

The browser does this for us. It is generally used to cache resources such as images, JS, and CSS, and we can control it through the Cache-Control HTTP header; the details are beyond the scope of this article. In addition, JS global variables, cookies, and similar mechanisms also fall into this category.

Because the browser cache sits on the user side, we have little control over it: without a new request, you cannot proactively update the cached data.

2, CDN cache

CDN providers deploy large numbers of server nodes across a country or even the whole world (called “edge servers”). Data is distributed to these servers as a cache, and users access the cached data on the nearest server, which both shares the load and speeds up access. For consumer-facing (ToC) systems the effect is especially significant.

Note, however, that because there are so many nodes, updating cached data is slow, usually taking at least several minutes, so a CDN is generally only suitable for static data that rarely changes.

There is a workaround: append an incrementing number or a unique identifier to the URL, such as ?v=1001. Because a different URL is treated as “new” data or a new file, it will be fetched and cached afresh.
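This kind of URL versioning (“cache busting”) can be sketched as a helper that adds a `v` query parameter. As an assumption for illustration, this sketch uses a short SHA-256 hash of the file content as the unique identifier, so the URL changes only when the file does; an incrementing build number would work the same way. The domain `cdn.example.com` is a placeholder.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def versioned_url(url: str, content: bytes) -> str:
    """Return url with ?v=<short content hash>, keeping any existing query."""
    digest = hashlib.sha256(content).hexdigest()[:8]  # short unique mark
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["v"] = digest                               # add or replace v
    return urlunsplit(parts._replace(query=urlencode(query)))

old = versioned_url("https://cdn.example.com/app.js", b"console.log(1)")
new = versioned_url("https://cdn.example.com/app.js", b"console.log(2)")
# old != new, so edge servers treat the updated file as a new resource.
```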

3, Gateway (proxy) cache

Often we add a gateway layer in front of the origin server, to implement security mechanisms or to serve as the entry point for a unified traffic-routing policy.

It is also a good place for caching. Since the gateway is “business-agnostic”, it can answer a request and stop it right there, which greatly benefits the origin server behind it by sparing it a lot of CPU work.

Common gateway caches include Varnish, Squid, and Nginx. For simple caching scenarios Nginx is usually enough, because we often already use it for load balancing, and one less technology means less complexity. For large numbers of small files, Varnish is an option, while Squid is comparatively large and full-featured, with a higher cost of adoption.

4, In-process cache

This is probably where most programmers first use a cache deliberately.

A request that reaches this point is “business-related” and requires business logic to be executed.

That's why, from here on, introducing a cache costs much more than with the previous three kinds: the requirements for “data consistency” between the cache and the database are much stricter.
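One common way to keep an in-process cache from drifting too far from the database is to give entries a time-to-live (TTL) and to invalidate a key after the database is updated. The sketch below is a minimal, single-threaded illustration under those assumptions; the injectable `clock` parameter is a hypothetical convenience for testing, not part of any library API.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class TTLCache:
    """Minimal in-process cache whose entries expire after ttl seconds.

    The TTL bounds how long a cached value can disagree with the database.
    Not thread-safe; a sketch only.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # fresh hit
        value = loader()                          # miss or expired: reload
        self._data[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._data.pop(key, None)                 # call after updating the DB
```

A shorter TTL means fresher data but more reloads; the right value depends on how stale each kind of data is allowed to be.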

5, Out-of-process cache

This one is familiar: Redis, Memcached, and the like. You can even write your own program that stores cached data for other programs to call remotely.

A few more words on choosing between Redis and Memcached.

If you care especially about resource utilization (CPU, memory, etc.), you can use Memcached, but your program must tolerate possible data loss, since Memcached is purely in-memory. If you can't accept that and can be more generous with resources, use Redis. Redis also supports more data structures, whereas Memcached offers only key-value storage, which makes Redis feel more like a NoSQL database.

6, Database cache

The database itself has a built-in cache module; that's why databases are nicknamed memory hogs, eating as much memory as you give them. The database cache is an internal mechanism, but databases generally expose a setting for the cache size so you can tune it.

Finally, the disk itself also has a cache. So you'll see that getting data safely onto the physical disk really is a winding journey.

Seven, Is Caching a Silver Bullet?

You might think: if caching is this good, then the more the better, and anything slow should simply be cached?

However good something looks, it has a downside, and caching comes with several side effects to consider. Besides the “cache update” and “cache-database consistency” issues mentioned above, there are the following problems:

1, Cache avalanche

When a large number of requests arrive concurrently and the cache fails to provide the expected buffering for some reason, even briefly, all requests go to the database and put it under excessive pressure. This can be addressed by “locking and queuing” or by “adding a random value to the cache expiration time”.
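Both remedies can be sketched together. Random jitter on the TTL spreads out expirations so that keys cached at the same moment do not all expire at the same moment, and a per-key lock lets only one thread rebuild a missing entry while the others queue behind it and then read the fresh value. This is a single-process sketch; the `BASE_TTL` and `JITTER` values are hypothetical, and a multi-server deployment would need a distributed lock instead of `threading.Lock`.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, Tuple

BASE_TTL = 300.0   # hypothetical base expiry, seconds
JITTER = 60.0      # hypothetical maximum random extra, seconds

def ttl_with_jitter(base: float = BASE_TTL, jitter: float = JITTER) -> float:
    """Spread expirations so keys cached together don't expire together."""
    return base + random.uniform(0, jitter)

class GuardedCache:
    """On a miss, one thread per key loads from the DB; the rest wait."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._locks: Dict[str, threading.Lock] = {}  # grows unbounded: sketch only
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key: str) -> Tuple[bool, Any]:
        entry = self._data.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return True, entry[1]
        return False, None

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        hit, value = self._fresh(key)
        if hit:
            return value
        with self._lock_for(key):           # queue concurrent misses
            hit, value = self._fresh(key)   # another thread may have filled it
            if hit:
                return value
            value = loader()                # a single DB call per expiry
            self._data[key] = (time.monotonic() + ttl_with_jitter(), value)
            return value
```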

2, Cache penetration

It is similar to a cache avalanche, except that it lasts longer: after every “cache miss”, the data still cannot be loaded from the data source into the cache (because it does not exist there), so “cache misses” keep happening. It can be solved with a “Bloom filter” or by “caching empty objects”.
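Both remedies can be combined in a small sketch. The Bloom filter below is a toy built on `hashlib` (bit-array size and hash count are arbitrary choices), and it is loaded with every key that exists in the database: a “no” means the key definitely does not exist, so neither the cache nor the database is touched. Because a Bloom filter can give false positives, keys that slip through and turn out to be missing are cached as an empty marker (`_MISSING`, a hypothetical sentinel). In practice the empty entry would get a short TTL, omitted here for brevity.

```python
import hashlib
from typing import Callable, Dict, Iterator, Optional

class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

_MISSING = object()  # hypothetical marker cached for keys the DB lacks

def get_user(user_id: str, cache: Dict[str, object], known_ids: BloomFilter,
             load: Callable[[str], Optional[str]]) -> Optional[str]:
    if not known_ids.might_contain(user_id):
        return None                        # definitely absent: skip cache and DB
    if user_id in cache:
        value = cache[user_id]
        return None if value is _MISSING else value
    value = load(user_id)                  # None on a Bloom false positive
    cache[user_id] = _MISSING if value is None else value  # cache empty object
    return value
```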

3, Cache concurrency

If the same cache key is set concurrently, how do we ensure business correctness? What if the database is involved too? And what if the in-process cache, the out-of-process cache, and the database are all used together? The recommendation in one sentence: update the DB first and the cache afterwards, and make the cache operation a delete rather than a set.
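The “DB first, then delete the cache” rule can be sketched with two plain dicts standing in for the database and a Redis-like cache; `UserService` and its methods are hypothetical names for illustration. The reason to delete rather than set: if two writers each set the cache after their DB update, the slower one can leave its older value in the cache while the DB holds the newer one, whereas after a delete the next read simply reloads the current value from the DB. A narrow race with a concurrent reader can still exist, so this reduces inconsistency rather than eliminating it.

```python
from typing import Dict, Optional

class UserService:
    """Sketch of "write the DB first, then delete the cached key"."""

    def __init__(self, db: Dict[str, str], cache: Dict[str, str]):
        self.db = db        # stands in for the database (source of truth)
        self.cache = cache  # stands in for Redis or an in-process cache

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # repopulate on the read path only
        return value

    def write(self, key: str, value: str) -> None:
        self.db[key] = value         # 1. update the database first
        self.cache.pop(key, None)    # 2. then delete (not set) the cache entry
```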

4, Cache bottomless hole

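Distributed caches can scale out almost without limit, but is a cluster with more nodes always better? Of course not: caching also obeys the law of diminishing marginal utility. For example, a batch read of many keys may have to touch more nodes as the cluster grows, so the number of network round trips increases instead of the performance.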

5, Cache eviction

Memory is always limited. When there is a large amount of data, you need to choose a suitable eviction policy for the specific scenario, such as LRU, LFU, or FIFO.
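As one concrete example of an eviction policy, here is a minimal LRU (least recently used) cache built on `collections.OrderedDict`, whose `move_to_end` and `popitem(last=False)` methods make recency tracking cheap. This is an illustrative sketch; Python's standard `functools.lru_cache` already provides LRU memoization for function calls. Removing the `move_to_end` call in `get` would turn this into a FIFO cache, while LFU would need per-key access counts instead.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used
```

For example, with capacity 2: put `a`, put `b`, read `a`, then put `c`; `b` is evicted because `a` was used more recently.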

Eight, Summary

This article introduced three ways of using a cache, walked through the places in a complete system where a cache can be set up, and shared some practical experience with browser, CDN, and gateway (proxy) caches. It doesn't go into specific details; I just hope it gives you a more systematic and more complete understanding of caching.


Every blog post is an experience, a trace of a programming career. Knowledge changes destiny, and your destiny is in your own hands. May you travel half a lifetime and still return young at heart.

More haste, less speed. Getting there is what counts!