1. Idea of cache preheating

a. Preload part of the data into redis in advance, and only then start serving requests

b. It is impossible to write all the data into redis: the data volume is too large, loading it would take too long, and redis could not hold it all anyway

c. We need to know the day's specific access patterns, and count the hot data that is accessed most frequently

d. Then write the most frequently accessed hot data into redis; since there may be a lot of hot data, multiple services should read and write the data in parallel, i.e. parallel, distributed cache preheating

e. Then, with the hot data preloaded, redis can serve traffic; this avoids a cold start in which the requests would go straight through and crash the database
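The parallel preheating in (d) and (e) can be sketched as below. This is a minimal illustration, not the author's implementation: the redis client is stubbed with a `ConcurrentHashMap`, and the hot-data source is just an in-memory map.

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelPreheat {
    // Stand-in for redis: a real setup would use a redis client (e.g. Jedis) here.
    static final ConcurrentMap<String, String> redis = new ConcurrentHashMap<>();

    // Load the hot-data entries into the cache with several workers in parallel,
    // and block until preheating is done so serving can start afterwards.
    public static void preheat(Map<String, String> hotData, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (Map.Entry<String, String> e : hotData.entrySet()) {
            pool.submit(() -> redis.put(e.getKey(), e.getValue()));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Only after `preheat(...)` returns does the service start accepting traffic, so no request ever hits an empty cache.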

The concrete real-time solution:

1. nginx+lua writes the access traffic into kafka

To compute the latest real-time hot data, the traffic for every request to the product detail page, i.e. the access log, has to be written into kafka in real time
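In the actual pipeline this check-in is done by nginx+lua; the Java sketch below only illustrates the shape of the record such a pipeline might push. The field names are assumptions, and the kafka send is stubbed with a list append.

```java
import java.util.*;

public class AccessLogProducer {
    // Stand-in for a kafka producer: messages are collected here instead of being sent.
    final List<String> sent = new ArrayList<>();

    // Build the access-log message for one product-detail-page request and "send" it.
    public void logAccess(long productId, long timestampMillis) {
        String message = "{\"productId\":" + productId + ",\"ts\":" + timestampMillis + "}";
        // Real code would do: producer.send(new ProducerRecord<>("access-log", message))
        sent.add(message);
    }
}
```

Each detail-page hit produces one such message; storm then aggregates them per productId downstream.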

2. storm consumes the data from kafka and counts the number of accesses of each product in real time, storing the counts in an in-memory LRU data structure

Prefer an in-memory LRUMap for storage: it is fast and has no external dependencies

Otherwise, relying on redis is inappropriate, since the whole point here is to guard against redis going down and losing data; mysql cannot handle such high-concurrency reads and writes; hbase drags in the hadoop ecosystem, which is heavy and troublesome to maintain. In fact, we only need to count the most frequently accessed products over a period of time, accumulate their access counts, and maintain a top-N list of the most accessed products

Estimate the approximate number of product access counts each task needs to store, work out the size, and then build an LRUMap; apache commons collections has an open-source implementation. Set the map's maximum size, and it will automatically evict the excess data with the LRU algorithm, keeping memory usage bounded. Even if some data is evicted and has to be counted again from scratch next time, it does not matter: if an entry was evicted by the LRU algorithm, it was not hot data, meaning it was rarely accessed recently
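The bounded counter described above can use apache commons collections' `LRUMap` directly; as a dependency-free sketch, the JDK's `LinkedHashMap` in access order gives the same eviction behavior (the small capacity in the usage below is only for illustration, the text's real capacity is the estimated size):

```java
import java.util.*;

// Bounded (productId -> accessCount) map: once maxSize is exceeded, the
// least-recently-used product is evicted automatically. Evicted entries were
// rarely accessed recently, i.e. they were not hot data anyway.
public class HotCountMap extends LinkedHashMap<Long, Long> {
    private final int maxSize;

    public HotCountMap(int maxSize) {
        super(16, 0.75f, true); // accessOrder=true gives LRU ordering
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
        return size() > maxSize; // evict the LRU entry when over capacity
    }

    // Called once per product access counted by the storm task.
    public void recordAccess(long productId) {
        merge(productId, 1L, Long::sum);
    }
}
```

Each storm task would hold one such map sized to its estimated share of products and call `recordAccess` for every consumed kafka message.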

3. When each storm task starts up, it takes a zk distributed lock and writes its own id into a zk node
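A sketch of this registration step, with the zk client and the zk distributed lock replaced by an in-memory lock and string (the comma-joined id format is an assumption); real code would acquire the lock, read the task-id list node, append its own id, and write it back:

```java
import java.util.concurrent.locks.*;

public class TaskRegistry {
    // Stand-ins for the zk node holding the task-id list and its distributed lock.
    private final ReentrantLock lock = new ReentrantLock();
    private String taskIdList = "";

    // Each storm task calls this once at startup to append its own id to the list.
    public void register(int taskId) {
        lock.lock();
        try {
            taskIdList = taskIdList.isEmpty()
                    ? String.valueOf(taskId)
                    : taskIdList + "," + taskId;
        } finally {
            lock.unlock();
        }
    }

    public String taskIds() { return taskIdList; }
}
```

The lock matters because many tasks start concurrently and read-modify-write the same node; without it, registrations would overwrite each other.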

4. Each storm task is responsible for the hot-data statistics of its share: for example, after each count it maintains a top-1000 products list and updates this list after every calculation
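Deriving the top-N list from the per-task counts is just a descending sort over the count map's entries; a minimal sketch (the text uses N = 1000, parameterized here):

```java
import java.util.*;
import java.util.stream.*;

public class TopNList {
    // Sort the (productId -> accessCount) entries by count, descending,
    // and keep the first n product ids: the list each storm task maintains.
    public static List<Long> topN(Map<Long, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<Long, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```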

5. Write a background thread that, every so often, e.g. every minute, syncs the top-1000 hot data list to zk
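This background sync can be a `ScheduledExecutorService`. In the sketch below the zk write is stubbed with an `AtomicReference`, and the comma-joined payload format is an assumption for illustration:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.util.function.*;
import java.util.stream.*;

public class HotListSyncer {
    // Stand-in for the per-task zk node; real code would use a zk client,
    // e.g. zk.setData(path, payload.getBytes(), -1).
    final AtomicReference<String> zkNode = new AtomicReference<>("");

    // Serialize the current top-N product id list and write it to the zk node.
    void syncOnce(List<Long> topN) {
        String payload = topN.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        zkNode.set(payload);
    }

    // Run syncOnce periodically (e.g. every 60 seconds) in a background thread.
    ScheduledExecutorService start(Supplier<List<Long>> topNSupplier, long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> syncOnce(topNSupplier.get()),
                0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```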