Cache penetration, cache breakdown and cache avalanche solutions

1. Preface

Caching is common in program design. When the front end sends a data access request to the back end, one of three cases occurs:

case 1: the data is found in the cache and returned to the front end directly.

case 2: the data is not found in the cache, so it is fetched from the database; the cache is updated first, and then the data is returned to the front end.

case 3: the data is not found in the database either, and null is returned directly.
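A minimal Java sketch of this read flow, with getDataFromCache, getDataFromDB and setDataToCache as hypothetical helpers (the same names are used in the breakdown example in section 3):

import java.util.Collections;
import java.util.List;

public class CacheAsideReader {

    public List<String> read() {
        List<String> result = getDataFromCache();   // case 1: cache hit
        if (result.isEmpty()) {
            result = getDataFromDB();               // case 2: cache miss, query the DB
            if (!result.isEmpty()) {
                setDataToCache(result);             // update the cache first, then return
            }
            // case 3: not in the DB either; the empty result is returned as-is
        }
        return result;
    }

    // Hypothetical helpers; real implementations would talk to Redis and a database.
    private List<String> getDataFromCache() { return Collections.emptyList(); }
    private List<String> getDataFromDB() { return Collections.emptyList(); }
    private void setDataToCache(List<String> data) { }
}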

2. Cache penetration [the data is in neither the cache nor the database]

definition: cache penetration means the requested data exists in neither the cache nor the database, but a user keeps initiating requests for it, such as data with an ID of "-1" or an extremely large, nonexistent ID. Such a user is likely an attacker, and the attack puts excessive pressure on the database.

solutions:

1) Add verification at the interface layer. For example: ① user authentication, ② basic ID validation, i.e. intercept requests with ID <= 0 and return directly.

2) Use a temporary caching mechanism. If the data is found in neither the cache nor the database, write the key with a null value and set a short cache expiration time (for example, 30 seconds; if the expiration time is too long, it may interfere with normal use once the real data exists). This prevents a user from repeatedly brute-forcing queries with the same ID. A minimal sketch follows.
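A minimal Java sketch of null caching, assuming the Jedis Redis client; the "NULL" sentinel value, the 30-second TTL, and the queryDatabase helper are illustrative assumptions:

import redis.clients.jedis.Jedis;

public class NullCache {
    private static final String NULL_SENTINEL = "NULL"; // marks keys known to be missing
    private static final int NULL_TTL_SECONDS = 30;     // short TTL so real data can appear later

    private final Jedis jedis = new Jedis("localhost", 6379); // hypothetical connection

    public String get(String key) {
        String cached = jedis.get(key);
        if (cached != null) {
            // Either real data or the sentinel for a known-missing key.
            return NULL_SENTINEL.equals(cached) ? null : cached;
        }
        String fromDb = queryDatabase(key);
        if (fromDb == null) {
            // Not in the database either: cache the sentinel briefly to absorb repeat requests.
            jedis.setex(key, NULL_TTL_SECONDS, NULL_SENTINEL);
            return null;
        }
        jedis.set(key, fromDb);
        return fromDb;
    }

    // Hypothetical database lookup.
    private String queryDatabase(String key) { return null; }
}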

3. Cache breakdown [the data is not in the cache but can be found in the database]

definition: cache breakdown means the data is missing from the cache but present in the database (typically because the cached entry has just expired). With many concurrent users, none of them can read the data from the cache, so they all go to the database at the same time, causing an instant spike in database pressure.

solutions:

1) Set hotspot data to never expire.

2) Add a mutex lock to serialize the query operation. Reference code is as follows.

import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// getDataFromCache, getDataFromDB and setDataToCache are assumed helper methods.
static Lock reenLock = new ReentrantLock();

public List<String> getData() throws InterruptedException {
    // Fetch from the cache first
    List<String> result = getDataFromCache();

    if (result.isEmpty()) {
        if (reenLock.tryLock()) {
            try {
                System.out.println("Got the lock; fetching data from the DB and writing it to the cache");
                // Fetch data from the database
                result = getDataFromDB();

                // Write the queried data to the cache
                setDataToCache(result);
            } finally {
                reenLock.unlock(); // release the lock
            }
        } else {
            // Did not get the lock: check the cache again first
            result = getDataFromCache();

            if (result.isEmpty()) {
                System.out.println("No lock and no data in the cache; waiting...");
                Thread.sleep(100); // wait
                return getData();  // retry
            }
        }
    }

    return result;
}

Note:

1) If there is data in the cache, the result will be returned directly.

2) If there is no data in the cache, the thread that acquires the lock fetches the data from the database. Until the lock is released, other concurrent threads wait 100 ms and then read the cache again. This prevents the database from being queried repeatedly and the cache from being updated repeatedly for the same data.

3) Of course, this is a simplified flow. In theory, it would be better to lock per key value: thread A fetching key1 from the database should not block thread B fetching key2. The code above obviously cannot do this. Scheme: make the lock fine-grained down to the key, as sketched below.
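A minimal sketch of such a per-key lock, keeping one ReentrantLock per key in a ConcurrentHashMap; the cache and database helpers are hypothetical stand-ins as before:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class KeyedLockCache {
    // One lock per key, so queries for different keys do not block each other.
    // Note: in production this map would need eviction to avoid unbounded growth.
    private final ConcurrentHashMap<String, Lock> locks = new ConcurrentHashMap<>();

    public String getData(String key) throws InterruptedException {
        String result = getDataFromCache(key);
        if (result != null) {
            return result;
        }
        Lock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        if (lock.tryLock()) {
            try {
                result = getDataFromDB(key);   // only one thread per key hits the DB
                setDataToCache(key, result);
            } finally {
                lock.unlock();
            }
            return result;
        }
        Thread.sleep(100);   // another thread holds the lock for this key: wait
        return getData(key); // then retry, checking the cache again first
    }

    // Hypothetical helpers standing in for a real cache and database.
    private String getDataFromCache(String key) { return null; }
    private String getDataFromDB(String key) { return "value-for-" + key; }
    private void setDataToCache(String key, String value) { }
}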

4. Cache avalanche

definition: cache avalanche refers to the phenomenon where a large amount of cached data expires at the same time while query volume is huge, putting so much pressure on the database that it may even go down.

difference from "cache breakdown": cache breakdown is concurrent querying of the same piece of data; cache avalanche means many different pieces of data expire at roughly the same time, so many lookups miss the cache and turn to the database.

solutions:

1) When saving data to Redis in batches, set each key's expiration time to a base value plus a random offset, so that the data does not expire over a large area at the same time.

setRedis(key, value, time + Math.random() * 10000);
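A minimal Java sketch of this randomized expiration, assuming the Jedis client; the one-hour base TTL and the 0-300 second jitter range are illustrative choices:

import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitteredTtlWriter {
    private static final int BASE_TTL_SECONDS = 3600; // one-hour base expiration

    public void saveBatch(Jedis jedis, Map<String, String> batch) {
        for (Map.Entry<String, String> entry : batch.entrySet()) {
            // Add a random 0-300 second offset so the keys do not all expire together.
            int ttl = BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(0, 300);
            jedis.setex(entry.getKey(), ttl, entry.getValue());
        }
    }
}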

2) If Redis is deployed as a cluster, distribute the hotspot data evenly across the different Redis nodes to avoid all of it expiring at once.

3) Set hotspot data to never expire; when the data is updated, update the cache as well. A sketch follows.
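A minimal sketch of this update-on-write pattern for never-expiring hotspot keys, again assuming the Jedis client and a hypothetical updateDatabase helper:

import redis.clients.jedis.Jedis;

public class HotspotWriter {
    private final Jedis jedis = new Jedis("localhost", 6379); // hypothetical connection

    public void update(String key, String value) {
        updateDatabase(key, value); // write the database first
        jedis.set(key, value);      // then refresh the cache; no TTL, so it never expires
    }

    // Hypothetical database write.
    private void updateDatabase(String key, String value) { }
}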
