These are the techniques that govern how objects are stored in and retrieved from a cache:
- Read Through
- Write Through
- Read Ahead
Last week I wrote about the Cache-Aside pattern and provided a minimal code sample to get started with Redis (tested using Redis on Azure). The sample also includes a provider class and a practical implementation of the pattern that can be used directly in MVC / Web API projects.
In this post let’s discuss the conceptual engineering aspects of these caching strategies. I address the scenarios and how to implement each one. The steps are explained in plain English, so the actual implementation can be done in any programming language.
Read Through is a very simple, straightforward approach and very common in use. The application reads data from the cache. If the data is available in the cache the application gets it; otherwise the application reads the data from the data store and stores it in the cache for future reference.
Objects in the cache are stored for a specific time, and any request that arrives within this time frame is served from the cache. If a write happens to the object within that time frame:
- If that write operation invalidates the cache, the next immediate read after the write hits the data store and updates the cache.
- If that write does not invalidate the cache, the next immediate read after the write gets stale data.
The Cache-Aside code sample implements scenario 1 under Read Through. This ensures the application does not get stale data, but it can cause performance issues when the write rate is equal to or greater than the read rate for the object.
- The application reads the data from the cache.
- If the data is in the cache the application gets it; otherwise it loads the data from the data store and updates the cache.
- The application writes data to the data store.
- A successful write operation invalidates the corresponding object in the cache.
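The steps above can be sketched in Python. This is a minimal illustration, not the Cache-Aside sample from the previous post; the class and parameter names (`ReadThroughCache`, `load_fn`, `ttl`) are made up for this sketch.

```python
import time

class ReadThroughCache:
    """Minimal read-through sketch: serve from cache, fall back to the
    data store on a miss, and invalidate on a successful write."""

    def __init__(self, load_fn, ttl=120):
        self._load_fn = load_fn   # loads an object from the data store
        self._ttl = ttl           # eviction time in seconds
        self._store = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                 # served from the cache
        value = self._load_fn(key)          # cache miss: hit the data store
        self._store[key] = (value, time.time() + self._ttl)
        return value

    def invalidate(self, key):
        # Called after a successful write, so the next read refreshes the cache.
        self._store.pop(key, None)
```

Without the `invalidate` call, a read after a write would return the stale cached value until the TTL expires, which is exactly scenario 2 above.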
In Write Through, applications write data to the cache, not to the data store; the caching service writes the data to the data store transparently. This update is usually synchronous, so a typical write operation returns success only when the data has been written to both the cache and the data store. Since the data is written to the cache, there is no need to invalidate the object, but modifications to the object in the cache need to be handled in a thread-safe way. Applications always get the latest data.
- The application writes the data to the cache.
- The caching service or the application writes the data to the data store.
- A write is considered successful only if both the cache and the data store are updated.
- We can use two different threads, one to update the cache and the other to update the data store, and wait for both to complete successfully.
- Using application-generated IDs for the objects would help.
- Updating the objects should be thread safe.
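The two-thread approach described above can be sketched as follows. The names (`WriteThroughCache`, `store_write_fn`) are illustrative assumptions; a real caching service would do this transparently.

```python
from concurrent.futures import ThreadPoolExecutor

class WriteThroughCache:
    """Write-through sketch: a write succeeds only when both the cache
    and the data store have been updated."""

    def __init__(self, store_write_fn):
        self._store_write = store_write_fn   # writes to the underlying data store
        self._cache = {}

    def write(self, key, value):
        # Update the cache and the data store on two threads, then wait for both.
        with ThreadPoolExecutor(max_workers=2) as pool:
            cache_future = pool.submit(self._cache.__setitem__, key, value)
            store_future = pool.submit(self._store_write, key, value)
            cache_future.result()   # .result() re-raises if that update failed
            store_future.result()
        return True
```

Because `write` blocks until both futures complete, a failure in either update surfaces as an exception and the write is not reported as successful.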
Write Through also has a variant with a delayed update to the data store. This is known as the Write-Behind strategy.
- The application writes the data to the cache.
- A write is considered successful as soon as the write to the cache succeeds.
- At a later stage (periodically, at eviction time, or based on some other trigger) the data store is updated.
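A minimal Write-Behind sketch, assuming a background worker that drains a queue of pending writes (the names `WriteBehindCache` and `store_write_fn` are made up for illustration):

```python
import queue
import threading

class WriteBehindCache:
    """Write-behind sketch: the write succeeds once the cache is updated;
    a background worker flushes pending writes to the data store later."""

    def __init__(self, store_write_fn):
        self._store_write = store_write_fn   # writes to the underlying data store
        self._cache = {}
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._flush_loop, daemon=True)
        self._worker.start()

    def write(self, key, value):
        self._cache[key] = value          # the write is successful at this point
        self._pending.put((key, value))   # the data store is updated later

    def _flush_loop(self):
        while True:
            item = self._pending.get()
            if item is None:              # shutdown sentinel
                break
            self._store_write(*item)      # delayed write to the data store
            self._pending.task_done()

    def close(self):
        self._pending.put(None)
        self._worker.join()
```

Note that anything still sitting in `_pending` is lost if the process dies, which is exactly the reliability concern discussed below.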
This is a very helpful and highly responsive design; most modern applications with high throughput follow this strategy. To be reliable, the cache itself should support at least one level of a master-slave (replication) model, because if the cache goes down before the write reaches the data store, there is no way to recover the data.
Also, if an object requires a full audit trail, this strategy is not suitable. Example: an application requires that all operations on Products be logged. A new product is added and then modified; the data store update happens only after the modification, so we completely miss the old value and the change log of that product.
Read Ahead loads frequently accessed data from the data store before the cached object gets evicted. For example, say there is a products collection in the cache that is accessed very often, with an eviction time of 120 seconds. Under a normal cache implementation this collection is removed from the cache after 120 seconds.
The first read after the object has been cleared from the cache goes through the Read Through strategy, so that read might take longer. The Read Ahead strategy instead refreshes the collection before it gets evicted, and the refresh happens automatically; in Read Through, the refresh happens on demand.
- There should be a mechanism to observe cache object lifetimes. (Redis, for example, can publish keyspace notifications when a key expires.)
- Based on that event, we fire up a worker to load the data into the cache before the application requests it.
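The idea can be sketched with a timer standing in for the cache's expiry notification (a real implementation would react to something like Redis keyspace events instead; `ReadAheadCache`, `load_fn`, and `refresh_margin` are illustrative names):

```python
import threading

class ReadAheadCache:
    """Read-ahead sketch: a timer refreshes the object shortly before its
    eviction time, instead of waiting for a cache miss."""

    def __init__(self, load_fn, ttl=120, refresh_margin=10):
        self._load_fn = load_fn   # loads an object from the data store
        self._ttl = ttl           # eviction time in seconds
        self._margin = refresh_margin   # refresh this many seconds before eviction
        self._cache = {}
        self._timers = {}

    def get(self, key):
        if key not in self._cache:
            self._refresh(key)    # very first read behaves like read-through
        return self._cache[key]

    def _refresh(self, key):
        self._cache[key] = self._load_fn(key)
        # Schedule the next refresh to run before the object would be evicted.
        timer = threading.Timer(max(self._ttl - self._margin, 0),
                                self._refresh, args=(key,))
        timer.daemon = True
        self._timers[key] = timer
        timer.start()
```

After the first read, every subsequent `get` is served from the cache, because the background refresh keeps the object warm.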
Caching is a strategic decision. We can use it simply to store a few objects, or an entire application can be designed and scaled around the cache.