LIRS Caching Algorithm

The LIRS Caching Algorithm has been Widely Adopted in Computer and Data Processing Systems

Background

The most commonly used algorithm is the LRU algorithm that makes a simple assumption for all the data accesses: if a data block is accessed once, it will be accessed again. This locality metric is also called ``recency'', which is implemented by a LRU stack. Each data access is recorded in the LRU stack, where the top entry is the most recent accessed (MRU) and the bottom entry is the least recent accessed (LRU). The LRU entry is the evicted item as the buffer is full (reflected by the full LRU stack). If data access pattern follows the simple assumption of LRU, the strong locality data set is well kept in the buffer by LRU. However, the LRU algorithm fails to handle the following three data access patterns: (1) each data block is only accessed once in a format of sequential scans. In this situation, the buffer would be massively polluted by weak or no locality data blocks. (2) For a cyclic (loop-like) data access pattern, where the loop length is slightly larger than the buffer size, LRU always mistakenly evicts the blocks that will be accessed soon in the next loop. (3) In multiple streams of data accesses where each stream has its own probability for data re-accesses, LRU could not distinguish the probabilities among the streams.

Although the LRU algorithm has these three critical flaws, it has been widely used in production systems due to its merits of low complexity and low overhead implementation. Since the LRU replacement algorithm was proposed and its approximation was implemented in 1960`s, computer scientists and system engineers have made continuous and tireless efforts to improve replacement algorithms analytically and systematically for a common goal of addressing the flaws of LRU, subject to keeping the merits of LRU (working well for strong locality data accesses and low overhead implementation).

The LIRS Caching Solution
LIRS represents Low Inter-reference Recency Set that is the data set to be kept in the buffer cache. Inter-reference recency is equivalent to reuse-distance for a accessing a data block, which measures the number of distinct data entry accesses between two consecutive accesses of the data block. The data replacement decision of the LIRS algorithm is made based on the reuse distance, namely to evict the high reuse distance data blocks and keep the low reuse distance data blocks. In this way, the three LRU issues are addressed. The LIRS caching algorithm is implemented by a data structure of two stacks, retaining the same complexity as that of LRU.

The Usage of the LIRS Caching Algorithm in Software Systems
After many years` efforts from software industries since the LIRS paper was published in 2002, the dominant LRU-like algorithms have been gradually replaced by LIRS-like algorithms in major operating systems, database systems, and data centers, including BSD, Linux, MySQL, Infinispan, and other production systems.

Song Jiang and Xiaodong Zhang, ``LIRS: an efficient low inter-reference recency set replacement to improve buffer cache performance", Proceedings of the 2002 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'02), Marina Del Rey, California, June 15-19, 2002.
The extended version has been published in IEEE Transactions on Computers, Vol. 54, No. 8, 2005.
- The LIRS Wikipedia page.
- The LIRS algorithm has been adopted in MySQL, the world's most widely used database system.
- The LIRS source code is archived in page management section of MySQL Server 5.6.
- A widely adopted algorithm for many data processing applications.
- Towards an O(1) VM.
- The LIRS caching algorithm has been adopted in Infinispan data grid platform.
- A JBoss Community article about LIRS in Infinispan.
- LIRS with free lock contention in Infinispan is provided by BP-Wrapper (ICDE'09).
- Keeping Infinispan in shape: highly-precise, scalable data eviction, a presentation of LIRS and BP-Wrapper based caching for Infinispan from Red Hat.
- The LIRS algorithm has been merged in GridGain, an in-memory data processing system. (Technical Brief of the In-Memory Data Grid).
- The LIRS algorithm is used in Apache Impala, a low latency and high concurrency data processing system on Hadoop. (Cloudera Software Release for Apache Impala).
- The LIRS algorithm has been adopted by Apache Jackrabbit, a hierarchical content repository. (Jackrabbit Oak LIRS Cache).
- The LIRS algorithm has been adopted by H2 Database Engine, a Java SQL database. (A Scan Resistant Cache).
- The LIRS algorithm has been adopted by Red Hat JBoss Data Grid, an in-memory data processing system. (Administration and Configuration Guide ).
- The LIRS algorithm has been adopted by Red Hat Data Virtualization System, a data integration solution to manage multiple data sources. ( An in-memory LIRS cache of distributed nodes is used in the hierarchical database ).