The most commonly used algorithm is the LRU algorithm that makes a simple assumption for all the data accesses: if a data block is accessed once, it will be accessed again. This locality metric is also called ``recency'', which is implemented by a LRU stack. Each data access is recorded in the LRU stack, where the top entry is the most recent accessed (MRU) and the bottom entry is the least recent accessed (LRU). The LRU entry is the evicted item as the buffer is full (reflected by the full LRU stack). If data access pattern follows the simple assumption of LRU, the strong locality data set is well kept in the buffer by LRU. However, the LRU algorithm fails to handle the following three data access patterns: (1) each data block is only accessed once in a format of sequential scans. In this situation, the buffer would be massively polluted by weak or no locality data blocks. (2) For a cyclic (loop-like) data access pattern, where the loop length is slightly larger than the buffer size, LRU always mistakenly evicts the blocks that will be accessed soon in the next loop. (3) In multiple streams of data accesses where each stream has its own probability for data re-accesses, LRU could not distinguish the probabilities among the streams.
Although the LRU algorithm has these three critical flaws, it has been widely used in production systems due to its merits of low complexity and low overhead implementation. Since the LRU replacement algorithm was proposed and its approximation was implemented in 1960`s, computer scientists and system engineers have made continuous and tireless efforts to improve replacement algorithms analytically and systematically for a common goal of addressing the flaws of LRU, subject to keeping the merits of LRU (working well for strong locality data accesses and low overhead implementation).
The LIRS Caching Solution
LIRS represents Low Inter-reference Recency Set that is the data set to be
kept in the buffer cache. Inter-reference recency is equivalent to
reuse-distance for a accessing a data block, which measures
the number of distinct data entry accesses between two consecutive accesses
of the data block. The data replacement decision of the LIRS algorithm
is made based on the reuse distance, namely to evict the high reuse distance
data blocks and keep the low reuse distance data blocks. In this way,
the three LRU issues are addressed. The LIRS caching algorithm is
implemented by a data structure of two stacks, retaining the
same complexity as that of LRU.
The Usage of the LIRS Caching Algorithm in Software Systems
After many years` efforts from software industries since the LIRS paper was
published in 2002, the dominant LRU-like algorithms have been gradually
replaced by LIRS-like algorithms in major operating systems, database systems,
and data centers, including BSD, Linux, MySQL, Infinispan, and other production systems.