DSA: Implement an LRU Cache (Real Interview Pattern)


Solution to Last Week's Challenge

Last week's problem appeared in DSA: Interview Prep Strategy for 2024.

The optimal solution applies the pattern discussed in that post, with careful state management.

Complexity Analysis:

  • Time Complexity: O(n)
  • Space Complexity: O(1)

The Least Recently Used (LRU) cache is perhaps the most iconic data structure design question in the software engineering interview circuit. It sits at the perfect intersection of low-level pointer manipulation and high-level system design. While many candidates can describe the concept—discarding the oldest items first when the cache reaches capacity—implementing it under the pressure of a 45-minute interview requires a deep understanding of how different data structures can be composed to achieve optimal performance.

In real-world production systems, the LRU pattern is everywhere. It is the backbone of the MySQL Buffer Pool, the logic behind Redis eviction policies, and the mechanism operating systems use for page replacement. As a senior engineer, I look for candidates who don't just know the "what," but understand the "why" behind the specific combination of a Hash Map and a Doubly Linked List. This combination is the canonical way to achieve the industry-standard $O(1)$ time complexity for both access and updates.

Theory and Core Concepts

To implement an LRU cache, we must satisfy two primary requirements: fast lookups and fast updates. A simple array or list allows for $O(1)$ additions at the end, but finding an element or removing one from the middle takes $O(n)$ time. Conversely, a standard Hash Map provides $O(1)$ lookups but lacks any inherent sense of order, making it impossible to track which item was "least recently used."

The "aha!" moment comes when we combine them. We use a Hash Map to store keys and their corresponding locations in a Doubly Linked List. The Doubly Linked List maintains the chronological order of access.

Pattern Visualization

When a get or put operation occurs, the cache must treat that specific key as the "most recently used." In our Doubly Linked List, this means moving the corresponding node to the "head" of the list. If the cache exceeds its capacity during a put operation, we simply remove the node at the "tail," as it represents the least recently used item.
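As a quick sketch of this movement (hypothetical keys, capacity 3; the head side is most recently used, the tail side least recently used):

  put(1, A)   head -> 1 -> tail
  put(2, B)   head -> 2 -> 1 -> tail
  put(3, C)   head -> 3 -> 2 -> 1 -> tail
  get(1)      head -> 1 -> 3 -> 2 -> tail   (1 moves to the head)
  put(4, D)   head -> 4 -> 1 -> 3 -> tail   (2 is evicted from the tail)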

Implementation

The following Python implementation uses a dummy head and tail node. This is a common "pro-tip" in interviews to avoid constant null-pointer checks, making the code cleaner and less error-prone.

python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.cache = {}  # Map key to Node
        
        # Dummy nodes for boundary safety
        self.head = Node(0, 0)
        self.tail = Node(0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        """Remove an existing node from the linked list."""
        prev_node = node.prev
        next_node = node.next
        prev_node.next = next_node
        next_node.prev = prev_node

    def _add(self, node):
        """Add a new node right after the head (most recent)."""
        temp = self.head.next
        self.head.next = node
        node.prev = self.head
        node.next = temp
        temp.prev = node

    def get(self, key: int) -> int:
        # Time Complexity: O(1)
        if key in self.cache:
            node = self.cache[key]
            self._remove(node)
            self._add(node)
            return node.value
        return -1

    def put(self, key: int, value: int) -> None:
        # Time Complexity: O(1)
        if key in self.cache:
            self._remove(self.cache[key])
        
        new_node = Node(key, value)
        self._add(new_node)
        self.cache[key] = new_node
        
        if len(self.cache) > self.cap:
            # Evict the least recently used (node before tail)
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

# Space Complexity: O(Capacity)
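
To sanity-check the pointer logic, here is a short illustrative trace of the class above with a capacity of 2 (the keys and values are arbitrary):

python
# Illustrative usage of the LRUCache defined above (capacity = 2).
cache = LRUCache(2)
cache.put(1, 10)       # order: 1
cache.put(2, 20)       # order: 2, 1
print(cache.get(1))    # 10 -> key 1 becomes most recent; order: 1, 2
cache.put(3, 30)       # capacity exceeded -> evicts key 2; order: 3, 1
print(cache.get(2))    # -1 (evicted)
print(cache.get(3))    # 30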

Complexity Analysis Table

Approach | Get Complexity | Put Complexity | Space Complexity | Pros/Cons
Array/List | $O(n)$ | $O(1)$ | $O(n)$ | Easy to implement; slow lookups.
Hash Map only | $O(1)$ | $O(1)$ | $O(n)$ | No order tracking; cannot evict LRU.
DLL + Hash Map | $O(1)$ | $O(1)$ | $O(n)$ | Optimal performance; complex pointers.
OrderedDict (Python) | $O(1)$ | $O(1)$ | $O(n)$ | Built-in; often disallowed in interviews.
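
For reference, the OrderedDict row above corresponds to a sketch like the following; it is handy for verifying the hand-rolled version even when interviewers disallow it (the class name is illustrative):

python
from collections import OrderedDict

class OrderedDictLRUCache:
    """Sketch of the built-in approach; often disallowed in interviews."""
    def __init__(self, capacity: int):
        self.cap = capacity
        self.cache = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)           # mark as most recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)    # evict least recently used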

Common Patterns and Variations

Interviewers often add twists to the standard LRU. One common variation is the LFU (Least Frequently Used) cache, which requires tracking access counts. Another is adding a Time-to-Live (TTL), where items expire after a certain duration regardless of access.

Variation: LRU with Update Logic

Sometimes you need to update an existing key's value without changing its "recency" if the value is the same. However, the standard pattern is to always move to the head on any put or get.

Variation: Concurrency

In a real-world multi-threaded environment (like a Java web server), you would need to wrap these operations in a ReentrantLock or use a ConcurrentHashMap combined with synchronized blocks on the DLL nodes to prevent race conditions during pointer updates.
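
In Python, the same idea can be sketched with a single coarse-grained lock around the structure; this assumes the LRUCache class above, and the wrapper name is illustrative:

python
import threading

class ThreadSafeLRUCache(LRUCache):
    """Minimal sketch: one lock serializes all map and pointer updates."""
    def __init__(self, capacity: int):
        super().__init__(capacity)
        self._lock = threading.Lock()

    def get(self, key: int) -> int:
        with self._lock:       # prevents concurrent pointer manipulation
            return super().get(key)

    def put(self, key: int, value: int) -> None:
        with self._lock:
            super().put(key, value)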

Practice Problems

To master this pattern, start with the standard LRU cache and then work up to its harder variants. The jump from LRU to LFU is significant and often appears in Senior or Staff-level interviews.

This Week's Interview Challenge

Problem: The Time-Aware LRU Cache

Design a data structure that follows the constraints of a Least Recently Used (LRU) cache, but with an added expiration policy. Each key-value pair is inserted with a ttl (time-to-live) in milliseconds.

A key is considered invalid if:

  1. It is the least recently used and the cache is over capacity.
  2. The current time has exceeded the time it was inserted plus its ttl.

Examples:

  • put(1, "A", 1000) at time 0.
  • get(1) at time 500 returns "A".
  • get(1) at time 1500 returns -1 (expired).

Constraints:

  • get and put should still aim for $O(1)$ average time complexity.
  • You may assume a helper function get_current_time() exists.
  • Capacity will be between 1 and 1000.

Hints:

  1. Think about whether you should store the expiration timestamp inside the Node.
  2. Do you need to clean up expired nodes immediately, or can you do it "lazily" during get calls?
  3. How does the capacity eviction interact with the ttl eviction? (Hint: Check expiration first).

Template:

python
class TimeAwareLRU:
    def __init__(self, capacity: int):
        pass

    def get(self, key: int, current_time: int) -> int:
        # Your code here
        pass

    def put(self, key: int, value: int, ttl: int, current_time: int) -> None:
        # Your code here
        pass

Submit your solution and check back next week for the detailed answer!

Conclusion

The LRU cache is a quintessential DSA problem because it forces you to manage state across two different data structures simultaneously. When practicing, focus on the pointer logic of the Doubly Linked List—drawing it out on paper before coding is a strategy used by even the most seasoned engineers. Mastering this pattern demonstrates to interviewers that you understand the trade-offs of memory locality, time complexity, and robust object-oriented design.
