Monday, July 11, 2011

LRU Cache using PriorityQueue


Background:
LRU (least recently used) cache is a good way of implementing cache because it frees up precious memory based on the last time an element was used. The criteria here is to purge items based on when the last time a particular element was last accessed.

In this post, I tried to implement a LRU cache using PriorityQueue. My rationale for choosing this combination: PriorityQueue does the sorting internally based on Comparator passed to it and I can concentrate on creating the cache, rather than worrying about maintaining the sort order.

Update: Check another way of implementing lru which is more terse found at link.

Implementation details:
I mainly have two Classes:
1) Element.java: This is the place holder for values in PriorityQueue (PQ) which has timestamp and value attributes. timestamp is to be updated when a element is accessed (put or get operations inside LRUCache)
2) LRUCache.java: This class uses PQ, elements are ordered in this object based on timestamp (the least timestamp would the first to be popped out). There is a cap SIZE of 5 on the cache. HashMap stores the actual value. LRUCache provides two public methods: get and put. Each of which is explained below:
   a) get(String key) : Idea is as soon as the key is retrieved from the map, element key (this key matches to that of HashMap, key) in PriorityQueue should be updated with current timestamp.
   b) put(String key, String value) : This method is responsible for checking if the key entry exist in map (in this case, it is a cache hit and hence update the PQ element with new timestamp). If the key does not exist in map, check if the map size. If greater than the cap set, remove the least recently used entry. If not greater, simply add this key to PQ.
   c) There are a set of private methods (insert, remove and update) which operate on PQ. insert and remove are pretty intuitive, both methods use offer and poll methods provided by PQ.
   d) Update operation is interesting. I knew that underlying data structure for PQ is heap. I was under the impression, that modifying elements (updating the timestamp) in PQ would initiate re-ordering of elements process. My assumption soon proved wrong when I saw the output not matching the required behavior. As it turned out, a heap would update only on inserting or removing operations and hence for my update operation, I had to delete and insert the entry(partially sorted tree) into PQ. Thus update operation is performed in O(n.logn) times (find a element takes linear time and then inserting that element takes logn time).

Following is the code snippet:
package lru;

import java.util.*;
import org.apache.commons.lang.StringUtils;

public class LRUCache {
private static final int SIZE = 5;
private Map<String, String> map = new HashMap<String, String>();

private PriorityQueue<Element> pq = new PriorityQueue<Element>(SIZE, new Comparator(){
 @Override
 public int compare(Object arg0, Object arg1) {
 //Ordering the elements as per timestamp.
 if (! (arg0 instanceof Element) || !(arg1 instanceof Element))
   return 0;
 Element e1 = (Element) arg0;
 Element e2 = (Element) arg1;
 return e1.getTimestamp().compareTo(e2.getTimestamp());
 }
});

private void insert(Element e) {
 System.out.println("Received Element: "+e.getValue());
 pq.offer(e);
}

private String remove() {
 Element leastUsed = pq.poll();
 if (leastUsed != null) {
  System.out.println("Removing least used element:"+leastUsed.getValue());
  System.out.println("This element was last used:"+leastUsed.getTimestamp());
  return leastUsed.getValue();
 }
 return "";
}

private void update(String mostRecentEleKey) {
 //update priority queue with most recent access.
 //Internal datastructure on PriorityQueue is Heap and it is partially sorted.
 //This means, any update on elements means to delete them and add them again.

 Iterator<Element> pqIterator = pq.iterator();
 while(pqIterator.hasNext()) {
 Element e = pqIterator.next();
  if(e.getValue().equals(mostRecentEleKey)) {
   pqIterator.remove();
   break;
  }
 }
 Element mostRecent = new Element();
 mostRecent.setTimestamp(new Date());
 mostRecent.setValue(mostRecentEleKey);
 insert(mostRecent);
}

public String get(String key) {
 String value = map.get(key);
 if (StringUtils.isNotEmpty(value)) {
  System.out.println("Updating "+key+" with current timestamp.");
  update(key);
 }
 return value;
}

public void put(String key, String value) {
 System.out.println("Before put opertaion, map size:"+map.size());
 if (map.containsKey(key)) {
  System.out.println("Cache hit on key:"+key+", nothing to insert!");
  update(key);
 } else {
  if(map.size() >= SIZE) {
   String leastUsedKey = remove();
   map.remove(leastUsedKey);
  }
  System.out.println("Element not present in Cache: "+key);
  Element e = new Element();
  e.setValue(key);
  e.setTimestamp(new Date());
  insert(e);
  map.put(key,value);
 }
 System.out.println("After put operation, following stats are generated:");
 System.out.println("Least used element:"+pq.peek().getValue()+", last used at:"+pq.peek().getTimestamp());
 System.out.println("map size:"+map.size());
}

public static void main(String[] args) {
 LRUCache cache = new LRUCache();
 cache.put("a","a");cache.put("b","b");
 cache.put("c","c");cache.put("d","d");
 cache.put("e","e");cache.put("b","b");
 cache.put("d","d");cache.put("b","b");
 cache.put("c","c");cache.put("b","b");
 cache.put("a","a");cache.put("f","f");
 cache.get("a");
}
}


package lru;

import java.util.Date;
import org.apache.commons.lang.builder.*;

public class Element {
private String value;
private Date timestamp;

public String getValue() {
 return value;
}

public void setValue(String value) {
 this.value = value;
}

public Date getTimestamp() {
 return timestamp;
}

public void setTimestamp(Date timestamp) {
 this.timestamp = timestamp;
}

public boolean equals(Element e) {
 return value.equals(e.getValue());
}

public int hashCode(){
 return new HashCodeBuilder(17,37).append(value).toHashCode();
}
}

This produces output:

Before put opertaion, map size:0
Element not present in Cache: a
Received Element: a
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:1
Before put opertaion, map size:1
Element not present in Cache: b
Received Element: b
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:2
Before put opertaion, map size:2
Element not present in Cache: c
Received Element: c
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:3
Before put opertaion, map size:3
Element not present in Cache: d
Received Element: d
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:4
Before put opertaion, map size:4
Element not present in Cache: e
Received Element: e
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Before put opertaion, map size:5
Cache hit on key:b, nothing to insert!
Received Element: b
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Before put opertaion, map size:5
Cache hit on key:d, nothing to insert!
Received Element: d
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Before put opertaion, map size:5
Cache hit on key:b, nothing to insert!
Received Element: b
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Before put opertaion, map size:5
Cache hit on key:c, nothing to insert!
Received Element: c
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Before put opertaion, map size:5
Cache hit on key:b, nothing to insert!
Received Element: b
After put operation, following stats are generated:
Least used element:a, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Before put opertaion, map size:5
Cache hit on key:a, nothing to insert!
Received Element: a
After put operation, following stats are generated:
Least used element:e, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Before put opertaion, map size:5
Removing least used element:e
This element was last used:Mon Jul 11 17:03:29 PDT 2011
Element not present in Cache: f
Received Element: f
After put operation, following stats are generated:
Least used element:d, last used at: Mon Jul 11 17:03:29 PDT 2011
map size:5
Updating a with current timestamp.
Received Element: a


Further work:
1) Need to make it generic.
2) Handling memory leak.

No comments:

Post a Comment