Thursday, July 7, 2011

Trie Data Structure in Java.

What are Tries?
A.K.A Prefix trees, is a multiway structure for storing (note can be used to store any item of significance) over alphabets.

Where would Tries be used?
Dictionary
Digital Address book.
Type aheads (at my company, all of them are implemented using Trie)

General Concept:
The basic idea is words sharing common prefix can share storage space till the common prefix and then separate. This not only saves space but also makes searching linear. The key thing to understand here is depending on the needs, one can use this idea to save an "item" at the end of each word (this makes the item to be retrieved in linear time).

Example:
A concrete example would be to retrieve the person contacts in Digital address book. Person name is used to save the contents (emails, address, image thumbnail etc) at the end. Notice searching on a person name is simply look up on the length of name.

Java Implementation:
We begin with Two classes namely Trie and TrieNode.

Trie.java: This class stores a TrieNode root, which is the topmost element of the tree. It exposes operations like insert, getCommonPrefixMatch, search, etc.

Following snippet of Trie.java:

import java.util.Map;
import java.util.Set;
import java.util.ArrayList;

import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.lang.StringUtils;


public class Trie {
private TrieNode root;

public Trie() {
root = new TrieNode();
}

public void insert(String s) {
TrieNode current = root;
if (StringUtils.isEmpty(s)) {
current.setWord(true); //for blank operator.
}
s = normailizeString(s);
for (int i= 0; i<s.length();i++) {
char c = s.charAt(i);
TrieNode childNode = current.getChildNode(c);
if (childNode == null) {
childNode = new TrieNode(c);
current.getChildNodes().put(c, childNode);
}
//update current node.
current = childNode;
}
//update current node isWord to true.
current.setWord(true);
System.out.println("Inserted String:"+s+" into Trie");
}

public void search(String s) {
TrieNode current = root;
if(current != null && StringUtils.isNotEmpty(s)) {
s = normailizeString(s);
for (int i=0;i<s.length();i++) {
char c = s.charAt(i);
TrieNode childNode = current.getChildNode(c);
if (childNode != null) {
current = childNode; // increment child node.
} else {
System.out.println("String:"+s+" not found.");
return;
}
}
if (current.isWord())
System.out.println("String:"+s+" found.");
}
}

public void delete(String s, TrieNode node) {
if (StringUtils.isNotEmpty(s)) {
s = normailizeString(s);
if (node != null) {
char c = s.charAt(0);
TrieNode childNode = node.getChildNode(c);
delete(s.substring(1),childNode);
Map<Character, TrieNode> childNodes = node.getChildNodes();
if (childNodes == null || childNodes.keySet().size() == 0) {
//delete current node only if there is no child nodes.
node= null; //nulling the reference.
}
System.out.println("Deleted"+c);
}
}
}

public void printWordsMatchingPrefix(String prefix, TrieNode current, String entirePrefix) {
if (StringUtils.isNotEmpty(prefix)) {
prefix = normailizeString(prefix);
char c = prefix.charAt(0);
TrieNode child = current.getChildNode(c);
if (child != null) {
String word = entirePrefix+child.getContent();
if (child.isWord())
System.out.println("Found word:"+word);
printWordsMatchingPrefix(prefix.substring(prefix.indexOf(c)+1), child, word);
}
} else {
//denotes reaching end of prefix, begin traversing to get matching words.
Map<Character, TrieNode> map = current.getChildNodes();
if (map != null) {
for (char c: map.keySet()) {
TrieNode child = current.getChildNode(c);
if (child != null) {
String word = entirePrefix+child.getContent();
if (child.isWord())
System.out.println("Found word:"+word);
printWordsMatchingPrefix("", child, word);
}
}
}
}
}


private String normailizeString(String s){
return s.toLowerCase();
}

public static void main(String[] args) {
Trie trie = new Trie();
trie.insert("Amrut");
trie.insert("All");
trie.insert("Ant");
trie.insert("Budihal");
trie.insert("Buddy");
trie.insert("abudihal");
trie.printWordsMatchingPrefix("b", trie.getRoot(), "");
trie.search("Amrut");
}

public TrieNode getRoot() {
return root;
}
}



import java.util.Map;
import java.util.HashMap;
import org.apache.commons.lang.builder.HashCodeBuilder;

public class TrieNode {
private boolean isWord;
private char content;
private Map<Character,TrieNode> childNodes;

public boolean isWord() {
return isWord;
}

public void setWord(boolean isWord) {
this.isWord = isWord;
}

public char getContent() {
return content;
}

public void setContent(char content) {
this.content = content;
}

public Map<Character, TrieNode> getChildNodes() {
return childNodes;
}

public void setChildNodes(Map<Character, TrieNode> childNodes) {
this.childNodes = childNodes;
}

public TrieNode(char _content) {
content = _content;
isWord = false;
childNodes = new HashMap<Character,TrieNode>();
}

public TrieNode() {
isWord = false;
content = '$';
childNodes = new HashMap<Character,TrieNode>();
}

public TrieNode getChildNode(char c) {
if (childNodes != null) {
return childNodes.get(c);
}
return null;
}

public boolean equals(Object o) {
if (o instanceof TrieNode) {
TrieNode node = (TrieNode)o;
return (content == node.getContent());
}
return false;
}

public int hashCode() {
return HashCodeBuilder.reflectionHashCode(this);
}
}

Planned Future work: Make trie efficient using caching mechanism. --> benefits: less memory consumption.

2 comments:

  1. org.apache.commons.lang.builder.HashCodeBuilder;

    hot to compile

    ReplyDelete
  2. Is there any library or documentation/link which gives more information of implementing Trie data structure in java?Any help would be great!Thanks.
    make linear barcode in java

    ReplyDelete