Overview

The goal of this project was to build an auto-complete command line tool. This tool took as input both a string and a possible word list. It would then search the word list for words that either start (prefix-search) or end (suffix-search) with the string.

To complete this we used a trie, because it allows us to search in time where m is the length of the input string. Not only do tries support quick searching but they also allow for quick insertions and deletions.

When working with strings this makes them a great replacement for hashtables. In this case a trie is better because:

  1. It has a faster worst case lookup
  2. There is no need to come up with or change a hash function
  3. There are no collisions

Building a Trie

The main idea of a trie is each node has two parts:

  1. A list or array of pointers to nodes
  2. A bag of words

A trie is a tree of these nodes. In our example each node can have 26 pointers for the 26 characters in the alphabet (non-alpha characters throw an exception). From the root that stores every word, we can break the words into groupings by prefix simply by following the pointers.

Lets walk through the insertion of the word ‘apple’. In the first node we would follow the ‘a’ pointer to another node. Here we would store ‘apple’ in the bag of words for this node. Then we would move to the ‘p’ node. Again we would store ‘apple’.

Then when searching for the prefix ‘a’, we would follow the ‘a’ pointer, and return the bag of words at ‘a’. In this case it would return ‘apple’.

When searching for the prefix ‘app’ we would follow the ‘a’ pointer, then the ‘p’ pointer, and then the ‘p’ pointer of the next node. Here we would return the bag of words at ‘a’ –> ‘p’ –> ‘p’ which would return ‘apple’.

Once the forward version was built for prefixes we simply had to reverse the words and repeat for the suffix case.

Conclusion

While this was a simple project, it was a good introduction to tries. These are used all around us today in parsers and auto-complete tools we take for granted.

The full code for this project can be found on Github.