Wednesday, March 14, 2012

Word Count MapReduce with Akka

In my ongoing work with Akka, I recently wrote a word count MapReduce example. It implements the MapReduce model, which is a very good fit for a scale-out design approach.


  1. The client system (FileReadActor) reads a text file and sends each line of text as a message to the ClientActor.
  2. The ClientActor holds a reference to the RemoteActor (WCMapReduceActor) and passes the message on to that remote actor.
  3. The server (WCMapReduceActor) receives the message. The actor uses the PriorityMailBox to decide the priority of each message and orders the queue accordingly. In this case, the PriorityMailBox separates the MapReduce requests from the DISPLAY_LIST message, which asks the aggregate actor for the list of results.
  4. The WCMapReduceActor sends the messages to the MapActor (routed through a RoundRobinRouter) for mapping the words.
  5. After the words are mapped, the message is sent to the ReduceActor (also routed through a RoundRobinRouter) for reducing the words.
  6. The reduced results are sent to the AggregateActor, which aggregates them in memory.
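
The map, reduce, and aggregate steps above can be sketched in plain Java. This is a minimal, single-threaded illustration that uses no Akka at all: the class name `WordCountSketch` and its methods are hypothetical stand-ins for the roles of the MapActor, ReduceActor, and AggregateActor, not the code from the project.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: plain method calls stand in for actor messages.
class WordCountSketch {

    // Map step (the MapActor's role): emit a (word, 1) pair for every word in one line.
    // Assumption: words are runs of ASCII letters, digits, and apostrophes.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step (the ReduceActor's role): collapse one line's pairs into per-word counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Aggregate step (the AggregateActor's role): fold one reduced result into the running totals.
    static void aggregate(Map<String, Integer> totals, Map<String, Integer> reduced) {
        reduced.forEach((word, count) -> totals.merge(word, count, Integer::sum));
    }

    // Runs every line through map -> reduce -> aggregate, one line at a time.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            aggregate(totals, reduce(map(line)));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("Thieves! thieves!", "Stop the thieves")));
    }
}
```

In the actor version, each of these calls becomes a message to a separate actor, so lines can be mapped and reduced in parallel while the aggregation stays in one place.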

The following picture details how the program has been structured

The code base for the program is available at the following location -

For more information on MapReduce, read the post MapReduce for dummies


  1. This MapReduce is completely flawed. There is no guarantee that all the MapActor instances have finished processing by the time the DISPLAY_LIST message is received. For example, if one MapActor runs for a long time, your DISPLAY_LIST will go through anyway and some values will be left unprocessed.

    If you want to break this MapReduce implementation, simply add this snippet to the MapActor at line 95:

    if ("Thieves! thieves!".equals(work)) {
        try { System.out.println("*** sleeping!"); Thread.sleep(10000); System.out.println("*** back!"); }
        catch (InterruptedException e) { e.printStackTrace(); }
    }

    In other words, if you make a MapActor wait long enough, the overall job is still incomplete when the results are read, and the MapReduce solution falls apart.

    1. The idea is to provide a current snapshot of the values. There are other ways to get the results of the overall job: one is to wait, if you know how long the messages take to process; another is to count the lines sent (the number of messages) and compare that with the number of messages processed to verify that the job is complete.

      The intention of this piece of code is to show MapReduce working with actors.
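
The counting approach mentioned in this reply could look roughly like the following. It is a sketch in plain Java with a thread pool standing in for the actors; the class `CompletionTrackingSketch`, its methods, the pool size, and the 200 ms delay are all assumptions made for illustration, not part of the original program. The results are handed out only once the number of processed lines equals the number of lines sent.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: compare "lines sent" with "lines processed" before reading results.
class CompletionTrackingSketch {
    private final Map<String, Integer> totals = new ConcurrentHashMap<>();
    private long sent = 0;
    private long processed = 0;
    private boolean allSent = false;

    synchronized void lineSent() { sent++; }
    synchronized void finishedSending() { allSent = true; notifyAll(); }
    synchronized void lineProcessed() { processed++; notifyAll(); }
    synchronized boolean isComplete() { return allSent && processed == sent; }

    // Blocks until every sent line has been processed, or returns false on timeout.
    synchronized boolean awaitCompletion(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!isComplete()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return false;
            wait(remaining);
        }
        return true;
    }

    // Counts words on a worker pool; throws instead of returning a partial result.
    static Map<String, Integer> countWords(List<String> lines, long timeoutMillis) {
        CompletionTrackingSketch tracker = new CompletionTrackingSketch();
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            for (String line : lines) {
                tracker.lineSent();
                workers.submit(() -> {
                    try {
                        // Simulate the slow mapper from the comment above (shortened to 200 ms).
                        if ("Thieves! thieves!".equals(line)) Thread.sleep(200);
                        for (String word : line.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
                            if (!word.isEmpty()) tracker.totals.merge(word, 1, Integer::sum);
                        }
                        tracker.lineProcessed(); // counted only after the line is fully handled
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            tracker.finishedSending();
            if (!tracker.awaitCompletion(timeoutMillis)) {
                throw new IllegalStateException("job incomplete: not all lines were processed in time");
            }
            return new TreeMap<>(tracker.totals);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job", e);
        } finally {
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("Thieves! thieves!", "Stop the thieves"), 5000));
    }
}
```

With actors, the same check would live in the aggregating actor: it would answer DISPLAY_LIST only after its processed count matches the count reported by the sender, or reply with a snapshot that is explicitly marked as partial.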