Monday, March 26, 2012

Implementing Master Slave / Grid Computing Pattern in Akka

Master Slave pattern is a prime example of fault tolerance and parallel computation. The idea behind the pattern is to partition the work into identical sub tasks which are then delegated to Slaves. These slave node or instances will process the work task and send back the result to the master. The master will then compile the results received from all the slave nodes. Key here is that the Slave nodes are only aware on how to process the task and not aware of what happens to the output.

The Master Slave pattern is analogous to the Grid Computing pattern where a control node distributes the work to other nodes. Idea is to make use of the nodes on the network for their computing power. SETI@Home was one of the earliest pioneers in using this model.

I have build a similar example with difference being that worker nodes get started on Remote Nodes, Worker Nodes register with Master(WorkServer) and then subsequently start processing work packets. If there is no worker slave registered with Master(WorkServer), the master waits the workers to register. The workers can register at any time and will start getting work packets from there on.




The example demonstrates how an WorkerActor system sends a request for registration. The RegisterRemoteWorker recieves the request and forwards the same to JobController where the RoundRobinRouter is updated for the new worker information. The WorkScheduler sends a periodic request to JobController, who then sends packets to all the registered worker actors.

The example does not implement fault tolerance with respect to on how to handle failures when the remote actors die or how to re-process packets that have not been processed. Similarly, there may be cases where the remote worker actors might want to shutdown after processing certain amount of packets, they can then indicate to the master to stop giving them work. I will add fault tolerance soon!

Updated: Code base updated to handle worker shutdowns. If the remote actors die or shut down, the JobController detects the fail-overs using remote actor listeners and updates the router.

The code base for the program is available at the following location - https://github.com/write2munish/Akka-Essentials under the GridPatternExample

Trackback

loading..