Friday, August 28, 2015

Service continuity using Zookeeper

In a service-oriented architecture, where one service depends on others, one challenge is how to upgrade a single service without bringing everything to a grinding halt.

Consider two servers, A and B (each is really a service contributing to a broader service), with A depending on B. For simplicity, let's assume service B exposes REST APIs that service A consumes.

Old way: A and B run on the same physical machine, and this setup is replicated across a number of machines.

Below are some of the concerns:
  1. How do we guarantee service continuity during a software upgrade? Say B needs to be upgraded: should we halt the service provided by A? This may be acceptable if A's service is not that critical and some downtime is tolerable.
  2. How do we avoid a single point of failure? Say B stops responding: what should A do? Retry later? If so, what should the retry interval be?
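
The post leaves the retry interval open. One common answer, shown here as a minimal Python sketch rather than anything from our actual setup, is capped exponential backoff, optionally with random jitter. The helper names `backoff_delays` and `call_with_retry` are hypothetical, and `call` is assumed to raise Python's built-in `ConnectionError` when B does not respond. Note that this only decides when A tries again; with a single B there is still nothing else to call, which is exactly concern 2.

```python
import random
import time


def backoff_delays(base=0.5, cap=30.0, attempts=6, jitter=False, rng=None):
    """Yield retry delays in seconds: base, 2*base, 4*base, ..., never exceeding cap."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)
        if jitter:
            # "Full jitter": wait a random time in [0, delay] so that many A
            # instances do not all hit B again at the same moment.
            delay = rng.uniform(0, delay)
        yield delay


def call_with_retry(call, delays, sleep=time.sleep):
    """Call `call()`; on ConnectionError, wait for the next delay and try again."""
    try:
        return call()
    except ConnectionError:
        pass
    for delay in delays:
        sleep(delay)
        try:
            return call()
        except ConnectionError:
            continue
    raise ConnectionError("B is still unreachable after all retries")
```

For example, `list(backoff_delays(0.5, 4.0, 6))` gives `[0.5, 1.0, 2.0, 4.0, 4.0, 4.0]`: the wait doubles until it reaches the cap, so A neither hammers a struggling B nor waits unboundedly long.
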
Why didn't it work for us?
We had a chain of services where customer-facing downtime was not acceptable. Replicating the stack on different machines handled the downtime problem in a hacky way: the customer-facing service was load balanced, but the single point of failure was not addressed, since each A still depended only on the B running on its own machine. It was also an inefficient use of resources: to scale up, we had to add the entire stack, even when only one service needed more capacity.

New solution: store B's server state in Zookeeper and let A discover B through Zookeeper. Curator's service discovery framework provides a nice library for exactly this. When a B instance starts, it stores its state (hostname, IP address, port) in Zookeeper; when it stops, it removes that information.
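
To make the register, deregister and discover flow concrete, here is a minimal, self-contained Python sketch. It does not talk to a real Zookeeper and it is not Curator's API (Curator is a Java library): an in-memory dict stands in for the znode tree, and the names `InMemoryRegistry`, `ServiceInstance`, `RoundRobinProvider`, the `/services/<name>/<id>` path layout and the session ids are all hypothetical, loosely modelled on Curator's `ServiceInstance` and `ServiceProvider`. One Zookeeper detail is worth modelling: registrations are typically ephemeral znodes, which Zookeeper deletes when the owning client session ends, so a B that crashes drops out of discovery even though it never ran its stop hook.

```python
import itertools
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceInstance:
    """The state a B server publishes about itself (hostname, IP address, port)."""
    name: str
    hostname: str
    address: str
    port: int
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class InMemoryRegistry:
    """Hypothetical stand-in for Zookeeper: one entry per instance at /services/<name>/<id>.

    Entries are tied to a session, mimicking ephemeral znodes: they vanish on a
    clean unregister (service stop) and also when the session is closed (crash).
    """

    def __init__(self):
        self._nodes = {}     # path -> ServiceInstance
        self._sessions = {}  # session id -> set of paths owned by that session

    def register(self, session, instance):
        path = f"/services/{instance.name}/{instance.id}"
        self._nodes[path] = instance
        self._sessions.setdefault(session, set()).add(path)
        return path

    def unregister(self, session, path):
        self._nodes.pop(path, None)
        self._sessions.get(session, set()).discard(path)

    def close_session(self, session):
        # Simulates session expiry: every entry owned by the session is removed.
        for path in self._sessions.pop(session, set()):
            self._nodes.pop(path, None)

    def instances(self, name):
        prefix = f"/services/{name}/"
        return [inst for path, inst in sorted(self._nodes.items(), key=lambda kv: kv[0])
                if path.startswith(prefix)]


class RoundRobinProvider:
    """How A picks a B: look up the live instances on every call and rotate through them."""

    def __init__(self, registry, name):
        self._registry = registry
        self._name = name
        self._counter = itertools.count()

    def get_instance(self):
        live = self._registry.instances(self._name)
        if not live:
            raise LookupError(f"no live instance of {self._name!r}")
        return live[next(self._counter) % len(live)]
```

With this shape, a rolling upgrade of B is: start the new B on another host and register it, then unregister and stop the old one; A's provider simply stops returning the old instance. Adding capacity likewise means registering more B instances without adding another copy of A. A should still be prepared for a call to fail in the short window between a B stopping and A's next lookup, for example by retrying against another instance. Curator ships provider strategies such as round-robin and random selection; round-robin is used here only because it makes the behaviour easy to follow.
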

Next steps: I will see if I can hack together some bare-bones code and post a link here.
