Peer-to-peer choreography and orchestration

Photo by Matt Hardy on Unsplash

Peer-to-peer choreography

As explained earlier, in the peer-to-peer choreography strategy every microservice knows exactly what to do and when to do it. The microservices operate individually, without being aware of the bigger system and without any centralized authority telling them what to do. Each microservice listens to the events it is interested in and acts on them. Because of this publish/subscribe nature, it is very easy to extend the system with more microservices that listen to the same events.

Let’s illustrate this with a loan request system that initially consists of two microservices:
  • The Loan request service, which is responsible for handling user input (user input validation etc.)
  • The Offer service, which determines whether we can offer the requested loan by applying risk factors and additional business rules
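The basic flow, and the Email service that Figure 2 adds to it, can be sketched with a tiny in-memory event bus. This is a minimal sketch under assumptions, not a production broker: the `EventBus` class, the `LoanRequested` event name, the handler functions and the approval rule are all hypothetical. Only `LoanRequestProcessed` is an event name taken from this blog.

```python
from collections import defaultdict
from typing import Callable

Event = dict  # an event is a plain dict with a "type" key plus its payload


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the event to every service that subscribed to its type.
        for handler in self._subscribers[event["type"]]:
            handler(event)


bus = EventBus()
sent_emails: list[str] = []


def loan_request_service(user: str, amount: int) -> None:
    # Validates user input and announces the request; it does not know who listens.
    if amount <= 0:
        raise ValueError("amount must be positive")
    bus.publish({"type": "LoanRequested", "user": user, "amount": amount})


def offer_service(event: Event) -> None:
    # Hypothetical business rule standing in for the real risk assessment.
    approved = event["amount"] <= 10_000
    bus.publish({"type": "LoanRequestProcessed", "user": event["user"], "approved": approved})


def email_service(event: Event) -> None:
    # Added later: it only subscribes to an existing event, no other service changes.
    status = "approved" if event["approved"] else "rejected"
    sent_emails.append(f"{event['user']}: loan {status}")


bus.subscribe("LoanRequested", offer_service)
bus.subscribe("LoanRequestProcessed", email_service)

loan_request_service("alice", 5_000)
loan_request_service("bob", 50_000)
print(sent_emails)  # ['alice: loan approved', 'bob: loan rejected']
```

Adding the Email service is a single extra `subscribe` call here; neither the Loan request service nor the Offer service had to change.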
Figure 1. The basic loan request flow
Figure 2. Extending our system by adding the Email service
Figure 3. Changing the flow of the system by adding a Credit service
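The change in Figure 3 can be sketched with a small in-memory event bus as well. This is a hypothetical sketch: it assumes the Credit service listens to a `LoanRequested` event and publishes a `CreditChecked` event, and the credit scores and approval rule are made up. The point it shows is that the Offer service must now subscribe to `CreditChecked` instead, so two services change rather than one.

```python
from collections import defaultdict
from typing import Callable

Event = dict  # an event is a plain dict with a "type" key plus its payload


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers[event["type"]]:
            handler(event)


bus = EventBus()
processed: list[Event] = []

# Hypothetical credit scores; a real Credit service would query an external source.
CREDIT_SCORES = {"alice": 720, "bob": 510}


def credit_service(event: Event) -> None:
    # New service: enriches the loan request with a credit score.
    score = CREDIT_SCORES.get(event["user"], 0)
    bus.publish({**event, "type": "CreditChecked", "credit_score": score})


def offer_service(event: Event) -> None:
    # Changed service: it now consumes CreditChecked and uses the score.
    approved = event["amount"] <= 10_000 and event["credit_score"] >= 600
    bus.publish({"type": "LoanRequestProcessed", "user": event["user"], "approved": approved})


bus.subscribe("LoanRequested", credit_service)  # previously: offer_service
bus.subscribe("CreditChecked", offer_service)
bus.subscribe("LoanRequestProcessed", processed.append)

# The Loan request service is unchanged; its events are published directly here.
bus.publish({"type": "LoanRequested", "user": "alice", "amount": 5_000})
bus.publish({"type": "LoanRequested", "user": "bob", "amount": 5_000})
print([(e["user"], e["approved"]) for e in processed])  # [('alice', True), ('bob', False)]
```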
In general, there are two scenarios when adding a new microservice:
  • If no existing microservice needs the output of the new microservice, only the new microservice has to be built and no other microservices need to change. Examples are extending an existing sequential flow with another step at the end (our Email service) or introducing a new parallel flow based on an existing event
  • If an existing microservice needs the output of the new microservice, at least two microservices need to be updated: the new one and the existing one(s) that need its output. Adding our Credit service is an example of this
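Let’s look at another example of the second scenario. In Figure 4 we add a Criminal activity service whose outcome determines whether the Email service should send an extra email to inform the authorities. There are two ways to handle this:
  • Solution 1: Update the Offer service and the LoanRequestProcessed event to include whether the authorities should be informed, and update the logic in the Email service accordingly
  • Solution 2: Have the Email service listen to the CriminalActivityChecked event (the output of the new microservice) and act on it independently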
Figure 4. Adding a criminal activity service to the flow, which in some cases leads to an extra email being sent. Note that 4a, 5a and 6a form the main flow, and 4b and 5b form a parallel flow
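Solution 2 can be sketched with a tiny in-memory event bus standing in for a message broker. Assumptions: the Criminal activity service listens to a hypothetical `LoanRequested` event, its check is a made-up watch list, and the outcome of the main flow is simulated by publishing `LoanRequestProcessed` directly. The Email service simply gains a second, independent handler.

```python
from collections import defaultdict
from typing import Callable

Event = dict  # an event is a plain dict with a "type" key plus its payload


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers[event["type"]]:
            handler(event)


bus = EventBus()
outbox: list[tuple[str, str]] = []  # (recipient, message) pairs the Email service "sent"
WATCH_LIST = {"mallory"}  # hypothetical stand-in for a real criminal activity check


def criminal_activity_service(event: Event) -> None:
    # Runs in parallel with the main flow and publishes its own result.
    suspicious = event["user"] in WATCH_LIST
    bus.publish({"type": "CriminalActivityChecked", "user": event["user"], "suspicious": suspicious})


def email_on_loan_processed(event: Event) -> None:
    # Existing behaviour: notify the customer.
    status = "approved" if event["approved"] else "rejected"
    outbox.append((event["user"], f"Your loan was {status}"))


def email_on_criminal_activity_checked(event: Event) -> None:
    # Solution 2: the Email service acts on the new event on its own,
    # so the Offer service and LoanRequestProcessed stay unchanged.
    if event["suspicious"]:
        outbox.append(("authorities", f"Suspicious loan request by {event['user']}"))


bus.subscribe("LoanRequested", criminal_activity_service)
bus.subscribe("LoanRequestProcessed", email_on_loan_processed)
bus.subscribe("CriminalActivityChecked", email_on_criminal_activity_checked)

bus.publish({"type": "LoanRequested", "user": "mallory", "amount": 5_000})
# Simulated outcome of the main flow (Credit and Offer services are not modelled here).
bus.publish({"type": "LoanRequestProcessed", "user": "mallory", "approved": False})
print(outbox)
```

The trade-off is that the Email service now has to know about two events, while the Offer service and its event contract stay untouched.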
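To summarize, peer-to-peer choreography has the following pros and cons:
  • Pro: it’s easy to extend the system with new functionality, often without having to adjust any other microservices
  • Pro: the autonomy of the microservices results in high overall availability
  • Pro: there is no single point of failure
  • Con: as the system grows, it becomes hard to understand everything that happens in it, because no single component has an overview of the complete flow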

Orchestration

In contrast to the decentralized nature of peer-to-peer choreography, the orchestration strategy relies on a single, centralized microservice that orchestrates the workflow. The sole responsibility of this orchestrator is to listen to the events that happen in the system and to send commands to other microservices telling them to start performing their tasks.
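The orchestrator’s core loop can be sketched as "on each event, record it and send the next command". This is a minimal sketch under assumptions: the routing table, the command names (`CheckCredit`, `ProcessLoanRequest`, `SendEmail`) and the events `LoanRequested`, `CreditChecked` and `EmailSent` are hypothetical, and the microservices are simulated by a queue that turns each command into its reply event.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical routing table: which command the orchestrator sends after each event.
NEXT_COMMAND = {
    "LoanRequested": "CheckCredit",
    "CreditChecked": "ProcessLoanRequest",
    "LoanRequestProcessed": "SendEmail",
    "EmailSent": None,  # end of the flow
}


@dataclass
class Orchestrator:
    send_command: Callable[[str, str], None]  # (command, loan_id) -> None
    history: dict[str, list[str]] = field(default_factory=dict)

    def on_event(self, event_type: str, loan_id: str) -> None:
        # Recording every event means the orchestrator always knows where each flow is.
        self.history.setdefault(loan_id, []).append(event_type)
        command = NEXT_COMMAND[event_type]
        if command is not None:
            self.send_command(command, loan_id)

    def is_complete(self, loan_id: str) -> bool:
        return self.history.get(loan_id, [])[-1:] == ["EmailSent"]


commands = deque()
orchestrator = Orchestrator(send_command=lambda cmd, loan_id: commands.append((cmd, loan_id)))

# Hypothetical services: each handles one command and replies with one event.
REPLY_EVENT = {
    "CheckCredit": "CreditChecked",
    "ProcessLoanRequest": "LoanRequestProcessed",
    "SendEmail": "EmailSent",
}

orchestrator.on_event("LoanRequested", "loan-1")
while commands:
    command, loan_id = commands.popleft()
    orchestrator.on_event(REPLY_EVENT[command], loan_id)

print(orchestrator.history["loan-1"])
# ['LoanRequested', 'CreditChecked', 'LoanRequestProcessed', 'EmailSent']
print(orchestrator.is_complete("loan-1"))  # True
```

In this sketch, adding or reordering a step (as in Figures 6 and 7) only changes `NEXT_COMMAND`; the services themselves never reference each other.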

Figure 5. The initial loan request flow with an orchestrator
Figure 6. Adding the Email service to our orchestrated flow
Figure 7. The Credit service being orchestrated
Figure 8. Our final, orchestrated loan request flow

Complex scenarios

So far in this blog we have covered how to build and evolve event flows, as well as some pros and cons of both strategies. However, as systems become bigger and more mature, we tend to reach a point where we face more complex scenarios, such as:

  • Implementing non-happy flows where a technical error happens halfway through the flow
  • Being able to automatically act on incomplete processes
  • The need for SLAs and metrics to measure the system’s performance and health
  • Building and maintaining a custom orchestrator or leveraging a workflow engine

Non-happy flow

If we take a look at the final design of our loan request system, there are a lot of things that can go wrong. Let’s explore a non-happy flow: what happens if an email fails to send due to an unexpected error or, even worse, if the event never arrives at the Email service because of a network issue?
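In an orchestrated flow, one way to handle this is for the orchestrator to track every outstanding command and resend it when the expected reply event does not arrive in time. The sketch below assumes exactly that; `PendingCommand`, `TIMEOUT`, `MAX_ATTEMPTS` and the escalation to manual attention are hypothetical choices, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=5)  # hypothetical: how long to wait for EmailSent
MAX_ATTEMPTS = 3                # hypothetical: attempts before a human must step in


@dataclass
class PendingCommand:
    loan_id: str
    command: str
    sent_at: datetime
    attempts: int = 1


def check_timeouts(
    pending: list[PendingCommand], now: datetime
) -> tuple[list[PendingCommand], list[str]]:
    """Return the commands to resend and the loan ids that need manual attention.

    A command stays in `pending` until its reply event (e.g. EmailSent) arrives;
    removing it on arrival is left to the orchestrator's event handler.
    """
    to_resend: list[PendingCommand] = []
    escalate: list[str] = []
    for cmd in pending:
        if now - cmd.sent_at < TIMEOUT:
            continue  # still within the allowed time window
        if cmd.attempts >= MAX_ATTEMPTS:
            escalate.append(cmd.loan_id)
        else:
            cmd.attempts += 1
            cmd.sent_at = now
            to_resend.append(cmd)
    return to_resend, escalate


start = datetime(2024, 1, 1, 12, 0)
pending = [
    PendingCommand("loan-1", "SendEmail", sent_at=start),
    PendingCommand("loan-2", "SendEmail", sent_at=start, attempts=3),
    PendingCommand("loan-3", "SendEmail", sent_at=start + timedelta(minutes=4)),
]
resend, escalate = check_timeouts(pending, now=start + timedelta(minutes=6))
print([c.loan_id for c in resend], escalate)  # ['loan-1'] ['loan-2']
```

Because a resent `SendEmail` may arrive after the first one actually succeeded (only the reply was lost), the Email service should handle duplicate commands idempotently, for example by remembering which loan ids it has already emailed.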

Acting on incomplete processes

Just like non-happy flows caused by errors in the system, acting on incomplete processes is easier with the orchestration implementation. Let’s say that after we have approved the loan request and sent the email to our user, we expect the user to accept the offer. Since the orchestrator knows exactly when the email was sent, it can send time-based reminders if the user has not accepted the offer after x days. The same holds for a process that gets stuck halfway because a manual intervention is needed, for example if we want one of our employees to double-check randomly selected credit checks for false positives. The logic works similarly to the timeout scenario discussed earlier: if a manual intervention has not taken place after a period of x, send a reminder to make sure the process gets completed. In peer-to-peer choreography this is harder, because the information about the full flow is distributed, so no single microservice is aware that a flow is incomplete.
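The reminder logic can be sketched as a periodic check that the orchestrator runs over the flows it tracks. Assumptions: the `LoanFlow` record, `REMINDER_AFTER` (standing in for "x days") and the rule that each reminder pushes the next deadline out by another period are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REMINDER_AFTER = timedelta(days=3)  # hypothetical value for "x days"


@dataclass
class LoanFlow:
    loan_id: str
    email_sent_at: Optional[datetime]  # known to the orchestrator from the EmailSent event
    offer_accepted: bool = False
    reminders_sent: int = 0


def due_reminders(flows: list[LoanFlow], now: datetime) -> list[str]:
    """Return loan ids whose offer is still open after the email or latest reminder."""
    due = []
    for flow in flows:
        if flow.offer_accepted or flow.email_sent_at is None:
            continue  # completed, or not yet at the waiting step
        # Each reminder already sent pushes the next deadline out by REMINDER_AFTER.
        deadline = flow.email_sent_at + REMINDER_AFTER * (flow.reminders_sent + 1)
        if now >= deadline:
            due.append(flow.loan_id)
    return due


sent = datetime(2024, 1, 1)
flows = [
    LoanFlow("loan-1", email_sent_at=sent),
    LoanFlow("loan-2", email_sent_at=sent, offer_accepted=True),
    LoanFlow("loan-3", email_sent_at=None),
    LoanFlow("loan-4", email_sent_at=sent, reminders_sent=1),
]
print(due_reminders(flows, now=sent + timedelta(days=4)))  # ['loan-1']
```

The manual credit-check review fits the same shape: swap `offer_accepted` for a flag that is set when the employee completes the review.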

SLAs and metrics

As the system matures, you as the developer, or the business, will want metrics on the performance of the system. There can be multiple reasons for this, such as defining an SLA, understanding the bottlenecks in the system, or checking whether all microservices are healthy. In the peer-to-peer choreography implementation, the only way to get this information is to have every microservice export its metrics to a centralized monitoring system. In our loan request system, every microservice from the Loan request service all the way to the Email service needs to export when it started and when it finished processing an event. By aggregating this information, we can then create a dashboard that gives us the insights we need. Although this solution also works for the orchestration implementation, there it can be done more easily. The orchestrator already knows everything that has happened, and when it happened. In other words, all the metrics that need to be exported and centralized in the peer-to-peer choreography implementation are already present in the orchestrator itself. All we really need to do is visualize them (or export them from the orchestrator to the monitoring application to save some visualization work).
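To illustrate that last point, the sketch below derives per-step latencies directly from an orchestrator’s event log. The log format, the event names other than `LoanRequestProcessed`, and the timestamps are made-up sample values; a real orchestrator would persist something similar for every flow it drives.

```python
from datetime import datetime
from statistics import mean

# Hypothetical orchestrator log: (loan_id, event_type, timestamp) in arrival order.
EVENT_LOG = [
    ("loan-1", "LoanRequested", datetime(2024, 1, 1, 12, 0, 0)),
    ("loan-1", "CreditChecked", datetime(2024, 1, 1, 12, 0, 2)),
    ("loan-1", "LoanRequestProcessed", datetime(2024, 1, 1, 12, 0, 3)),
    ("loan-1", "EmailSent", datetime(2024, 1, 1, 12, 0, 9)),
    ("loan-2", "LoanRequested", datetime(2024, 1, 1, 12, 1, 0)),
    ("loan-2", "CreditChecked", datetime(2024, 1, 1, 12, 1, 4)),
    ("loan-2", "LoanRequestProcessed", datetime(2024, 1, 1, 12, 1, 5)),
    ("loan-2", "EmailSent", datetime(2024, 1, 1, 12, 1, 7)),
]


def step_durations(log: list[tuple[str, str, datetime]]) -> dict[str, list[float]]:
    """Seconds between consecutive events of each flow, grouped by step name."""
    last_seen: dict[str, tuple[str, datetime]] = {}  # loan_id -> previous event
    durations: dict[str, list[float]] = {}
    for loan_id, event_type, ts in log:
        if loan_id in last_seen:
            prev_type, prev_ts = last_seen[loan_id]
            step = f"{prev_type} -> {event_type}"
            durations.setdefault(step, []).append((ts - prev_ts).total_seconds())
        last_seen[loan_id] = (event_type, ts)
    return durations


for step, seconds in step_durations(EVENT_LOG).items():
    print(f"{step}: avg {mean(seconds):.1f}s")
# LoanRequested -> CreditChecked: avg 3.0s
# CreditChecked -> LoanRequestProcessed: avg 1.0s
# LoanRequestProcessed -> EmailSent: avg 4.0s
```

No microservice had to export anything for this; the orchestrator’s own record of events was enough.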

Custom orchestration and workflow engines

When implementing the orchestration strategy, especially for a simple flow such as the loan request system we’ve just designed, the orchestrator can be implemented by simply building another microservice that coordinates the rest. However, as your system grows and the flows become more complex, it can be worth looking into open-source workflow engines such as Netflix’s Conductor, Apache Airflow, Argo and Camunda (the list is longer than this), or even workflow engines provided by cloud vendors, such as AWS Step Functions, Azure Logic Apps and Google Cloud Workflows.

In conclusion

In this blog we dove deeper into the two main strategies for designing event flows, peer-to-peer choreography and orchestration: where they excel, what their drawbacks are, and what to consider when choosing the strategy that works for you. Both are tools to keep in your pocket and use at the appropriate times. We didn’t go into it explicitly, but they can even be combined in some cases, for example by having an Audit service listen to all events in a peer-to-peer choreography style while the rest of the system follows the orchestration strategy. Just be careful when doing this, as mixing strategies adds complexity.

Xavyr Rademaker

Software engineer at Deloitte with an interest in subjects like, but not limited to, event-driven microservices, Kubernetes, AWS, and the list goes on.