Modularization - Decoupling payments from monolith
Monolith does it all!
Justworks’ journey started in 2012 with the vision of building modern HR products and services for small businesses. As with many SaaS startups, bootstrapping a monolith on the Ruby on Rails framework worked fantastically. Justworks was able to convert customer needs into product features at great speed, which helped it become a go-to partner for small businesses across all 50 US states. But after 10 years, we were standing at a junction where our monolith was doing it all. It had grown to support numerous product lines, to name a few: people management tools, benefits, payments, billing, and payroll & taxes. If a new product feature needed an additional capability such as payments, it was inconceivable to build it outside of the monolith. Yet building additional capabilities into the monolith was becoming increasingly challenging, with ever-increasing lead times and a growing fear of breaking something.
Modularization
Justworks product and engineering leadership realized that in order to launch numerous products, maximize new business velocity, and improve customer satisfaction and retention, we needed a revolution. The core theme of this revolution was to transform the monolith around two guiding principles: modularization and decentralization. It also had to be done with the minimum viable set of integration points, so that teams kept both autonomy and speed. This vision led to the initiative to decouple payments functionality from the monolith and carve it out into a service that handles all things related to money movement.
This post describes the approach we took to build an API-first, multi-rail (ACH, Same-Day ACH, Wire, RTP), multi-processor payments service, and to re-engineer the monolith to integrate with it without impacting the millions of dollars of payroll and payment transactions we perform for our customers every day.
Building the bridge from two ends
The payments team had its work cut out. We had to build the bridge from both ends: on one end was the payments service, and on the other we had to modify the monolith to integrate with that service. To achieve this, the team was split into two task forces, each responsible for building its side of the bridge, with the API layer as the contract. The API contract approach was followed both in principle and in spirit. The two task forces were discouraged from looking into each other’s code base, but met regularly to discuss the functionality and specifications of the APIs. In the next sections we dive deeper into these two streams of work.
One side of the bridge: Carving payments functionality out of the monolith
To decouple the payments functionality from the monolith, we first had to research in depth which domain models were specific to payments and how they interacted with domain models from the rest of the system. In addition, we had to understand every aspect of the business and operational flows around the payment transaction lifecycle, from initiation to completion, including exception handling and reconciliation. After completing this exercise, the team held multiple whiteboarding sessions and landed on a layered architecture that follows the bounded context pattern from domain-driven design, as shown in the diagram below.
- The architecture has a multi-layered middleware component.
- The MW_1 layer is the shared context with models from non-payments domains.
- The MW_2 layer is responsible for the payments specific domain models and business logic.
- The MW_3 layer is a lightweight, stateless HTTP client that serves as the integration point to the payments service.
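To make the layering concrete, here is a minimal sketch of what an MW_3-style client could look like. It is not our actual implementation: the class names, the /payments path, the field names and the injectable transport are all assumptions made for illustration. The point is only that the client keeps no state between calls and knows nothing about payments business logic, which stays in MW_2.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical MW_2 domain object: a payment instruction owned by the
# payments bounded context (field names are assumptions).
PaymentInstruction = Struct.new(:idempotency_key, :rail, :amount_cents, :currency,
                                keyword_init: true)

# Hypothetical MW_3 client: it only serializes, sends and parses.
# It holds configuration, but no per-request state.
class PaymentsServiceClient
  # transport is any callable taking (url, json_body) and returning
  # [status_code, response_body]; injecting it keeps the client testable.
  def initialize(base_url:, transport: nil)
    @base_url = base_url
    @transport = transport || method(:post_over_http)
  end

  def create_payment(instruction)
    body = JSON.generate(instruction.to_h)
    status, response_body = @transport.call("#{@base_url}/payments", body)
    raise "payments service returned HTTP #{status}" unless (200..299).cover?(status)

    JSON.parse(response_body)
  end

  private

  # Default transport: a plain HTTP POST using Ruby's standard library.
  def post_over_http(url, body)
    response = Net::HTTP.post(URI.parse(url), body, "Content-Type" => "application/json")
    [response.code.to_i, response.body]
  end
end
```

Because the transport is injected, the monolith side can exercise this layer against a stub without a running payments service.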
Other side of the bridge: Building a multi-rail payments service
As mentioned earlier, the task force building the payments service followed an API-first approach. The service was also built with multi-tenancy and externalizability in mind, so that it could handle the money-movement needs of external as well as internal customers. To achieve this, the team created an independent Ruby on Rails application with its own GitHub repository, database, and CI/CD pipeline. Fortunately, at the same time, the Infrastructure and Platform teams were working on a cloud-native infrastructure-as-code solution for the CI/CD and DevOps needs of the new services being built across the organization. This aligned perfectly with our needs and project plans. The DevOps stack provided container orchestration, dynamic scaling, logging, distributed tracing via Datadog, and alerting via Rollbar and PagerDuty, all off the shelf. With the API contracts and Infrastructure/DevOps needs in place, the team focused on building the core payments capabilities: support for payment rails such as ACH, Same-Day ACH, Wire, and RTP, transaction returns handling, reconciliation, and integration with various processors (ODFIs).
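One common way to structure a multi-rail, multi-processor service is to put each processor behind an adapter and route each payment by its rail. The sketch below illustrates that idea only; the class names, rail identifiers and processor names are hypothetical and do not describe our production code.

```ruby
# Hypothetical adapter: wraps one processor (ODFI) and declares the
# payment rails it can handle.
class ProcessorAdapter
  attr_reader :name, :rails

  def initialize(name:, rails:)
    @name = name
    @rails = rails
  end

  def supports?(rail)
    rails.include?(rail)
  end

  # A real adapter would build a file or call the processor's API here.
  # This sketch only reports which processor was chosen.
  def submit(payment)
    { processor: name, rail: payment[:rail], status: "submitted" }
  end
end

# Hypothetical router: picks the first adapter that supports the rail.
class RailRouter
  def initialize(adapters)
    @adapters = adapters
  end

  def route(payment)
    adapter = @adapters.find { |candidate| candidate.supports?(payment[:rail]) }
    raise ArgumentError, "no processor configured for rail #{payment[:rail]}" unless adapter

    adapter.submit(payment)
  end
end
```

With this shape, adding a rail or a processor means registering another adapter rather than touching the callers of the API.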
Testing strategies employed:
- Testing both sides independently, using the API contract and mocks.
- Reconciliation (manual as well as automated) of the payment files.
- End to end testing in the sandbox environment.
- Performance and load testing of the Payment Service.
- UAT in partnership with internal stakeholders.
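As an illustration of the first strategy, each side can check its payloads against a shared description of the API contract without ever calling the other side. The sketch below is a deliberately simplified, hand-rolled check; the contract fields and the helper name are assumptions, not our actual API specification.

```ruby
# Hypothetical contract for a "create payment" request: field name => type.
CREATE_PAYMENT_REQUEST = {
  "idempotency_key" => String,
  "rail" => String,
  "amount_cents" => Integer,
  "currency" => String
}.freeze

# Returns a list of human-readable violations (empty when the payload
# satisfies the contract). Extra fields are ignored in this sketch.
def contract_violations(contract, payload)
  contract.each_with_object([]) do |(field, type), violations|
    if !payload.key?(field)
      violations << "missing field: #{field}"
    elsif !payload[field].is_a?(type)
      violations << "#{field} should be #{type}, got #{payload[field].class}"
    end
  end
end

valid = { "idempotency_key" => "k1", "rail" => "ach",
          "amount_cents" => 1000, "currency" => "USD" }
invalid = { "idempotency_key" => "k2", "rail" => "wire", "amount_cents" => "10.00" }

puts contract_violations(CREATE_PAYMENT_REQUEST, valid).inspect
puts contract_violations(CREATE_PAYMENT_REQUEST, invalid).inspect
```

The monolith side can run such a check on the requests it generates, while the service side runs it on the requests its test suite accepts, so both stay aligned with the contract rather than with each other's code.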
Dual mode and the final cutover
As the Payment Service added core capabilities for each payment rail, the monolith gradually started sending money-movement instructions over the API, controlled by feature flags. To minimize business risk and control the blast radius, the setup was run in dual mode, as shown below.
Dual mode allowed us to execute the rollout in a risk-mitigated manner:
- No-Op mode: Execute the payments via payments service in no-op fashion (not posting them to ODFI/payments-network)
- Perform reconciliation in the no-op mode.
- Alpha launch: Execute a small subset of simple payment transactions via new flow.
- Beta launch: Execute more complex transactions and exceptions flows.
- Prepare for GA launch
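The rollout steps above can be sketched as a small dispatcher plus a reconciliation helper. This is an illustration under stated assumptions: the mode names, the dry_run flag, and the choice to keep the legacy path as the source of truth during no-op mode are ours for the example, and the feature-flag lookup is reduced to a callable.

```ruby
# Hypothetical dispatcher: a feature-flag lookup decides, per payment,
# which path is authoritative.
class DualModeDispatcher
  # mode_for: callable returning :legacy, :no_op or :live for a payment.
  # legacy:   callable that sends the payment through the monolith path.
  # service:  callable that sends it through the payments service;
  #           dry_run: true means "do not post to the ODFI/network".
  def initialize(mode_for:, legacy:, service:)
    @mode_for = mode_for
    @legacy = legacy
    @service = service
  end

  def dispatch(payment)
    case @mode_for.call(payment)
    when :legacy
      @legacy.call(payment)
    when :no_op
      # Assumption: the legacy path still moves the money, while the
      # service shadows it so the two outputs can be reconciled.
      legacy_result = @legacy.call(payment)
      @service.call(payment, dry_run: true)
      legacy_result
    when :live
      @service.call(payment, dry_run: false)
    else
      raise ArgumentError, "unknown rollout mode"
    end
  end
end

# Compares what each path recorded and reports the differences by payment id.
def reconcile(legacy_records, service_records)
  legacy = legacy_records.to_h { |record| [record[:id], record[:amount_cents]] }
  service = service_records.to_h { |record| [record[:id], record[:amount_cents]] }
  {
    missing_in_service: (legacy.keys - service.keys).sort,
    unexpected_in_service: (service.keys - legacy.keys).sort,
    amount_mismatches: (legacy.keys & service.keys).reject { |id| legacy[id] == service[id] }.sort
  }
end
```

An empty reconciliation report across a no-op run is the kind of evidence that justifies moving a rail or transaction type to the next stage.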
Given the mission-critical nature of the payments domain, the team also worked in close collaboration with Payment Operations, Accounting and other stakeholders. As we built confidence in the new system, we started processing more customer payments via the Payment Service. Eventually we were able to do a final cutover, route 100% of payment processing through it, and decommission the payment processing functionality in the monolith.
Key learnings
- Understanding the domain and business flows is of utmost importance. Any ambiguity there will show up as ambiguity in the implemented solution.
- Respecting the API Contract helps us stay disciplined and avoid being influenced by the internal implementation of the service being integrated with.
- For a re-engineering project, getting buy-in from stakeholders (e.g. Operations, Treasury) requires delivering the functional parity needed to run the business.
- It is therefore important to be able to run the re-engineered system in dual mode and to perform exhaustive regression testing.
- Application instrumentation and data analytics are essential for visibility into system and data behavior, and for proactively catching any gaps.
Conclusion
Initiatives as ambitious as this can initially look daunting. But with in-depth learning, research, and collaboration, they can be broken down into manageable steps. There will be hurdles and roadblocks, but staying positive and focused helps. Today, our Payment Service processes billions of dollars of transactions every month. New product lines are integrating with it to solve their money-movement problems, opening up new use cases (e.g. virtual wallets, smarter and faster payment strategies) that we could not previously support. This experience has given us the confidence to tackle new challenges and decouple other sets of functionality from the monolith.