Architecture

Intro

This section describes the architectural foundations of the SCDOM system — a scalable, modular, and event-driven order management platform based on microservices.

The architecture is designed to support flexibility, independent service evolution, and high availability. By applying principles such as Domain-Driven Design (DDD), Event-Driven Architecture (EDA), CQRS, and Hexagonal Architecture, the system ensures strong boundaries between subdomains, reliable communication through messaging, and clear ownership of business logic.

Key architectural goals include:

Full decoupling between services via asynchronous messaging (RabbitMQ)
Clear separation of responsibilities using bounded contexts
Scalable and testable core logic with well-defined APIs and contracts
Support for dynamic execution plans and compensation flows
Built-in observability, auditability, and failure handling
Easy extension with new features or services tailored to the client's needs, by adding new subdomains or replacing existing ones
Exceptional resilience
Simplified remediation of operational data challenges

The following sections describe system components, communication flows, architectural principles, and processing strategies in more detail.

Design

The SCDOM system is designed as a set of microservices, each responsible for a specific domain. The system is built using Java and Spring Boot, with a focus on modularity and scalability.

Each domain is represented as a separate microservice, with its own database and API. The microservices communicate with each other using asynchronous messaging via RabbitMQ. This allows for loose coupling between services and enables independent development and deployment.

Each module was implemented using DDD and Hexagonal Architecture principles. The core domain logic is separated from the infrastructure and external dependencies, allowing for easier testing and maintenance.

Because of the high separation of concerns of each domain and its microservice, the system is highly modular and can be easily extended or modified. Each domain can be developed and deployed independently. Moreover, each domain is responsible for its own data integrity and business logic, allowing for a clear separation of concerns. This yields high cohesion and low coupling of each domain.

Distributed Design and Concurrency

The system was designed with scalability, resilience, and distributed concurrency in mind. Key design aspects include:

Active-Active Architecture

New pods can be added or terminated on demand, providing horizontal auto-scaling capabilities.
The application can operate simultaneously across multiple active nodes for high availability.
A built-in listener-broker automatically provisions new pods as needed based on event notifications, further enhancing scalability and responsiveness.

Resilient Event Processing

`RabbitMQ` queues offer resilient, lazy event processing with high scalability and load balancing.
Multiple pods may run concurrently to efficiently handle event traffic.

Single Instance Task Execution

For tasks that must be executed on only one pod (e.g., scheduled tasks), a `ShedLock` mechanism is employed.
See the ShedLock library for implementation details.
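
The idea behind single-instance execution can be illustrated without the library. The sketch below is not the ShedLock API; it only shows the underlying technique of a named lock with an expiry time. The names `LockStore`, `tryLock`, and `runIfLockAcquired`, the five-minute expiry, and the in-memory map (standing in for storage shared by all pods) are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SingleInstanceTaskSketch {

    /** Hypothetical lock store; in a real deployment this state lives in storage shared by all pods. */
    static final class LockStore {
        private final Map<String, Instant> lockedUntil = new ConcurrentHashMap<>();

        /** Takes the named lock unless another holder still has an unexpired one. */
        boolean tryLock(String name, Instant now, Duration lockAtMost) {
            boolean[] acquired = {false};
            // compute() is atomic per key, so only one caller can install a new expiry.
            lockedUntil.compute(name, (key, current) -> {
                if (current == null || !current.isAfter(now)) {
                    acquired[0] = true;
                    return now.plus(lockAtMost);
                }
                return current;
            });
            return acquired[0];
        }
    }

    /** Runs the task only on the pod that wins the lock; the lock is left to expire on its own. */
    static boolean runIfLockAcquired(LockStore store, String task, Instant now, Runnable body) {
        if (!store.tryLock(task, now, Duration.ofMinutes(5))) {
            return false;
        }
        body.run();
        return true;
    }

    public static void main(String[] args) {
        LockStore shared = new LockStore();
        Instant now = Instant.now();
        // Two "pods" fire the same scheduled task at the same moment.
        System.out.println("pod-a ran: " + runIfLockAcquired(shared, "nightly-cleanup", now, () -> { }));
        System.out.println("pod-b ran: " + runIfLockAcquired(shared, "nightly-cleanup", now, () -> { }));
    }
}
```

The expiry is what keeps a crashed pod from blocking the task forever: once the lock time has passed, any pod may take it again.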

Concurrency Control

For concurrent access (e.g., updating an order's payload), an Optimistic Locking mechanism is used.
This mechanism rolls back conflicting messages and reprocesses them shortly thereafter.
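
A minimal sketch of version-based optimistic locking is shown below. It is an illustration under assumptions, not the platform's implementation: `OrderDoc`, `OrderStore`, and `updateWithRetry` are hypothetical names, an in-memory map stands in for the database, and an immediate retry loop stands in for the message being rolled back and reprocessed later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

final class OptimisticLockingSketch {

    /** Hypothetical order document; the version grows with every successful write. */
    record OrderDoc(String id, long version, String payload) { }

    static final class OrderStore {
        private final Map<String, OrderDoc> docs = new ConcurrentHashMap<>();

        void insert(OrderDoc doc) {
            docs.put(doc.id(), doc);
        }

        OrderDoc find(String id) {
            return docs.get(id);
        }

        /** Writes only if the stored document is still the one that was read. */
        boolean saveIfUnchanged(OrderDoc read, String newPayload) {
            OrderDoc next = new OrderDoc(read.id(), read.version() + 1, newPayload);
            // replace(key, expected, next) is an atomic compare-and-set on the entry.
            return docs.replace(read.id(), read, next);
        }
    }

    /** Re-reads and re-applies the change until it wins or the attempts run out. */
    static boolean updateWithRetry(OrderStore store, String id, UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            OrderDoc current = store.find(id);
            if (current == null) {
                return false;
            }
            if (store.saveIfUnchanged(current, change.apply(current.payload()))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        store.insert(new OrderDoc("order-1", 0, "new"));
        boolean saved = updateWithRetry(store, "order-1", payload -> payload + "+patched", 3);
        System.out.println(saved + " " + store.find("order-1"));
    }
}
```

A writer that read an outdated version is rejected instead of silently overwriting the newer payload, which is why the conflicting message has to be processed again against fresh data.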

Resilient Data Storage

`MongoDB` is deployed in cluster mode, ensuring data resilience and high availability.

Internal Communication

Since the system is Event-Driven and CQRS-based, all internal communication is handled via the RabbitMQ AMQP messaging system. Each service listens to specific queues and reacts to incoming messages, enabling service decoupling. This common standard of using a unified and coherent method for internal communication also allows further optimization, such as dynamic routing.

Command, Query, and Event Contracts

In our application, we use separate contracts for Commands, Queries, and Events. This separation allows us to implement them independently, including having different models for reads and writes. This approach ensures that our system remains flexible, scalable, and maintainable.

Commands: Commands are used to request an action or change in the system. They are sent to a specific service that handles the command and performs the necessary operations.
Queries: Queries are used to request data from the system. They are sent to a specific service that retrieves and returns the requested data.
Events: Events are used to notify other parts of the system about changes or significant occurrences. They are published to a message broker and consumed by any interested services.

The synchronous and asynchronous modes of communication are described in the Communication section.
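
The separation of the three contract types can be sketched as follows. All type names, fields, and the `CANCELLED` / `UNKNOWN` status values are hypothetical and chosen only to illustrate the pattern; a list stands in for the message broker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MessageContractsSketch {

    /** Command: asks one specific service to change state. */
    record CancelOrderCommand(String orderId, String reason) { }

    /** Query and its read model, kept separate from the write side. */
    record GetOrderStatusQuery(String orderId) { }
    record OrderStatusView(String orderId, String status) { }

    /** Event: states a fact that has already happened. */
    record OrderCancelledEvent(String orderId) { }

    /** Hypothetical owning service with distinct write and read entry points. */
    static final class OrderHandlers {
        private final Map<String, String> statusById = new HashMap<>();
        private final List<Object> published = new ArrayList<>();

        /** Write side: changes state and publishes the resulting event. */
        void handle(CancelOrderCommand command) {
            statusById.put(command.orderId(), "CANCELLED");
            published.add(new OrderCancelledEvent(command.orderId()));
        }

        /** Read side: returns a view and never changes state. */
        OrderStatusView handle(GetOrderStatusQuery query) {
            return new OrderStatusView(query.orderId(), statusById.getOrDefault(query.orderId(), "UNKNOWN"));
        }

        List<Object> publishedEvents() {
            return List.copyOf(published);
        }
    }

    public static void main(String[] args) {
        OrderHandlers handlers = new OrderHandlers();
        handlers.handle(new CancelOrderCommand("order-1", "customer request"));
        System.out.println(handlers.handle(new GetOrderStatusQuery("order-1")));
        System.out.println(handlers.publishedEvents());
    }
}
```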

Internal Contracts

Each domain has its own set of contracts for each type of message (Commands, Queries, and Events). These contracts are implemented as separate Maven modules, allowing for versioning and ensuring compatibility. Each contract module follows a versioning scheme that complies with semantic versioning, ensuring backward compatibility and clear versioning for updates.

Easy to add or replace subdomains

Having decoupled architecture with well-defined contracts allows us to plug in new services or replace existing ones without affecting the overall system. This flexibility is crucial for maintaining a competitive edge in a rapidly evolving business landscape.

This means that if the client's deployment needs a copy of the data in a different technology or format (e.g., PostgreSQL instead of MongoDB), the only requirement is to implement the contract and ensure that the data is consistent with the original source.

The same applies to replacing existing services with new ones. If the client wants to use a different technology or implementation, the only requirement is to implement the contract (send and receive messages according to the published and versioned libraries).

Both new and replaced services can be developed and deployed without any changes to the core domain.

Logical view

Sciamus Order Management

Processing Flows

Fulfilment

Incoming orders enter the system through the incoming-orders TMF API and are queued on RabbitMQ.
Next, the order-service engine retrieves the orders to manage and control their processing.
Each order is routed to the order-validation microservice, where it is validated.
The execution plan is built in plan-builder, which uses order data together with product-catalogue data stored in MongoDB.
Once the execution plan is established, the order-fulfilment module manages the execution of subsequent steps by calling the relevant step-executors.
The whole process is monitored by audit-log, and exceptions are stored in error-service.
Upon order completion, order-fulfilment notifies the originating northern system that processing has finished.

During end-to-end order processing, communication is asynchronous thanks to RabbitMQ, and state persistence is achieved via MongoDB. Further mentions of these components are omitted for clarity.

Errors

Each Step Executor may end processing with an error. It reports this to the order-fulfilment, which marks this step as ERROR ("red") and sends the problem to the error-service to be resolved by human or automatic actions.

The step-executor reports the error to the order-fulfilment.
The order-fulfilment marks the step as ERROR ("red"), and further order processing is halted.
The order-fulfilment reports the error to the error-service for persistence and further human or automatic actions.

If the operator decides that the step should be retried or skipped, the error-service sends a command with the resolution of this error:

The bff-service sends a complete-action or retry-action request to the error-service.
The error-service marks the error as processed.
The error-service notifies the order-fulfilment that the step should be skipped or retried, marking the step as EXECUTION ("blue").
The order-fulfilment commands a retry to the failed step-executor, or if skip was chosen, marks the step as completed and resumes fulfilment.
The order-fulfilment waits for a response from the step-executor.
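
The step state transitions described in the two lists above can be sketched as a small state machine. This is an illustration only: the `ERROR` and `EXECUTION` names come from the text, while `COMPLETED`, `Resolution`, and `dispatchCount` are assumptions introduced for the example.

```java
final class ErrorResolutionSketch {

    enum StepState { EXECUTION, ERROR, COMPLETED }

    enum Resolution { RETRY, SKIP }

    static final class Step {
        final String name;
        StepState state = StepState.EXECUTION;
        int dispatchCount = 1; // how many times the step was sent to its executor

        Step(String name) {
            this.name = name;
        }
    }

    /** The step executor reported a failure: the step turns "red" and processing halts. */
    static void reportError(Step step) {
        step.state = StepState.ERROR;
    }

    /** The operator's decision arrives from the error service. */
    static void resolve(Step step, Resolution resolution) {
        if (step.state != StepState.ERROR) {
            throw new IllegalStateException("step " + step.name + " has no open error");
        }
        switch (resolution) {
            case RETRY:
                step.state = StepState.EXECUTION; // "blue" again, executor is called once more
                step.dispatchCount++;
                break;
            case SKIP:
                step.state = StepState.COMPLETED; // operator vouches for consistency
                break;
        }
    }

    public static void main(String[] args) {
        Step step = new Step("activate-service");
        reportError(step);
        resolve(step, Resolution.RETRY);
        System.out.println(step.name + " -> " + step.state + ", dispatched " + step.dispatchCount + " times");
    }
}
```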

Notice about Completing Errors Manually

When opting to manually complete an error (i.e. to skip it), the operator must ensure that order consistency is maintained. This may require manual adjustments to the order payload or modifications to the external system’s state. Manual intervention is therefore recommended over relying on automated actions inherent in the skipped process (e.g. step execution work).

Compensation

Any order may be cancelled during fulfilment processing. First, all pending steps need to be halted. Then a compensation plan is built.

The order-fulfilment sends a Cancel Command to all step-executors in the EXECUTING and WAITING states.
If a step-executor supports it, it may stop step execution. If not, the step has to finish naturally or be stopped by force. A natural stop involves going through error handling.
After cancellation, order-fulfilment does not proceed with the next steps.
Once order-fulfilment has been notified that all step-executors have finished, it marks all steps that were not executed as ABANDONED and hands control over to the order-service.
The order-service goes back to plan building, which generates a posterior plan based on the execution status of the previous plan.
The compensation plan is appended to the cancelled plan, and order-fulfilment starts processing the new fulfilment plan.
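
The last three steps can be sketched as follows. How plan-builder actually derives compensation steps is not described here, so the rule used below (one `undo-` step per completed step, in reverse order) is purely an assumption, as are the `PENDING` and `COMPLETED` state names and all method names.

```java
import java.util.ArrayList;
import java.util.List;

final class CompensationSketch {

    enum State { COMPLETED, PENDING, ABANDONED }

    record Step(String name, State state) { }

    /** After all executors have finished, steps that never ran are marked ABANDONED. */
    static List<Step> abandonNotExecuted(List<Step> plan) {
        List<Step> result = new ArrayList<>();
        for (Step step : plan) {
            result.add(step.state() == State.PENDING ? new Step(step.name(), State.ABANDONED) : step);
        }
        return result;
    }

    /** Appends compensation steps to the cancelled plan (assumed rule: undo completed steps in reverse). */
    static List<Step> withCompensation(List<Step> cancelledPlan) {
        List<Step> combined = new ArrayList<>(cancelledPlan);
        for (int i = cancelledPlan.size() - 1; i >= 0; i--) {
            Step step = cancelledPlan.get(i);
            if (step.state() == State.COMPLETED) {
                combined.add(new Step("undo-" + step.name(), State.PENDING));
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Step> plan = List.of(
                new Step("reserve-resource", State.COMPLETED),
                new Step("activate-service", State.COMPLETED),
                new Step("notify-customer", State.PENDING));
        withCompensation(abandonNotExecuted(plan)).forEach(System.out::println);
    }
}
```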

Components

GUI and BFF

The platform's user interface architecture follows a standard pattern: a frontend presentation layer (gui) and a BFF (backend for frontend) service layer (bff-service).

Authentication and partial authorization take place at this level.

The presentation layer is based on Angular, and the service layer was developed as a Java microservice.

The user interface allows order execution monitoring and event auditing, and has a robust error handling console. In addition, it provides a customizable dashboard and a product catalogue hot deployment facility.

Further details on UI can be found here.

Incoming Orders

A Java microservice providing a TMF622 Product Ordering Management compliant REST API that serves as a connector for northern systems. Its main purpose is to receive orders from northern systems and put them on the RabbitMQ queue with a generated OrderId that the involved parties will use to communicate.

Uses RabbitMQ and state persistence in MongoDB.

Order Service

A Java core domain service. This microservice orchestrates the initial order processing phases (extraction, validation, preprocessing, and plan building) and finally delegates order fulfilment.

It combines order processing with custom operations on the order's payload via a custom PayloadOperator. This interface allows order-service to extract crucial information and to support additional custom operations, such as delivering cancellation codes or executing patches on the order's payload.

As the main orchestrator, it is responsible for the order's lifecycle management, including state transitions, payload operations, and audit logging.

Uses RabbitMQ and state persistence in MongoDB.

Order validation

Java microservices facilitate the application of validation rules through the plug-in pattern, allowing for the verification of order completeness and correctness.
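
A plug-in style rule set might look like the sketch below. The `ValidationRule` interface, the `Order` shape, and both example rules are hypothetical; how the real service discovers and loads its rules is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class ValidationPluginSketch {

    /** Hypothetical, heavily simplified order. */
    record Order(String id, List<String> items) { }

    /** A plug-in: returns a violation message, or empty if the order passes. */
    interface ValidationRule {
        Optional<String> check(Order order);
    }

    // Completeness rule: the order must carry an identifier.
    static final ValidationRule HAS_ID = order ->
            (order.id() == null || order.id().isBlank())
                    ? Optional.of("order id is missing")
                    : Optional.empty();

    // Correctness rule: the order must contain at least one item.
    static final ValidationRule HAS_ITEMS = order ->
            order.items().isEmpty()
                    ? Optional.of("order has no items")
                    : Optional.empty();

    /** Runs every registered rule and collects all violations. */
    static List<String> validate(Order order, List<ValidationRule> rules) {
        List<String> violations = new ArrayList<>();
        for (ValidationRule rule : rules) {
            rule.check(order).ifPresent(violations::add);
        }
        return violations;
    }

    public static void main(String[] args) {
        List<ValidationRule> rules = List.of(HAS_ID, HAS_ITEMS);
        System.out.println(validate(new Order("", List.of()), rules));
        System.out.println(validate(new Order("order-1", List.of("fibre-100")), rules));
    }
}
```

New rules are added by supplying another `ValidationRule` to the list; the validation loop itself stays unchanged.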

Plan Builder

A Java microservice implementing the plan building and optimization algorithm, based on product catalogue data retrieved from MongoDB with the help of product-catalogue. It is stateless.

Product catalogue

A Java microservice providing a facility to upload the product catalogue. It supports versioning, validation, and hot deployment of new catalogue versions.

Order fulfilment

A Java microservice whose main objective is to traverse the dynamic order execution graph, call the relevant step-executors, wait for results, and then notify about finished processing or report any errors to error-service. Additionally, it supports halting, abandoning, and retrying steps. It may combine more than one execution plan for a single order (e.g. due to compensation).
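
Traversing an execution graph so that each step runs only after its prerequisites can be sketched with a dependency-counting walk (Kahn's algorithm). This is an illustration under assumptions: the real engine calls step executors asynchronously and waits for their results, while the sketch only computes a valid sequential order. The graph representation and method name are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExecutionGraphSketch {

    /** Returns an order in which every step follows all of the steps it depends on. */
    static List<String> executionOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> remaining = new LinkedHashMap<>();   // unmet prerequisites per step
        Map<String, List<String>> dependents = new HashMap<>();   // reverse edges
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            remaining.put(entry.getKey(), entry.getValue().size());
            for (String prerequisite : entry.getValue()) {
                dependents.computeIfAbsent(prerequisite, key -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((step, count) -> {
            if (count == 0) {
                ready.add(step);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String step = ready.poll();
            order.add(step); // a real engine would call the step executor here and wait
            for (String next : dependents.getOrDefault(step, List.of())) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("cycle or unknown prerequisite in execution plan");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put("reserve", List.of());
        plan.put("configure", List.of("reserve"));
        plan.put("bill", List.of("reserve"));
        plan.put("activate", List.of("configure", "bill"));
        System.out.println(executionOrder(plan));
    }
}
```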

Step Executors

From an architecture perspective, step executors are not part of the platform. Their actual implementation technology is transparent from the platform perspective as long as the step executor respects the defined inbound and outbound APIs. To aid integration with the platform, a template step executor in Java is provided.

Communication

Incoming/Outgoing

North Communication

Incoming traffic calls

The Incoming Orders Service communicates with northern systems via a TMF622 Product Ordering Management compliant REST API. This covers submitting new orders as well as querying them. Queries may be handled by a separate flow that goes through an embedded or custom Read Model. The same service is responsible for security checks (whether the client has the proper roles assigned).

Callbacks/notices to the north systems

Notices to the northern systems rely on custom logic and custom contracts (over any given communication standard).

GUI Communication

All GUI requests made via HTTP/REST are handled by the BFF service, which also checks permissions. These requests are then transformed and forwarded to deeper services (e.g., the Orders, Fulfilment, or Errors service) via the RabbitMQ AMQP messaging system.

Internal Communication

Asynchronous Communication over AMQP

This is a standard way of handling Event Driven Architecture. Each subdomain reacts to incoming messages and sends responses to other services. This allows for a high level of decoupling and scalability.

Synchronous Communication over AMQP

Sometimes systems need to query information from another subdomain or send a command and receive confirmation of its completion. In such scenarios, to avoid maintaining a special case state flow, the request-response pattern is used. The request is sent to the queue, and the response is sent back to the reply-to queue. A separate thread waits for the reply and resumes the main process upon receiving confirmation. This is a built-in mechanism in AMQP communication, additionally supported by the Spring AMQP adapter.
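
The mechanics of request-response over queues can be sketched in plain Java. This is not the Spring AMQP API: an in-memory `BlockingQueue` stands in for the request queue, the `onReply` method stands in for the reply-to listener, and all names are hypothetical. The point is the correlation identifier that ties a reply to the caller waiting for it.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class RequestReplySketch {

    record Message(String correlationId, String body) { }

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final BlockingQueue<Message> requestQueue = new LinkedBlockingQueue<>();

    /** Caller side: publishes the request and blocks until the correlated reply arrives. */
    String sendAndReceive(String body, long timeoutMillis) {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply); // register before publishing to avoid a lost reply
        try {
            requestQueue.put(new Message(correlationId, body));
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("no reply for request " + correlationId, e);
        } finally {
            pending.remove(correlationId);
        }
    }

    /** Reply-to listener: completes the future that matches the correlation id. */
    void onReply(Message reply) {
        CompletableFuture<String> waiting = pending.get(reply.correlationId());
        if (waiting != null) {
            waiting.complete(reply.body());
        }
    }

    /** Responder side: takes one request, handles it, and sends the reply back. */
    void serveOne(Function<String, String> handler) {
        try {
            Message request = requestQueue.take();
            onReply(new Message(request.correlationId(), handler.apply(request.body())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        RequestReplySketch bus = new RequestReplySketch();
        CompletableFuture<Void> responder = CompletableFuture.runAsync(() -> bus.serveOne(body -> "status of " + body));
        System.out.println(bus.sendAndReceive("order-1", 2000));
        responder.join();
    }
}
```

The timeout matters: without it, a lost reply would block the waiting thread indefinitely.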

Architectural Principles

In developing our Order Management system, we've adhered to a set of fundamental principles such as Event-Driven Architecture (EDA), Command Query Responsibility Segregation (CQRS), Centralized Event Store, Domain-Driven Design (DDD), Behaviour-Driven Development (BDD), and Hexagonal Architecture. These principles have guided our efforts to create a reliable, scalable, and functional platform. By embracing these principles, we've ensured that our system maintains High Availability (HA), Fault Tolerance (FT), scalability, transactional consistency, and thorough audit capabilities. Our focus is not just on managing orders, but on building a system that prioritizes reliability and efficiency for our users and stakeholders.

Event-Driven Architecture

Event-Driven Architecture (EDA) is a design paradigm that enables components to asynchronously communicate and react to events. This approach decouples services by having them respond to significant changes or events rather than directly calling each other. EDA can be implemented through various patterns, including Event Notification, Event-Carried State Transfer, and Command Query Responsibility Segregation (CQRS), each addressing different aspects of event-driven communication and data management.

Event Notification

Event Notification is a pattern within EDA where an event producer sends a signal to notify other parts of the system about a change or occurrence, without necessarily transferring the full state of the changed entity. This pattern is useful for triggering side effects, such as sending an email or updating a user interface and relies on consumers to react to the notification by initiating further actions if necessary.

Key features include:

Lazy Processing: Events can be processed at a later time, allowing for more efficient resource utilization.
Service Upgrades: Services can be upgraded without stopping the sender, ensuring continuous operation.
Scalability: The system can be easily scaled by adding more consumers to handle the increased load.
Message Persistence: Unprocessed messages are persisted, ensuring no data loss and enabling reliable processing.
Dynamic Routing: The system supports dynamic routing based on deployed business logic, allowing for alternate flows and A/B testing.
Traffic Management: The system can distribute message bursts over time, moving traffic to calmer periods, ensuring efficient resource utilization.

Event-Carried State Transfer

Event-Carried State Transfer involves including the complete state or a significant portion of the state needed to process an event within the event notification itself. This allows event consumers to update their state based on the event without needing to query the source of the event for more information. It enhances the system's decoupling by reducing dependencies between services and can significantly improve performance by minimizing the need for additional database calls.

In our system, a good example is the communication between fulfilment and the step executors. On step start, a step executor gets all the information it needs from that domain, which reduces the need to send additional queries. If needed, it can still ask, e.g., the orders domain for details of the order currently being processed.
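
The difference between a bare notification and a state-carrying event can be sketched as below. The event types, the `serviceId` parameter, and the `SourceDomain` class are hypothetical; the counter only makes the saved round trip visible.

```java
import java.util.Map;

final class EventCarriedStateSketch {

    /** Stand-in for the source domain; counts how often consumers have to query it. */
    static final class SourceDomain {
        int queriesServed = 0;

        Map<String, String> stepInput(String stepId) {
            queriesServed++;
            return Map.of("serviceId", "svc-42");
        }
    }

    /** Notification only: the consumer must ask the source for the details. */
    record StepStartedNotification(String orderId, String stepId) { }

    /** State-carrying variant: everything the executor needs travels with the event. */
    record StepStartRequested(String orderId, String stepId, Map<String, String> input) { }

    static String executeAfterNotification(StepStartedNotification event, SourceDomain source) {
        Map<String, String> input = source.stepInput(event.stepId()); // extra round trip
        return "provisioned " + input.get("serviceId");
    }

    static String execute(StepStartRequested event) {
        return "provisioned " + event.input().get("serviceId"); // no lookup needed
    }

    public static void main(String[] args) {
        SourceDomain source = new SourceDomain();
        System.out.println(executeAfterNotification(new StepStartedNotification("order-1", "step-1"), source));
        System.out.println(execute(new StepStartRequested("order-1", "step-1", Map.of("serviceId", "svc-42"))));
        System.out.println("queries served by source: " + source.queriesServed);
    }
}
```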

Command Query Responsibility Segregation

CQRS is a design pattern that separates the read and write operations of a data storage system into distinct interfaces. This separation allows for optimization of each operation, improves scalability by allowing reads and writes to be scaled independently, and enhances security by controlling access to different types of operations. CQRS is often used in conjunction with Event Sourcing, where changes to the application state are stored as a sequence of events, further enabling complex systems to achieve high levels of performance and maintainability.

In our application, we use separate contracts for Queries, Events, and Commands. Having them separated allows us to implement them independently, including having different models for reads and writes. This approach ensures that our system remains flexible, scalable, and maintainable.

Centralized Audit Logs

Having a centralized audit log allows us to track and analyze what happens with each order throughout its lifecycle. This is possible because the entire system operates within a common order domain, ensuring that all events, actions, and state changes are consistently recorded and accessible. This centralized approach provides a comprehensive view of order processing, enabling better monitoring, troubleshooting, and optimization of the order management system.

Domain-Driven Design

Domain-Driven Design (DDD) is a methodology that focuses on the core domain logic of the application, placing primary importance on the complex needs of the domain itself. DDD emphasizes collaboration between technical and domain experts to create a model that accurately reflects and addresses the domain's intricacies. This approach promotes a deeper understanding of the domain, leading to more relevant and flexible software solutions.

Having domains strictly separated as separate projects linked via contracts allows us to do development separately (in the bounds of a contract). This brings a set of perks like independent development, deployment, separate namespaces, simplified code, and its encapsulation. Each domain uses its own database, so the domain itself keeps data consistent.

Through numerous Big Picture Event Storming workshops, we have meticulously explored the domain and business needs. This collaborative effort allowed us to identify and delineate subdomains, uncovering the core domain and supporting domains. By leveraging Domain-Driven Design (DDD) principles, we were able to model the ubiquitous language and bounded contexts, leading to the discovery of strategic solutions that align with the business objectives and ensure a cohesive architecture.

Behaviour-Driven Development

Behaviour-Driven Development (BDD) is an approach to software development that encourages collaboration among developers, QA, and non-technical or business participants in a software project. BDD focuses on obtaining a clear understanding of desired software behaviour through discussion with stakeholders. It uses specific examples to drive development and encourages the use of a language that can be understood by all parties. This results in software that more accurately meets the business needs and expectations.

Hexagonal Architecture

Hexagonal Architecture, also known as Ports and Adapters Architecture, is a design pattern that promotes the separation of concerns by externalizing the inputs and outputs of the application. The core logic of the application resides within the hexagon, while interactions with the outside world occur through ports and adapters. This setup allows for easy adaptation to changes in external technologies or interfaces without modifying the application's core logic. It supports multiple channels of communication and enables an efficient way to test the application by replacing external elements with test doubles.
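
A minimal ports-and-adapters sketch is given below. The port names, the `ACCEPTED` state, and the in-memory adapter are assumptions made for illustration; in the platform the outbound adapter would presumably talk to MongoDB and the inbound one to RabbitMQ, but the core would not know that.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class HexagonalSketch {

    /** Inbound port: how the outside world drives the core. */
    interface AcceptOrder {
        void accept(String orderId);
    }

    /** Outbound port: what the core needs from the outside world. */
    interface OrderStateStore {
        void save(String orderId, String state);

        Optional<String> find(String orderId);
    }

    /** Core logic: depends only on ports, never on a concrete technology. */
    static final class OrderCore implements AcceptOrder {
        private final OrderStateStore store;

        OrderCore(OrderStateStore store) {
            this.store = store;
        }

        @Override
        public void accept(String orderId) {
            if (orderId == null || orderId.isBlank()) {
                throw new IllegalArgumentException("orderId is required");
            }
            if (store.find(orderId).isPresent()) {
                return; // already known, accepting twice changes nothing
            }
            store.save(orderId, "ACCEPTED");
        }
    }

    /** Adapter used as a test double in place of a real database adapter. */
    static final class InMemoryStateStore implements OrderStateStore {
        private final Map<String, String> states = new HashMap<>();

        @Override
        public void save(String orderId, String state) {
            states.put(orderId, state);
        }

        @Override
        public Optional<String> find(String orderId) {
            return Optional.ofNullable(states.get(orderId));
        }
    }

    public static void main(String[] args) {
        InMemoryStateStore store = new InMemoryStateStore();
        AcceptOrder core = new OrderCore(store);
        core.accept("order-1");
        System.out.println(store.find("order-1"));
    }
}
```

Because `OrderCore` sees only the `OrderStateStore` port, the same logic can be tested against the in-memory adapter and deployed against a database-backed one.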
