How to build a scalable execution environment for systematic trading
The concepts and solutions described in this article were developed for BidFlow Technologies' software ecosystem and are deployed in production.
In addition to a powerful data science environment with high-quality datasets (described in more detail here), trading the financial markets systematically with a purely data-driven approach also requires a stable and scalable execution environment. It has to be capable of running an arbitrary number of systematic trading strategies on arbitrary instruments traded on arbitrary exchanges or brokers. Due to the heterogeneous nature of the financial world's technological landscape, this is not a trivial task.
This post discusses how such an execution environment can be built by keeping it as modular and scalable as possible, while also keeping cost-efficiency in mind.
Solution overview
At some point in the past, the digitization of the financial industry led to the decision to standardize the exchange of digital financial data – the FIX protocol was born. Today, most established exchanges and brokers offer FIX API endpoints to interact with their trading services in an automated way. From a customer perspective, this allows for standardized market access across different venues.
However, some venues still only offer access via a custom API. It is also possible that a custom API offers much more functionality than the FIX API provided alongside it. In short, the FIX protocol is a voluntary standard, not a mandatory one. This led to the decision not to try to unify market access across venues, but to handle access to each venue in a customized way, potentially utilizing the full power of its custom API.
Another important design choice is to detach the services responsible for actual systematic trading from the services that handle venue access (real-time data streams, account information, etc.). This enables a powerful feature: a single trading service can dynamically access data from different venues. It also optimizes third-party API usage and prevents hitting API quotas because, for a given venue, all data is accessed from a single point and then fanned out internally as required.
A third essential choice is to allow trading services to be modified at runtime. Systematic trading is always based on more or less complex rule sets that are influenced by a set of parameters. As markets evolve, it might become necessary to adjust these parameters as well. Without the necessary infrastructure in place, changing statically defined parameters would require a restart of the corresponding trading service. This causes unnecessary downtime, which can be prevented by making these parameters dynamically editable from the outside at runtime.
The diagram below outlines the components that make up BidFlow Technologies' scalable production environment for executing systematic trading strategies. Each component is discussed in more detail in the following sections.
Continuous Deployment (CD)
Before we dive into the details of each component, let's discuss continuous deployment first. As systematic trade execution is time-critical, services within the execution environment should have minimal downtime. Depending on the trade frequency, even a few hours of downtime could severely impact business operations.
For development and bug fixing, continuous deployment minimizes downtime. If set up correctly, changes are integrated on the fly and deprecated services are automatically replaced with their updated counterparts. It also enables separation of test and production environments so that changes can first be tested in a test environment before they are pushed to production.
At BidFlow Technologies, AWS CodePipeline is utilized to implement such a continuous deployment workflow. In this case, deploying and keeping the execution environment up-to-date is actually a subset of the pipeline’s workload. All AWS cloud resources are managed via infrastructure as code. It is also the pipeline’s job to deploy, update or change all of these resources. The diagram below visualizes each stage of the pipeline.
Source stage
The Source stage is triggered as soon as a new commit is added to the master branch of BidFlow Technologies' AWS CodeCommit repository. It takes the repository and creates a source artifact from it which is processed in the subsequent stages.
Images stage
The Images stage utilizes AWS CodeBuild to build Docker images for various services and tasks that run in Amazon ECS and are developed and maintained in the CodeCommit repository. The latest Docker image versions are stored in Amazon ECR and can be accessed by ECS.
Templates stage
The Templates stage uploads AWS CloudFormation templates to Amazon S3. These templates define BidFlow Technologies' test and production environments in an infrastructure as code manner. As soon as they are available in S3, CloudFormation can utilize the templates to create and modify cloud resources.
Test stage
The Test stage deploys the test environment based on uploaded CloudFormation test templates.
Approval stage
The Approval stage allows continuation to the Production stage after manual approval (which is done after changes to the test environment are reviewed and approved for production).
Production stage
The Production stage deploys the production environment based on uploaded CloudFormation production templates.
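Taken together, the stages could be wired up with the AWS CDK roughly as in the sketch below. The repository, project, stack, and template names are placeholders, and for brevity the CloudFormation actions take their templates straight from the source artifact rather than from S3, which simplifies the Templates stage described above.

```python
from aws_cdk import Stack
from aws_cdk import aws_codebuild as codebuild
from aws_cdk import aws_codecommit as codecommit
from aws_cdk import aws_codepipeline as codepipeline
from aws_cdk import aws_codepipeline_actions as actions
from constructs import Construct


class ExecutionPipelineStack(Stack):
    """Hypothetical sketch of the Source -> Images -> Templates -> Test -> Approval -> Production layout."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Placeholder repository and build projects.
        repo = codecommit.Repository.from_repository_name(self, "Repo", "execution-environment")
        image_build = codebuild.PipelineProject(self, "ImageBuild")      # builds Docker images and pushes to ECR
        template_upload = codebuild.PipelineProject(self, "Templates")   # syncs CloudFormation templates to S3
        source_artifact = codepipeline.Artifact("Source")

        pipeline = codepipeline.Pipeline(self, "Pipeline")
        pipeline.add_stage(stage_name="Source", actions=[
            actions.CodeCommitSourceAction(action_name="Checkout", repository=repo,
                                           branch="master", output=source_artifact),
        ])
        pipeline.add_stage(stage_name="Images", actions=[
            actions.CodeBuildAction(action_name="BuildImages", project=image_build, input=source_artifact),
        ])
        pipeline.add_stage(stage_name="Templates", actions=[
            actions.CodeBuildAction(action_name="UploadTemplates", project=template_upload, input=source_artifact),
        ])
        pipeline.add_stage(stage_name="Test", actions=[
            actions.CloudFormationCreateUpdateStackAction(
                action_name="DeployTest", stack_name="execution-test",
                template_path=source_artifact.at_path("templates/test.yml"),
                admin_permissions=True),
        ])
        pipeline.add_stage(stage_name="Approval", actions=[
            actions.ManualApprovalAction(action_name="ReviewTestEnvironment"),
        ])
        pipeline.add_stage(stage_name="Production", actions=[
            actions.CloudFormationCreateUpdateStackAction(
                action_name="DeployProduction", stack_name="execution-prod",
                template_path=source_artifact.at_path("templates/production.yml"),
                admin_permissions=True),
        ])
```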
Backend services
As mentioned in the solution overview, each exchange and broker accessible from the trade execution environment has its own backend service running in ECS. Each of these backend services is a set of continuously running tasks that stream real-time data, query account and instrument information at regular intervals, and route Amazon API Gateway requests. Depending on the venue, they may handle other venue-specific tasks as well.
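As a rough illustration of the periodic querying, the sketch below polls a hypothetical venue client at a fixed interval and keeps the results in an in-memory cache that other microservices of the backend service could read from; the client class, its methods, and the interval are assumptions.

```python
import asyncio

POLL_INTERVAL_SECONDS = 30  # illustrative; the real interval depends on the venue's rate limits

# In-memory cache that other parts of the backend service can read from.
cache: dict[str, object] = {"account": None, "instruments": None}


class VenueClient:
    """Hypothetical stand-in for a venue-specific API client."""

    async def fetch_account(self) -> dict:
        return {}  # would call the venue's account endpoint

    async def fetch_instruments(self) -> list:
        return []  # would call the venue's instrument endpoint


async def poll_venue(client: VenueClient) -> None:
    """Query account and instrument information at regular intervals."""
    while True:
        cache["account"] = await client.fetch_account()
        cache["instruments"] = await client.fetch_instruments()
        await asyncio.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    asyncio.run(poll_venue(VenueClient()))
```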
Internally, the backend services utilize ECS Service Connect, which allows other services and tasks in the ECS cluster to easily connect to them. Many of the microservices within a backend service host WebSocket servers to forward received data. Trading services can then connect to the provided WebSocket endpoints to consume the data.
With this setup, a single real-time data stream connection to a venue can serve hundreds of trading services, making third-party API usage as efficient as possible and reducing the likelihood of reaching API quota limits.
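The fan-out itself could look roughly like the following sketch, which holds a single upstream connection and rebroadcasts every message to all internally connected trading services using a recent version of the websockets library; the venue URL and the internal port are placeholders.

```python
import asyncio

import websockets

# Hypothetical endpoints; the real venue and internal addresses differ.
VENUE_STREAM_URL = "wss://example-venue/marketdata"
INTERNAL_HOST, INTERNAL_PORT = "0.0.0.0", 8765

connected_clients = set()  # trading services currently subscribed


async def handle_client(websocket):
    """Register a trading service and keep the connection open until it disconnects."""
    connected_clients.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        connected_clients.discard(websocket)


async def stream_from_venue():
    """Hold a single upstream venue connection and fan every message out to all clients."""
    async for venue_socket in websockets.connect(VENUE_STREAM_URL):  # reconnects automatically
        try:
            async for message in venue_socket:
                websockets.broadcast(connected_clients, message)
        except websockets.ConnectionClosed:
            continue  # retry the upstream connection


async def main():
    async with websockets.serve(handle_client, INTERNAL_HOST, INTERNAL_PORT):
        await stream_from_venue()


if __name__ == "__main__":
    asyncio.run(main())
```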
Trading services
Trading services run the actual business operations. They execute systematic strategies on different instruments (potentially across different venues, typically one trading service per strategy per instrument), effectively making up an overarching trading portfolio.
In most cases, the trading services are stateful. For maximum speed, internal states are stored in memory, and without additional mechanisms these states would be lost in the event of a service crash. Therefore, MongoDB Atlas is utilized to replicate the states to a MongoDB database in real time. If a trading service crashes unexpectedly, the state is recovered from said database. This way, trading services have extensive self-healing capabilities and can even be restarted intentionally without major interruption (continuous deployment). Hence, downtime is reduced to a minimum, which is important for time-critical trading services.
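A minimal sketch of this pattern with pymongo, assuming hypothetical database, collection, and service names, could look as follows: the in-memory state is upserted on every change and reloaded on startup.

```python
from pymongo import MongoClient

# Hypothetical connection string and namespace; the real Atlas cluster differs.
client = MongoClient("mongodb+srv://user:password@cluster.example.mongodb.net")
states = client["execution"]["trading_service_states"]

SERVICE_ID = "trend-following-btcusd"  # placeholder identifier for this trading service


def persist_state(state: dict) -> None:
    """Replicate the current in-memory state so it survives a crash or redeployment."""
    states.replace_one({"_id": SERVICE_ID}, {"_id": SERVICE_ID, **state}, upsert=True)


def recover_state() -> dict:
    """On startup, restore the last persisted state (empty dict if none exists yet)."""
    document = states.find_one({"_id": SERVICE_ID}) or {}
    document.pop("_id", None)
    return document
```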
In case of trading service malfunctions, it is useful to have a history of service states available for bug fixing purposes. For this, Amazon DynamoDB is used to store trading service states per point in time. These state logs are retained in DynamoDB tables until they expire (usually after a month). This keeps the DynamoDB tables small without sacrificing important trading service insights in the event of a recent malfunction.
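The state history could be written in a similar fashion; the sketch below assumes a boto3 DynamoDB table with TTL enabled on an expires_at attribute and a roughly one-month retention, all of which are illustrative.

```python
import time
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("trading-service-state-log")  # hypothetical table name
RETENTION_SECONDS = 30 * 24 * 60 * 60  # roughly one month


def log_state(service_id: str, state: dict) -> None:
    """Store a point-in-time state snapshot; DynamoDB TTL removes it after the retention window."""
    now = int(time.time())
    table.put_item(Item={
        "service_id": service_id,                 # partition key
        "timestamp": now,                         # sort key
        "state": state,                           # serialized service state
        "expires_at": now + RETENTION_SECONDS,    # TTL attribute configured on the table
    })


# Example call; DynamoDB requires numeric values to be passed as Decimal, not float.
log_state("trend-following-btcusd", {"position": Decimal("1.5"), "unrealized_pnl": Decimal("120.0")})
```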
Dynamic settings
As mentioned in the solution overview, it is essential to be able to modify trading parameters at runtime. This can be realized by implementing a REST API with API Gateway (in BidFlow Technologies' case an API Gateway HTTP API is used) that allows authenticated Amazon Cognito users to change such trading parameters via API requests.
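As an example, an operator tool could obtain a Cognito ID token and call the HTTP API with it; the user pool client ID, the API URL, and the request payload below are all placeholders.

```python
import boto3
import requests

# Hypothetical identifiers; the real user pool client and API endpoint differ.
COGNITO_CLIENT_ID = "example-app-client-id"
API_URL = "https://api.example.com/trading-services/trend-following-btcusd/parameters"

# Authenticate against the Cognito user pool and obtain an ID token.
cognito = boto3.client("cognito-idp")
auth = cognito.initiate_auth(
    ClientId=COGNITO_CLIENT_ID,
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "operator", "PASSWORD": "secret"},
)
id_token = auth["AuthenticationResult"]["IdToken"]

# Update a single trading parameter at runtime via the HTTP API.
response = requests.put(
    API_URL,
    json={"name": "max_position_size", "value": 5},
    headers={"Authorization": f"Bearer {id_token}"},
)
response.raise_for_status()
```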
However, two difficulties arise:
- the trading services are dynamic: there is no static set of trading services that can be assumed to be available and running at all times
- trading parameters can differ between trading services: there is no static set of parameters that is used by all trading services
Due to these conditions, the decision was made to handle API requests in the backend services. When a trading service for a specific instrument on a specific venue is launched, it is registered with the corresponding backend service's API microservice, which from that point on forwards associated requests to the trading service. If said trading service crashes or is shut down, it is instantly deregistered and any further API requests to it are rejected. This solves the first difficulty listed above.
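A simplified sketch of such a registry inside a backend service's API microservice, using an in-memory dict and hypothetical identifiers and endpoints, could look like this:

```python
import requests

# Maps a trading service identifier to its internal endpoint (e.g. a Service Connect DNS name).
# Identifiers and endpoints are illustrative.
registry: dict[str, str] = {}


def register(service_id: str, internal_endpoint: str) -> None:
    """Called when a trading service for this venue starts up."""
    registry[service_id] = internal_endpoint


def deregister(service_id: str) -> None:
    """Called when a trading service shuts down or is detected as crashed."""
    registry.pop(service_id, None)


def forward_parameter_update(service_id: str, payload: dict) -> tuple[int, dict]:
    """Forward an API Gateway request to the registered trading service, or reject it."""
    endpoint = registry.get(service_id)
    if endpoint is None:
        return 404, {"error": f"no running trading service registered as '{service_id}'"}
    response = requests.put(f"http://{endpoint}/parameters", json=payload, timeout=5)
    return response.status_code, response.json()
```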
In addition, the design choice was made to handle parameter updates via a generalized update mechanism within each trading service. New parameter values are sent alongside the parameter name so that the targeted trading service knows which parameter to update. This solves the second difficulty, as no fixed parameter set has to be enforced for each and every trading service.
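On the trading service side, the corresponding generic handler can be kept very small, as in the following sketch, where the parameter names and the update payload shape are assumptions:

```python
class TradingService:
    """Minimal sketch of a trading service with runtime-editable parameters."""

    def __init__(self):
        # Strategy-specific parameters; every trading service may define a different set.
        self.params = {
            "max_position_size": 10,
            "entry_threshold": 0.75,
        }

    def update_parameter(self, name: str, value) -> None:
        """Generic update mechanism: apply any known parameter by name, without restarting."""
        if name not in self.params:
            raise KeyError(f"unknown parameter '{name}'")
        self.params[name] = value


# Example: a forwarded request carrying {"name": "entry_threshold", "value": 0.8}
service = TradingService()
service.update_parameter("entry_threshold", 0.8)
```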
Conclusion
This post gave an overview of how a cloud-based execution environment for systematic trading can be implemented, making sure that it is scalable and self-healing with minimal downtime. Based on the discussed concepts, such an execution environment was implemented at BidFlow Technologies and is running robustly in production.