Introduction
This topic guides you through configuring scaling for RPI v7 environments. It focuses on scaling and resource management within Kubernetes (K8s) so that your applications can handle varying loads efficiently.
In managing Kubernetes environments, scaling allows your applications to handle increased demand by adjusting the number and size of running instances and nodes. Proper scaling ensures that your applications remain responsive and performant, even during peak usage times.
A combination of auto-scaling, monitoring, and manual tuning by administrators is essential to maintain a robust, stable, and performant environment.
- Auto-scaling allows Kubernetes to automatically adjust the number of replicas and nodes based on current demand, increasing the likelihood that your applications can handle varying loads without manual intervention.
- Monitoring provides real-time insights into the performance and resource usage of your applications, enabling administrators to make informed decisions about scaling and resource allocation.
- Manual tuning allows administrators to fine-tune the scaling parameters and resource limits based on the specific needs of their applications, ensuring optimal performance and stability.
By following the guidelines provided in this document, you will be able to define an approach for scaling RPI v7 environments effectively, leveraging the power of Kubernetes to maintain a robust, stable, and performant solution. For technical instructions for administrators configuring RPI v7 scaling, refer to the "Configure autoscaling" section of our GitHub readme.
Containers deployed as part of the RPI solution
For containers where both HPA and KEDA auto-scaling patterns are supported, consider the following:
- If you want to auto-scale based only on CPU or memory utilization, use HPA mode.
- If you want more advanced auto-scaling based on an RPI custom metric, use KEDA.
| Container | Description | Auto-scaling Pattern Supported |
|---|---|---|
| Integration API | The RPI Integration API provides a series of endpoints that allow a third-party development team to invoke RPI functionality. | HPA / KEDA |
| Interaction API | All user requests to the server are handled by the Interaction API web service. All requests to the Interaction services are made using HTTPS. A valid certificate must be installed on the application server, and the service must be configured to use it. | HPA / KEDA |
| Execution service | The Execution service is responsible for executing server jobs generated by activities undertaken in the RPI client application; for example, the testing of an audience or the execution of an interaction workflow. It is also responsible for executing asynchronous server-side jobs (e.g., catalog synchronization). The RPI client communicates with the Execution service via the Interaction API web service. Most communication between the Interaction API service and the Execution service is conducted via the Pulse operational database; the Execution service polls the database regularly to pick up and process the next request. | HPA / KEDA |
| Node Manager | The Node Manager service undertakes the following tasks: | HPA / KEDA |
| Queue Reader | The Queue Reader service is responsible for draining the queues used in the following contexts: | HPA / KEDA |
| Callback Service | The RPI Callback Service is used to retrieve results from the following channel providers: | HPA / KEDA |
| Realtime | The RPI Realtime web service facilitates the making of content applicability decisions and the recording of events undertaken by a site visitor. | HPA / KEDA |
| Configuration service | The RPI Configuration service provides: | HPA / KEDA |

Data activation services

The Data Activation Platform (CDP) feature (services/containers prefixed by …).

| Container | Description | Auto-scaling Pattern Supported |
|---|---|---|
| Auth Services | The Auth Services pod is the central identity and access management service for the CDP platform. It integrates with Keycloak as its OAuth2/OIDC identity provider to manage user authentication, role-based access control, and multi-tenant client configuration. It also provides bidirectional user synchronization between Keycloak and RPI clusters. | HPA |
| Keycloak | The Keycloak container provides the OAuth2/OIDC identity provider for the CDP platform, packaging Keycloak 26.x with a custom "RedPoint" theme featuring RedPoint-branded login, password reset, OTP authentication, and account management pages. It serves as the underlying identity store and SSO provider that the Auth Services pod integrates with for user authentication and token management. | HPA |
| Services API | The Services API is the central REST API gateway for the CDP platform, serving as the bridge between the UI and the platform's core backend services. It exposes versioned endpoints (v1/v2/v3) for managing campaigns, audiences, data imports, channels, file operations, and client configurations, backed by MongoDB persistence and Redis caching. It integrates with RPI for interaction and audience execution, Keycloak for authentication, and Sigma for analytics. | HPA |
| CDP UI | The CDP UI is the Angular-based web application that serves as the primary user interface for the CDP platform. It provides marketers and data analysts with tools for customer data activation, audience segmentation, campaign management, data quality monitoring, and reporting. The application is served via Nginx as a containerized single-page application and communicates with backend services through the CDP Services API. | HPA |
| CDP SocketIO Server | The CDP SocketIO Server is a Node.js/Socket.IO real-time WebSocket communication hub that routes event notifications between backend services and client applications. Services connect via a dedicated namespace to broadcast messages, which are then routed either to all connected clients or to specific client rooms based on client ID targeting. It supports optional Keycloak OAuth2 authentication for client connections. | HPA |
| CDP Maintenance | The CDP Maintenance service orchestrates system-wide maintenance windows during CDP platform releases. It coordinates the pausing of background automations, executes version-specific database migrations and data synchronization tasks, and notifies connected UI clients of maintenance status in real time. | HPA |
| CDP Init | The CDP Init container validates that dependent services are available at startup, then creates default RPI folder structures and metadata definitions for each configured client. It continues running to periodically synchronize RPI caches and perform housekeeping. | HPA |
| CDP MessageQ | The CDP MessageQ container is a message broker (RabbitMQ) that provides asynchronous messaging between platform services using queues and exchanges. | HPA |
| CDP Cache | The CDP Cache container is an open-source, in-memory data store (Redis) used for caching and improving application performance across platform services. | HPA |
All scaling configurations should be iterated on in both staging and production environments; there is no one-size-fits-all configuration appropriate for every installation and use case. CPU requests should align as closely as possible to workload demands to avoid unnecessary rescheduling or scale-out. Memory requests and limits must be equal to ensure proper scheduling and to avoid rescheduling. CPU and memory should not be treated the same with regard to requests and limits: memory is not compressible, so exceeding a memory limit terminates the pod, whereas CPU is compressible and can simply be throttled, which is why CPU limits are generally left unset. Auto-scaling should be turned off during the tuning phase.
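These requests-and-limits rules can be sketched as a container `resources` stanza. The values below are placeholders, not recommendations; derive real numbers from observed usage in your own environment:

```yaml
# Hypothetical resources stanza for one RPI container.
resources:
  requests:
    cpu: "500m"     # sized as closely as possible to typical workload demand
    memory: "1Gi"   # must equal the memory limit
  limits:
    memory: "1Gi"   # equal to the request; memory is not compressible
    # No cpu limit is set: CPU is compressible and excess demand is throttled,
    # so a limit adds risk of throttling without improving scheduling.
```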
Redpoint recommends the use of Grafana or similar tools to regularly assess resource utilization of the nodes both on a per-pod basis and in the aggregate. Administrators should adjust their deployment configuration with regard to CPU, memory, and replicas based on observations gathered during their regular assessments. At all times, memory requests and limits should remain specified and aligned. At no point should CPU limits be reapplied. In general, it is better to add replicas rather than increase CPU and memory unless one observes that all pods generally consume all available resources regardless of replica count. In doing so, one should be cognizant of total CPU and memory demands. If they exceed cluster node maximums, pods will fail to schedule. This too should be calculated during regular assessments and adjusted as needed.
Scaling Strategy For RPI 7.x
RPI 7.x deploys to Kubernetes (K8s). It is therefore possible to dynamically scale the application's various services. This is not, however, without risk. Several strategies are possible with different trade-offs.
Horizontal Pod Autoscaler (HPA)
HPA uses a simple set of pod metrics to dynamically scale a given replica set. The trade-off is that it may scale down while a pod is performing some stateful activity, which looks no different from an application failure from an end user's point of view.
- When to use it?
  - A good fit for cost-sensitive businesses that can tolerate disruption to operations induced by scaling behaviors.
- When to avoid it?
  - When the pod metrics measured by HPA do not accurately capture the need to scale.
  - When the business can't tolerate disruptions of running workloads.
  - When the cost of disruptions outweighs the savings of auto-scaling.
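For reference, a minimal HPA targeting average CPU utilization might look like the following sketch. The Deployment name, replica bounds, and utilization threshold are hypothetical; tune them against your own observations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rpi-interactionapi-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rpi-interactionapi        # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out above 70% average CPU
```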
Kubernetes-based Event Driven Autoscaler (KEDA)
KEDA is a superset of HPA. It dynamically scales a given replica set based on a set of more advanced metrics or user-defined custom metrics. As with HPA, scaling behavior may result in what end users perceive as an application failure.
- When to use it?
  - A good fit for cost-sensitive businesses that can tolerate disruption to operations induced by scaling behaviors, and that need a set of more complex measures to determine scaling behavior.
- When to avoid it?
  - When no combination of metrics accurately captures the conditions under which scaling should occur.
  - When the business can't tolerate disruptions of running workloads.
  - When the cost of disruptions outweighs the savings of auto-scaling.
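A KEDA `ScaledObject` can drive the same kind of replica set from an event-driven metric rather than CPU or memory. The sketch below scales on RabbitMQ queue depth; the Deployment name, queue name, threshold, and authentication reference are all hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rpi-queuereader-scaler      # hypothetical name
spec:
  scaleTargetRef:
    name: rpi-queuereader           # hypothetical Deployment name
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: rabbitmq
      metadata:
        queueName: rpi-example-queue  # hypothetical queue
        mode: QueueLength
        value: "50"                   # target messages per replica
      authenticationRef:
        name: rabbitmq-trigger-auth   # TriggerAuthentication holding the connection string
```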
Over-provisioning
Over-provisioning is a strategy wherein the K8s administrator allocates more compute than is necessary to run business workloads at peak and beyond. The idea is to avoid scaling behaviors entirely once the system is deployed. Adjustments to the deployment would only occur in a maintenance window where the business expects the system to be unavailable.
- When to use it?
  - When the business can't tolerate any potential disruptions to running workloads.
  - When the cost of disruptions outweighs the savings of auto-scaling.
- When to avoid it?
  - When cost concerns outweigh all other considerations, including the possibility of disruption of running workloads.
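In deployment terms, over-provisioning is simply a fixed replica count with generous resource requests and no HPA or KEDA object targeting the Deployment. The names, image, and values below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rpi-executionservice        # hypothetical name
spec:
  replicas: 4                       # fixed at peak demand plus headroom; no autoscaler attached
  selector:
    matchLabels:
      app: rpi-executionservice
  template:
    metadata:
      labels:
        app: rpi-executionservice
    spec:
      containers:
        - name: executionservice
          image: example/rpi-executionservice:7.x   # hypothetical image
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              memory: "4Gi"         # memory request and limit kept equal
```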