Introduction
This topic guides you through configuring scaling for RPI v7 environments. It focuses on scaling and resource management within Kubernetes (K8s) so that your applications can handle varying loads efficiently.
In managing Kubernetes environments, scaling allows your applications to handle increased demand by adjusting the number and size of running instances and nodes. Proper scaling ensures that your applications remain responsive and performant, even during peak usage times.
A combination of auto-scaling, monitoring, and manual tuning by administrators is essential to maintain a robust, stable, and performant environment.
- Auto-scaling allows Kubernetes to automatically adjust the number of replicas and nodes based on current demand, increasing the likelihood that your applications can handle varying loads without manual intervention.
- Monitoring provides real-time insights into the performance and resource usage of your applications, enabling administrators to make informed decisions about scaling and resource allocation.
- Manual tuning allows administrators to fine-tune the scaling parameters and resource limits based on the specific needs of their applications, ensuring optimal performance and stability.
By following the guidelines provided in this document, you will be able to define an approach for scaling RPI v7 environments effectively, leveraging the power of Kubernetes to maintain a robust, stable, and performant solution. For technical instructions for administrators configuring RPI v7 scaling, refer to the "Configure autoscaling" section of our GitHub readme.
Containers deployed as part of the RPI solution
For containers where both HPA and KEDA auto-scaling patterns are supported, consider the following:
- If you want to auto-scale based only on CPU or memory utilization, use HPA mode.
- If you want more advanced auto-scaling based on an RPI custom metric, use KEDA.
| Container | Description | Auto-scaling Pattern Supported |
|---|---|---|
| Integration API | The RPI Integration API provides a series of endpoints that allow a third-party development team to invoke RPI functionality. | HPA / KEDA |
| Interaction API | All user requests to the server are handled by the Interaction API web service. All requests to the Interaction services are made using HTTPS. A valid certificate must be installed on the application server, and the service must be configured to use it. | HPA / KEDA |
| Execution service | The Execution service is responsible for executing server jobs generated by activities undertaken in the RPI client application; for example, the testing of an audience or the execution of an interaction workflow. It is also responsible for executing asynchronous server-side jobs (e.g., catalog synchronization). The RPI client communicates with the Execution service via the Interaction API web service. Most communication between the Interaction API service and the Execution service is conducted via the Pulse operational database; the Execution service polls the database regularly to pick up and process the next request. | HPA / KEDA |
| Node Manager | The Node Manager service undertakes the following tasks: | HPA / KEDA |
| Queue Reader | The Queue Reader service is responsible for draining the queues used in the following contexts: | HPA / KEDA |
| Callback Service | The RPI Callback Service is used to retrieve results from the following channel providers: | HPA / KEDA |
| Realtime | The RPI Realtime web service facilitates the making of content applicability decisions and the recording of events undertaken by a site visitor. | HPA / KEDA |
| Configuration service | The RPI Configuration service provides: | HPA / KEDA |

Data activation services

The Data Activation Platform (CDP) feature (services/containers prefixed by …).

| Container | Description | Auto-scaling Pattern Supported |
|---|---|---|
| Auth Services | The Auth Services pod is the central identity and access management service for the CDP platform. It integrates with Keycloak as its OAuth2/OIDC identity provider to manage user authentication, role-based access control, and multi-tenant client configuration. It also provides bidirectional user synchronization between Keycloak and RPI clusters. | HPA |
| Keycloak | The Keycloak container provides the OAuth2/OIDC identity provider for the CDP platform, packaging Keycloak 26.x with a custom "RedPoint" theme featuring RedPoint-branded login, password reset, OTP authentication, and account management pages. It serves as the underlying identity store and SSO provider that the Auth Services pod integrates with for user authentication and token management. | HPA |
| Services API | The Services API is the central REST API gateway for the CDP platform, serving as the bridge between the UI and the platform's core backend services. It exposes versioned endpoints (v1/v2/v3) for managing campaigns, audiences, data imports, channels, file operations, and client configurations, backed by MongoDB persistence and Redis caching. It integrates with RPI for interaction and audience execution, Keycloak for authentication, and Sigma for analytics. | HPA |
| CDP UI | The CDP UI is the Angular-based web application that serves as the primary user interface for the CDP platform. It provides marketers and data analysts with tools for customer data activation, audience segmentation, campaign management, data quality monitoring, and reporting. The application is served via Nginx as a containerized single-page application and communicates with backend services through the CDP Services API. | HPA |
| CDP SocketIO Server | The CDP SocketIO Server is a Node.js/Socket.IO real-time WebSocket communication hub that routes event notifications between backend services and client applications. Services connect via a dedicated namespace to broadcast messages, which are then routed either to all connected clients or to specific client rooms based on client ID targeting. It supports optional Keycloak OAuth2 authentication for client connections. | HPA |
| CDP Maintenance | The CDP Maintenance service orchestrates system-wide maintenance windows during CDP platform releases. It coordinates the pausing of background automations, executes version-specific database migrations and data synchronization tasks, and notifies connected UI clients of maintenance status in real time. | HPA |
| CDP Init | The CDP Init container validates that dependent services are available at startup, then creates default RPI folder structures and metadata definitions for each configured client. It continues running to periodically synchronize RPI caches and perform housekeeping. | HPA |
| CDP MessageQ | The CDP MessageQ container is a message broker (RabbitMQ) that provides asynchronous messaging between platform services using queues and exchanges. | HPA |
| CDP Cache | The CDP Cache container is an open-source, in-memory data store (Redis) used for caching and improving application performance across platform services. | HPA |
All scaling configurations should be iterated on in both staging and production environments; there is no one-size-fits-all configuration appropriate for every installation and use case. CPU requests should align as closely as possible to workload demands to avoid unnecessary rescheduling or scale-out. Memory requests and limits must be equal to ensure proper scheduling and to avoid rescheduling. CPU and memory should not be treated the same with regard to requests and limits: memory is not compressible, so exceeding a memory limit terminates the pod, whereas CPU is compressible and can simply be throttled, which is why CPU limits are generally left unset. Auto-scaling should be turned off during the tuning phase.
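These requests-and-limits rules can be sketched as a container `resources` stanza. The values below are placeholders, not recommendations; derive real numbers from observed usage in your own environment:

```yaml
# Hypothetical resources stanza for one RPI container.
resources:
  requests:
    cpu: "500m"     # sized as closely as possible to typical workload demand
    memory: "1Gi"   # must equal the memory limit
  limits:
    memory: "1Gi"   # equal to the request; memory is not compressible
    # No cpu limit is set: CPU is compressible and excess demand is throttled,
    # so a limit adds risk of throttling without improving scheduling.
```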
Redpoint recommends the use of Grafana or similar tools to regularly assess resource utilization of the nodes both on a per-pod basis and in the aggregate. Administrators should adjust their deployment configuration with regard to CPU, memory, and replicas based on observations gathered during their regular assessments. At all times, memory requests and limits should remain specified and aligned. At no point should CPU limits be reapplied. In general, it is better to add replicas rather than increase CPU and memory unless one observes that all pods generally consume all available resources regardless of replica count. In doing so, one should be cognizant of total CPU and memory demands. If they exceed cluster node maximums, pods will fail to schedule. This too should be calculated during regular assessments and adjusted as needed.
Scaling Strategy For RPI 7.x
RPI 7.x deploys to Kubernetes (K8s). It is therefore possible to dynamically scale the application's various services. This is not, however, without risk. Several strategies are possible with different trade-offs.
Horizontal Pod Autoscaler (HPA)
HPA uses a simple set of pod metrics to dynamically scale a given replica set. The trade-off is that it may scale down while a pod is performing some stateful activity, which looks no different from an application failure from an end user's point of view.
- When to use it?
  - A good fit for cost-sensitive businesses that can tolerate disruption to operations induced by scaling behaviors.
- When to avoid it?
  - When the pod metrics measured by HPA do not accurately capture the need to scale.
  - When the business can't tolerate disruptions of running workloads.
  - When the cost of disruptions outweighs the savings of auto-scaling.
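For reference, a minimal HPA targeting average CPU utilization might look like the following sketch. The Deployment name, replica bounds, and utilization threshold are hypothetical; tune them against your own observations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rpi-interactionapi-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rpi-interactionapi        # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out above 70% average CPU
```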
Kubernetes-based Event Driven Autoscaler (KEDA)
KEDA is a superset of HPA. It dynamically scales a given replica set based on a set of more advanced metrics or user-defined custom metrics. As with HPA, scaling behavior may result in what end users perceive as an application failure.
- When to use it?
  - A good fit for cost-sensitive businesses that can tolerate disruption to operations induced by scaling behaviors, and that need a set of more complex measures to determine scaling behavior.
- When to avoid it?
  - When no combination of metrics accurately captures the conditions under which scaling should occur.
  - When the business can't tolerate disruptions of running workloads.
  - When the cost of disruptions outweighs the savings of auto-scaling.
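A KEDA `ScaledObject` can drive the same kind of replica set from an event-driven metric rather than CPU or memory. The sketch below scales on RabbitMQ queue depth; the Deployment name, queue name, threshold, and authentication reference are all hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rpi-queuereader-scaler      # hypothetical name
spec:
  scaleTargetRef:
    name: rpi-queuereader           # hypothetical Deployment name
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: rabbitmq
      metadata:
        queueName: rpi-example-queue  # hypothetical queue
        mode: QueueLength
        value: "50"                   # target messages per replica
      authenticationRef:
        name: rabbitmq-trigger-auth   # TriggerAuthentication holding the connection string
```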
Over-provisioning
Over-provisioning is a strategy wherein the K8s administrator allocates more compute than is necessary to run business workloads at peak and beyond. The idea is to avoid scaling behaviors entirely once the system is deployed. Adjustments to the deployment would only occur in a maintenance window where the business expects the system to be unavailable.
- When to use it?
  - When the business can't tolerate any potential disruptions to running workloads.
  - When the cost of disruptions outweighs the savings of auto-scaling.
- When to avoid it?
  - When cost concerns outweigh all other considerations, including the possibility of disruption of running workloads.
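In deployment terms, over-provisioning is simply a fixed replica count with generous resource requests and no HPA or KEDA object targeting the Deployment. The names, image, and values below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rpi-executionservice        # hypothetical name
spec:
  replicas: 4                       # fixed at peak demand plus headroom; no autoscaler attached
  selector:
    matchLabels:
      app: rpi-executionservice
  template:
    metadata:
      labels:
        app: rpi-executionservice
    spec:
      containers:
        - name: executionservice
          image: example/rpi-executionservice:7.x   # hypothetical image
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              memory: "4Gi"         # memory request and limit kept equal
```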