Redpoint Best Practices Documentation
Breadcrumbs

Standardization

Introduction

Data standardization is the process of unifying data formats, schemas, and values (e.g., names, addresses, emails, phone numbers) to ensure consistency and reliability across all systems and channels. This is a foundational step in the process of achieving reliable and unified customer profiles, enabling seamless data activation across various channels, and ensuring that organizations can effectively utilize their data.

This topic outlines Redpoint’s philosophy, guiding principles, and practical approach to data standardization, providing insights into how you can implement standardization effectively within your organization.

Redpoint’s philosophy on data standardization

Redpoint’s approach is to standardize the data once and reuse it everywhere; that is, create a single, governed data environment—covering schemas, identity, consent, events, reference data, and data contracts—and enforce it through configuration, automation, and promotion gates in our platform. This ensures:

  • Reliable, “golden record” customer data

  • Faster onboarding of new sources and channels

  • Consistent downstream activation and measurement

  • Higher match rates and accuracy

  • Shorter build cycles and provable compliance

Customer Data Readiness Hub diagram

This diagram outlines the primary process steps within the Customer Data Readiness Hub, provides a detailed breakdown of specific activities within each major group, and identifies the particular step addressed on this page with the "Current Topic" marker.

image-20251110-175636.png

Core principles

  1. Canonical-first: Establish a canonical model, a single source of truth, that serves as the foundational framework for representing key entities such as people, households, events, consent, and reference codes, ensuring consistency and clarity across applications and systems.

  2. Config over code: Enforce governance through configurations rather than relying on custom scripts. This includes the use of schemas, rules, and lookup sets, which allow for easier management and updates while minimizing the risk associated with hard-coded solutions.

  3. Identity-centric: Treat the principles of identity and consent as non-negotiable standards within the framework. Design all other components and processes to align with and support these core tenets, ensuring that they remain at the forefront of any operational strategy.

  4. Validation at the edge: Implement rigorous schema checks and data quality rules to prevent data drift at the point of ingestion. Address potential issues proactively rather than adopting a reactive approach of “fixing it later,” which can lead to greater complications down the line.

  5. Versioned & composable: Use small, versioned building blocks, such as “Address v2” to facilitate safe and manageable changes. This approach allows for incremental updates and enhancements without disrupting existing systems, promoting a more agile and responsive development environment.

  6. Evidence-driven: Evolve standards based on measurable metrics related to adoption, quality, and performance. Relying on evidence and data-driven insights allows you to make informed decisions that enhance operational effectiveness and ensure that standards remain relevant and impactful.

Scope of standardization

Redpoint plays a crucial role in defining and enforcing a comprehensive set of standards to maintain consistency and quality across various data management processes:

  • Canonical customer & party schema: A structured framework for representing entities such as individuals, accounts, and households, including the definition of keys, attributes, and flags for Personally Identifiable Information (PII) to ensure that data is organized and compliant with privacy regulations. Refer to Redpoint's approach to hygiene, matching, and identity resolution for detailed information about name, address, email, phone standardization parsing and validation processes.

  • Identity resolution standards: Guidelines for identity resolution, including creating match keys, thresholds for determining matches, survivorship rules for managing conflicting data, and a deduplication policy to eliminate redundant entries.

  • Consent & preference model: Requirements for obtaining and managing consent, emphasizing purpose-based consent mechanisms. This model addresses the source of consent, proof of consent, expiry dates, and jurisdictional considerations to ensure compliance with legal standards.

  • Event model: Standardized naming conventions for events, requiring uniform event names and defining the necessary attributes that must be included. It also establishes timestamp semantics and identity binding protocols to ensure accurate event tracking.

  • Reference data: The creation of code sets that categorize essential reference data such as country, channel, language, currency, and lifecycle stage to ensure consistency in how data is referenced and utilized across various applications.

  • Data quality policies: Data quality enforcement that specifies requirements for fields, including whether they are required or nullable. It also includes type and domain checks, format rules, and remediation processes to address any data quality issues that may arise.

  • Data contracts for activation: Detailed specifications for data activation, including the order of fields, data types, encoding formats, null-handling procedures, and error semantics.

  • Lineage & metadata: Defining the owner of the data, its sensitivity level, service level agreements (SLA), retention policies, version control, and tagging for easier data management and retrieval.

KPIs to prove value

KPI

What it measures

Why it matters

Schema conformance rate

The ratio of valid records to rejected records.

Provides insight into the quality and consistency of the data being processed. A high conformance rate indicates that the data adheres to predefined schemas, which is crucial for maintaining data integrity and usability.

Duplicate rate & stitch quality

The effectiveness of deduplication efforts by comparing pre- and post-deduplication rates. It also assesses the match confidence distribution.

Helps to understand how accurately records are stitched together, ensuring that unique profiles are maintained without redundancy.

Golden Record coverage

The percentage of profiles that possess a Golden Record ID, a unique identifier.

Ensures data accuracy and assesses attribute completeness, highlighting how well the profiles are enriched with necessary data points, which is essential for effective data utilization.

Consent coverage & enforcement

The number of deliveries that are blocked by policy compared to the total number of attempted deliveries.

It is critical for compliance with data protection regulations and ensuring that user consent is respected in all data transactions.

Event integrity

The rate of duplicate or late events, as well as the success of ID binding processes.

High event integrity is vital for maintaining the reliability of event-driven systems and for accurate event tracking.

Activation reliability

The contract validation pass rate.

Indicates how often contracts meet the necessary criteria for activation. It also tracks the rollback count, which reflects the number of times an activation process had to be reversed, providing insight into the reliability of the activation process.

Time-to-onboard

The number of days required to add a new source or channel while using existing standards.

A shorter onboarding time indicates a more efficient process, which can enhance the overall agility of the data management system.

Example starter standards

Canonical customer (excerpt)

Field

Type

Required

Notes

customer_id

GUID

Golden Record ID issued by CDP

primary_email

string

 

Validated; survivorship-ranked

first_name/last_name

string

 

UTF-8; title-case normalization

birth_date

date

 

ISO-8601; jurisdictional use flags

address_*

structured

 

Postal standards; geocoded where allowed

source_systems

array

 

Provenance, first_seen_ts, last_seen_ts

Field

Type

Required

Notes

customer_id

GUID

Links to Golden Record ID

purpose

enum

E.g., marketing.email, personalization

channel

enum

 

Email, SMS, push, mail

status

enum

Granted, denied, withdrawn

jurisdiction

enum

 

E.g., GDPR, CCPA

source/proof

string

 

Capture method, evidence URI

effective_ts
expiry_ts

datetime

 

ISO-8601

Risk controls & guardrails

  • PII boundary checks at ingest and pre-activation; block exports violating policy. This ensures that any personally identifiable information (PII) is properly managed and safeguarded from the moment it is ingested into the system, preventing any unauthorized exports that could compromise user privacy and violate established policies.

  • Consent enforcement at decision time; no send without valid purpose. This process guarantees that data is only shared or utilized when there is a legitimate reason to do so, ensuring that user consent is respected and upheld throughout the decision-making process.

  • Data drift alerts for conformance, match-rate drops, or unusual event volumes. Implementing alerts for data drift allows for proactive monitoring of data integrity and consistency, helping to identify any discrepancies or unexpected changes in data patterns that could indicate underlying issues.

  • Immutable audit logs for schema/rule/contract changes; rollback plans and change windows. Maintaining immutable audit logs is crucial for tracking changes to schemas, rules, and contracts, providing a reliable history of modifications. This also includes having rollback plans and defined change windows to ensure that any changes can be reverted if necessary, minimizing disruption and maintaining system integrity.

How Redpoint services accelerate the journey

  • Standards catalog: This includes a comprehensive collection of ready-to-use canonical models, an event taxonomy, data quality (DQ) rules, and match policies that are essential for maintaining consistency and accuracy across various data processes. These elements work together to ensure that all data adheres to established standards, facilitating better data management and utilization.

  • Promotion pipeline pack: This pack consists of critical components such as schema definitions, personally identifiable information (PII) management, and consent gates, along with segmentation regression tests. These elements are designed to streamline the promotion of data through different stages of processing while ensuring compliance with privacy regulations and maintaining the integrity of the data.

  • Onboarding playbooks: These playbooks provide a set of repeatable steps that guide users through the process of adding various sources and channels in accordance with the established standards. They serve as a practical resource for teams to ensure that the onboarding process is efficient, consistent, and aligned with best practices, ultimately enhancing the overall workflow.