Profile unification using out-of-the-box CDP jobs

Overview

Profile Unification (Golden Record) is the foundation of Redpoint’s Customer Data Platform (CDP), providing a single, persistent, and trusted view of each individual. Using out-of-the-box (OOTB) Redpoint CDP jobs, the implementation standardizes, deduplicates, and merges customer records across multiple source systems into a unified entity with full lineage and survivorship transparency.

This topic outlines how this process is implemented using the standard Redpoint Data Management (RPDM) and Unification templates.

Objectives

Consolidate data from multiple source systems, including Customer Relationship Management (CRM) systems, Point of Sale (POS) systems, web platforms, email communications, loyalty programs, and other relevant sources, so that every customer interaction and transaction contributes to one view of customer behavior and preferences.
Standardize and cleanse identity attributes with Redpoint's data quality routines: correct inaccuracies, remove inconsistencies, and enforce a uniform format for every field.
Apply both deterministic and probabilistic matching to identify and eliminate duplicate records. Deterministic matching requires exact agreement on key attributes; probabilistic matching scores the likelihood that two records describe the same person across several attributes, which catches duplicates that exact matching misses.
Create a persistent Golden Record that serves as the definitive source of truth for each customer identity. It is built with attribute survivorship logic, which selects the most reliable and current value for each field from the contributing sources.
Maintain full source traceability and identity resolution lineage, so that every value can be traced back to its original source and its context and reliability can be assessed.
Enable downstream activation through Redpoint Interaction or other delivery platforms, using the consolidated, cleansed data for targeted campaigns and personalized customer interactions.

Out-of-the-box job framework

Source ingestion jobs

OOTB jobs begin with source ingestion templates that load raw input data into Redpoint’s staging environment:

Input tables: Each source system, such as CRM_Customer, Loyalty_Member, and Web_Profile, is mapped to a standardized schema, so records from different origins can be integrated and processed by the same downstream jobs.
Data quality routines: Common OOTB transformations are applied, including name parsing (splitting and standardizing name components), genderization (assigning a gender based on available data), email validation (confirming that addresses are correctly formatted and active), phone standardization (converting numbers to a single format), and address verification (confirming that customer addresses are accurate).
Identity keys: A surrogate key is created for every input record. These keys preserve traceability through the entire unification process: each record can be referenced unambiguously, and its changes, updates, and lineage can be tracked as it moves through each processing stage.
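The ingestion step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Redpoint's implementation: the column mappings in `FIELD_MAPS`, the `StagedRecord` layout, and the `stage` function are all hypothetical. It maps two sources onto one staging schema and derives a surrogate key for each input record.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Hypothetical mappings from each source's native column names to a single
# standardized staging schema. Real names depend on your source systems.
FIELD_MAPS = {
    "CRM_Customer": {"CustNo": "source_id", "FName": "first_name",
                     "LName": "last_name", "EmailAddr": "email"},
    "Loyalty_Member": {"MemberID": "source_id", "GivenName": "first_name",
                       "Surname": "last_name", "Email": "email"},
}


@dataclass
class StagedRecord:
    surrogate_key: str          # unique per input record; used for lineage
    source_system: str
    source_id: str
    first_name: Optional[str]
    last_name: Optional[str]
    email: Optional[str]


def stage(source_system: str, raw: dict) -> StagedRecord:
    """Map one raw source row to the standardized schema (illustrative only)."""
    mapping = FIELD_MAPS[source_system]
    fields = {target: raw.get(src) for src, target in mapping.items()}
    source_id = str(fields["source_id"])
    # Deterministic surrogate key: the same source row always yields the same
    # key. The namespace choice is arbitrary; any fixed UUID namespace works.
    key = uuid.uuid5(uuid.NAMESPACE_URL, f"{source_system}:{source_id}")
    return StagedRecord(
        surrogate_key=str(key),
        source_system=source_system,
        source_id=source_id,
        first_name=fields.get("first_name"),
        last_name=fields.get("last_name"),
        email=fields.get("email"),
    )


crm_row = stage("CRM_Customer", {"CustNo": 1001, "FName": "Ana",
                                 "LName": "Diaz", "EmailAddr": "ana@example.com"})
loyalty_row = stage("Loyalty_Member", {"MemberID": "L-77", "GivenName": "Ana",
                                       "Surname": "Diaz", "Email": "ana@example.com"})
```

Deriving the key with `uuid.uuid5` from the source system and source ID is one design choice among several (a database sequence or a random UUID would also work); its advantage is that reloading the same source row produces the same key, which keeps lineage stable across reruns.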

Standardization jobs

The standardization layer leverages Redpoint’s OOTB data quality modules:

Address cleanse: Applies postal standards so that equivalent addresses are written identically, which prepares the data for deduplication. Accurate, standardized addresses also reduce errors in mail delivery and logistics.
Name/email/phone parsing: Normalizes names, email addresses, and phone numbers into consistent formats. This improves match accuracy, because the same value arriving from two sources in different formats then compares as equal.
Casing and tokenization: Standardizes text casing and breaks values into tokens, keeping data consistent across all sources and reducing discrepancies in matching, analysis, and reporting.
Error and exception logging: Automatically routes invalid records to error tables for review, so data quality issues are identified and corrected promptly instead of flowing into matching.
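The parsing, casing, tokenization, and error-routing steps above can be approximated with the Python standard library. The sketch below is a simplification under assumptions, not Redpoint's data quality modules: the email pattern checks syntax only, the phone rule assumes 10-digit North American numbers when no country code is present, and `standardize` with its `error_codes` field is hypothetical.

```python
import re
from typing import List, Optional

# Syntax-only check; it cannot confirm that a mailbox is active.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_email(value: Optional[str]) -> Optional[str]:
    # Lowercasing the whole address is a common convention, although the
    # local part is technically case-sensitive.
    email = (value or "").strip().lower()
    return email if EMAIL_RE.match(email) else None


def normalize_phone(value: Optional[str], default_country: str = "1") -> Optional[str]:
    digits = re.sub(r"\D", "", value or "")
    if len(digits) == 10:              # assumption: national NANP number
        digits = default_country + digits
    # E.164 allows at most 15 digits; the minimum of 11 here is a simplification.
    return "+" + digits if 11 <= len(digits) <= 15 else None


def name_tokens(value: Optional[str]) -> List[str]:
    # Split on whitespace and commas, then title-case each token.
    return [t.title() for t in re.split(r"[\s,]+", (value or "").strip()) if t]


def standardize(record: dict, errors: list) -> Optional[dict]:
    """Return a cleansed copy, or route the record to `errors` and return None."""
    out = dict(record)
    out["email"] = normalize_email(record.get("email"))
    out["phone"] = normalize_phone(record.get("phone"))
    out["name_tokens"] = name_tokens(record.get("name"))
    problems = []
    if record.get("email") and out["email"] is None:
        problems.append("invalid_email")
    if record.get("phone") and out["phone"] is None:
        problems.append("invalid_phone")
    if problems:
        errors.append({**record, "error_codes": problems})  # error table row
        return None
    return out


error_table: list = []
clean = standardize({"name": "o'BRIEN,  mary", "email": " Mary@Example.COM ",
                     "phone": "(555) 010-2345"}, error_table)
rejected = standardize({"name": "Sam", "email": "not-an-email"}, error_table)
```

Rejecting the whole record on any invalid identifier is the strictest option; a softer variant would null out only the bad field and keep the record in the pipeline.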

Matching & clustering jobs

The identity resolution layer applies Redpoint’s proprietary OOTB match rules:

Deterministic matching: Links records that share an exact key, such as email address, loyalty ID, or phone number. Because it relies on specific identifiers, it is highly accurate where precise matches are required.
Probabilistic matching: Uses similarity scores across a combination of attributes, such as name, address, and ZIP code, to estimate the likelihood that two records refer to the same individual. This fuzzy matching uncovers connections that exact keys miss, such as a misspelled name or an abbreviated street suffix.
Clustering logic: Combines matched records into a single cluster that represents one individual, giving a consolidated view of each person's information.
Matching audit tables: Record which sources contributed to each cluster, providing the visibility needed for auditing and for maintaining the integrity of the matching process.
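The three matching ideas above can be combined in one small Python sketch. It is illustrative only and does not reproduce Redpoint's proprietary match rules: the exact-key list, the attribute weights, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as a stand-in string similarity are all assumptions (production matchers typically use measures such as Jaro-Winkler, plus blocking to avoid comparing every pair). Matched pairs are merged with a union-find structure, and each accepted link is written to an audit list.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Optional, Tuple

EXACT_KEYS = ("email", "loyalty_id", "phone")          # deterministic keys
WEIGHTS = {"name": 0.5, "address": 0.3, "zip": 0.2}    # hypothetical weights


def deterministic_match(a: dict, b: dict) -> bool:
    return any(a.get(k) and a.get(k) == b.get(k) for k in EXACT_KEYS)


def similarity(x: Optional[str], y: Optional[str]) -> float:
    if not x or not y:
        return 0.0
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def probabilistic_score(a: dict, b: dict) -> float:
    zip_agree = 1.0 if a.get("zip") and a.get("zip") == b.get("zip") else 0.0
    return (WEIGHTS["name"] * similarity(a.get("name"), b.get("name"))
            + WEIGHTS["address"] * similarity(a.get("address"), b.get("address"))
            + WEIGHTS["zip"] * zip_agree)


def cluster(records: List[dict], threshold: float = 0.85):
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    audit: List[Tuple[int, int, str, float]] = []   # matching audit rows
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        if deterministic_match(a, b):
            rule, score = "deterministic", 1.0
        else:
            rule, score = "probabilistic", probabilistic_score(a, b)
            if score < threshold:
                continue
        parent[find(i)] = find(j)                   # union the two clusters
        audit.append((i, j, rule, round(score, 3)))

    groups: Dict[int, List[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values()), audit


records = [
    {"email": "ana@example.com", "name": "Ana Diaz", "address": "12 Main St", "zip": "02139"},
    {"email": "ana@example.com", "name": "A. Diaz"},
    {"name": "Ana Diaz", "address": "12 Main Street", "zip": "02139"},
    {"email": "bob@example.com", "name": "Bob Lee", "address": "9 Oak Ave", "zip": "94105"},
]
clusters, audit = cluster(records)
```

Records 0 and 1 link on email, and record 2 links to record 0 through the fuzzy score, so union-find places all three in one cluster even though records 1 and 2 never match each other directly. That transitive behavior is what makes clustering a separate step from pairwise matching.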

Survivorship / Golden Record creation

Once clusters are established, Golden Record creation jobs determine the “best” version of each field:

Field-level survivorship: Hierarchical rules decide which value is kept for each field. For example, when a Customer Relationship Management (CRM) system conflicts with a web source, the CRM value takes precedence; when the same field has been updated several times, the most recent update is treated as authoritative.
Completeness ranking: Non-null values with higher confidence are preferred over values that are missing, incomplete, or uncertain.
Date weighting: More recent attribute values receive higher weight, so older values are deprioritized in favor of newer entries that better reflect the customer's current state.
Golden key assignment: Each Golden Record receives a persistent, unique identifier, the Golden Record ID (GR_ID), which serves as the definitive reference for that record across systems and datasets.

These jobs output a consolidated Golden Record table, enriched with metadata fields such as:

Source count
Last contributing source
Survivorship rule applied
Confidence score
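Field-level survivorship and the metadata fields listed above can be sketched in Python. This is an illustration under assumptions, not Redpoint's survivorship engine: the `SOURCE_PRIORITY` table, the rule order (completeness, then source priority, then recency as a tiebreaker), the confidence formula (the share of non-null contributors that agree with the surviving value), and the GR_ID registry are all hypothetical. A full date-weighting scheme would weight values by age rather than using recency only to break ties.

```python
import uuid
from datetime import date
from typing import Dict, List

# Hypothetical precedence table: lower number wins. In practice this would
# live in a rule table such as CDP_Survivorship_Rules.
SOURCE_PRIORITY = {"CRM": 1, "Loyalty": 2, "Web": 3}


def survive_field(records: List[dict], field: str):
    """Return (value, rule_applied, confidence) for one field of a cluster."""
    present = [r for r in records if r.get(field) not in (None, "")]
    if not present:
        return None, "no_value", 0.0

    def rank(r: dict) -> int:
        return SOURCE_PRIORITY.get(r["source"], 99)

    best_rank = min(rank(r) for r in present)
    top = [r for r in present if rank(r) == best_rank]
    winner = max(top, key=lambda r: r["updated"])   # most recent among top sources
    if len(present) == 1:
        rule = "completeness"
    elif len(top) == 1:
        rule = "source_priority"
    else:
        rule = "recency"
    agree = sum(1 for r in present if r[field] == winner[field])
    return winner[field], rule, round(agree / len(present), 2)


def build_golden_record(cluster: List[dict], fields: List[str],
                        gr_registry: Dict[str, str]) -> dict:
    # Persistent GR_ID: reuse an ID already held by any member, else mint one.
    existing = sorted({gr_registry[r["key"]] for r in cluster if r["key"] in gr_registry})
    gr_id = existing[0] if existing else str(uuid.uuid4())
    for r in cluster:
        gr_registry[r["key"]] = gr_id
    golden: dict = {"GR_ID": gr_id}
    rules, confidences = {}, []
    for f in fields:
        value, rule, conf = survive_field(cluster, f)
        golden[f] = value
        rules[f] = rule
        confidences.append(conf)
    golden["source_count"] = len({r["source"] for r in cluster})
    golden["last_contributing_source"] = max(cluster, key=lambda r: r["updated"])["source"]
    golden["survivorship_rule_applied"] = rules
    golden["confidence_score"] = round(sum(confidences) / len(confidences), 2) if confidences else 0.0
    return golden


cluster_rows = [
    {"key": "k1", "source": "CRM", "updated": date(2023, 1, 5),
     "email": "ana@example.com", "phone": None},
    {"key": "k2", "source": "Web", "updated": date(2024, 3, 1),
     "email": "ana.d@example.com", "phone": "+15550102345"},
]
gr_registry = {"k2": "GR-0001"}   # k2 already belonged to a Golden Record
golden = build_golden_record(cluster_rows, ["email", "phone"], gr_registry)
```

Here the email survives from CRM by source priority even though the web value is newer, while the phone survives from the web source because it is the only non-null value. When a merge brings together members holding different GR_IDs, this sketch keeps the lexicographically smallest one; a real implementation would also log the retired IDs in lineage.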

Identity graph and lineage

OOTB identity graph jobs build a linkage view of all contributing records:

Identity link tables: Map every source key, such as Email, CustomerID, and DeviceID, to its GR_ID, so any source identifier can be resolved to the corresponding Golden Record.
Lineage tables: Store before-and-after match relationships for auditability. This history shows how each identity has changed over time and lets any modification be traced back, which supports compliance and data integrity.
Incremental updates: Scheduled jobs refresh only the delta records, that is, the records that changed since the last run, instead of reprocessing the entire dataset. This saves time and compute while keeping the data current.
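The identity graph tables and the incremental pattern can be sketched as plain Python functions. The row layouts, key types, and function names are hypothetical and do not describe Redpoint's table schemas: `build_identity_links` emits one link row per source key, `record_lineage` logs before-and-after GR_ID assignments between two runs, and `select_delta` keeps only records changed since a stored watermark.

```python
from datetime import datetime
from typing import Dict, List

LINK_KEY_TYPES = ("email", "customer_id", "device_id")


def build_identity_links(records: List[dict], gr_registry: Dict[str, str]) -> List[dict]:
    """One identity-link row per non-empty source key, pointing at its GR_ID."""
    links = []
    for r in records:
        gr_id = gr_registry[r["key"]]
        for key_type in LINK_KEY_TYPES:
            if r.get(key_type):
                links.append({"key_type": key_type, "key_value": r[key_type],
                              "GR_ID": gr_id, "source": r["source"]})
    return links


def record_lineage(before: Dict[str, str], after: Dict[str, str],
                   run_ts: datetime) -> List[dict]:
    """Lineage rows for every surrogate key whose GR_ID changed between runs."""
    rows = []
    for key in sorted(set(before) | set(after)):
        old_id, new_id = before.get(key), after.get(key)
        if old_id != new_id:
            rows.append({"key": key, "old_GR_ID": old_id,
                         "new_GR_ID": new_id, "run_ts": run_ts})
    return rows


def select_delta(records: List[dict], watermark: datetime) -> List[dict]:
    """Incremental run: keep only records updated after the last successful run."""
    return [r for r in records if r["updated"] > watermark]


before = {"k1": "GR-1", "k2": "GR-2"}
after = {"k1": "GR-1", "k2": "GR-1", "k3": "GR-3"}   # k2 merged into GR-1
lineage = record_lineage(before, after, datetime(2024, 6, 1))
links = build_identity_links(
    [{"key": "k3", "source": "Web", "device_id": "dev-9", "email": "bo@example.com"}],
    after)
delta = select_delta(
    [{"key": "k1", "updated": datetime(2024, 5, 1)},
     {"key": "k3", "updated": datetime(2024, 6, 2)}],
    watermark=datetime(2024, 6, 1))
```

The watermark should be advanced only after a run completes successfully; otherwise a failed run could silently skip records on the next attempt.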

Job orchestration

OOTB jobs are deployed via the Redpoint CDP job orchestration framework, typically running in the following sequence:

Source_Load
Standardize_Data
Match_Cluster
Survivorship_CreateGR
Build_IdentityGraph
Export_GoldenRecord

The orchestration can be configured for incremental, full refresh, or test mode within RPDM.
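The sequence and run modes can be illustrated with a small Python driver. This is a sketch, not the Redpoint CDP orchestration framework: the `run_pipeline` function, the step-callable interface, the shared `ctx` dictionary, and the choice to skip `Export_GoldenRecord` in test mode are assumptions. Steps run strictly in order and a failure stops the run, because each job consumes the previous job's output.

```python
from typing import Callable, Dict, List

PIPELINE = [
    "Source_Load",
    "Standardize_Data",
    "Match_Cluster",
    "Survivorship_CreateGR",
    "Build_IdentityGraph",
    "Export_GoldenRecord",
]
MODES = ("incremental", "full", "test")


def run_pipeline(steps: Dict[str, Callable[[dict], None]],
                 mode: str = "incremental") -> List[str]:
    """Run the job sequence in order; return the names of completed steps."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    ctx = {"mode": mode}        # shared run context passed to every step
    completed = []
    for name in PIPELINE:
        if mode == "test" and name == "Export_GoldenRecord":
            continue            # assumption: test runs never publish output
        try:
            steps[name](ctx)
        except Exception as exc:
            # Stop here: later jobs depend on this job's output.
            raise RuntimeError(f"{name} failed; downstream steps not run") from exc
        completed.append(name)
    return completed


noop_steps = {name: (lambda ctx: None) for name in PIPELINE}
full_run = run_pipeline(noop_steps, mode="full")
test_run = run_pipeline(noop_steps, mode="test")
```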

Outputs

Key data assets generated by the OOTB profile unification jobs include:

Asset	Description	Example Table
Golden Record Table	Unified customer record with survivorship fields	`CDP_GoldenRecord`
Identity Graph	Links between source records and `GR_ID`	`CDP_IdentityGraph`
Lineage Table	Audit trail of merged source records	`CDP_Lineage`
Survivorship Rules	Rule table defining precedence	`CDP_Survivorship_Rules`
Error Tables	Records rejected during standardization	`CDP_Error_Records`

Extension and customization

While OOTB jobs provide a complete foundation, they can be extended as needed:

Add custom match rules for additional identifiers, such as social media handles and loyalty program IDs, to increase matching precision.
Enhance the survivorship logic to reflect brand-specific data preferences, so the values each brand trusts most are the ones retained.
Integrate external cleansing and validation APIs, such as Melissa and Experian, to raise data accuracy and reliability.
Add custom lineage fields to support GDPR and CCPA compliance by tracking and documenting how personal data is used.
Create incremental delta jobs driven by timestamp columns, so each run processes only the records that changed since the previous run.
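One way to make match rules extensible is to keep them as data: a list of (rule name, field, normalizer) entries that the matcher iterates over, so adding an identifier means adding an entry rather than changing code. The sketch below is hypothetical and does not reflect how Redpoint stores match rules; in particular, treating social handles as case-insensitive and ignoring punctuation in loyalty IDs are assumptions about your data.

```python
import re
from typing import Callable, List, Optional, Tuple

# (rule name, field, normalizer); all entries are hypothetical examples.
MatchRule = Tuple[str, str, Callable[[str], str]]


def norm_handle(value: str) -> str:
    # Assumption: handles compare case-insensitively, with or without "@".
    return value.strip().lstrip("@").lower()


def norm_loyalty(value: str) -> str:
    # Assumption: punctuation and case in loyalty IDs carry no meaning.
    return re.sub(r"[^A-Za-z0-9]", "", value).upper()


CUSTOM_RULES: List[MatchRule] = [
    ("social_handle", "social_handle", norm_handle),
    ("loyalty_id", "loyalty_id", norm_loyalty),
]


def custom_match(a: dict, b: dict, rules: List[MatchRule] = CUSTOM_RULES) -> Optional[str]:
    """Return the name of the first rule whose normalized values agree, else None."""
    for name, field, normalize in rules:
        va, vb = a.get(field), b.get(field)
        if va and vb and normalize(va) == normalize(vb):
            return name
    return None


handle_hit = custom_match({"social_handle": "@AnaD"}, {"social_handle": "anad"})
loyalty_hit = custom_match({"loyalty_id": "ab-123"}, {"loyalty_id": "AB123"})
no_hit = custom_match({"loyalty_id": "ab-123"}, {"loyalty_id": "AB124"})
```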

Downstream activation

The unified Golden Record can then be:

Published to Redpoint Interaction for campaign segmentation and personalization, so that messages reach the right audience with the right content.
Synced to data warehouses such as Snowflake, Databricks, and BigQuery for analysis and reporting.
Exposed to other downstream systems through APIs, flat files, or message queues, so each system can consume the unified data in the form it supports.
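For the flat-file and message-queue routes, the export step amounts to serializing Golden Record rows. The sketch below uses only Python's `csv` and `json` modules; the column list, the `golden_record_upsert` event name, and the payload envelope are hypothetical, and the actual queue client is left out.

```python
import csv
import io
import json
from typing import Iterable, List


def export_flat_file(golden_records: Iterable[dict], columns: List[str]) -> str:
    """Render Golden Records as CSV text; fields not in `columns` are dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(golden_records)
    return buf.getvalue()


def to_queue_message(golden_record: dict) -> str:
    """Wrap one Golden Record in a JSON envelope for a message queue."""
    # default=str turns dates and other non-JSON types into strings.
    return json.dumps({"event": "golden_record_upsert", "payload": golden_record},
                      default=str, sort_keys=True)


rows = [{"GR_ID": "GR-1", "email": "ana@example.com", "confidence_score": 0.75}]
csv_text = export_flat_file(rows, ["GR_ID", "email"])
message = to_queue_message(rows[0])
```

Passing an explicit column list keeps internal metadata, such as confidence scores, out of files that downstream consumers receive unless it is requested.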

Governance and monitoring

OOTB CDP jobs include built-in monitoring capabilities:

Job status logs (success/failure)
Record count comparisons (before/after match)
Exception reporting for data quality failures
Historical tracking of GR_ID evolution
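The record-count comparison and GR_ID history checks can be expressed as two small Python functions. They are a hypothetical sketch, not Redpoint's built-in monitoring: the summary fields, the expected deduplication band used to raise an alert, and the registry format (surrogate key to GR_ID) are assumptions.

```python
from typing import Dict, List, Tuple


def match_summary(input_count: int, clusters: List[list],
                  expected_rate: Tuple[float, float] = (0.0, 0.5)) -> dict:
    """Compare record counts before and after matching; flag unusual merge rates."""
    golden_count = len(clusters)
    merged = input_count - golden_count
    rate = merged / input_count if input_count else 0.0
    low, high = expected_rate   # hypothetical acceptable band
    return {"input_records": input_count, "golden_records": golden_count,
            "records_merged": merged, "dedup_rate": round(rate, 3),
            "alert": not (low <= rate <= high)}


def gr_id_evolution(before: Dict[str, str], after: Dict[str, str]) -> dict:
    """GR_IDs that disappeared (merged away) or appeared between two runs."""
    before_ids, after_ids = set(before.values()), set(after.values())
    return {"retired": sorted(before_ids - after_ids),
            "new": sorted(after_ids - before_ids)}


normal_run = match_summary(4, [[0, 1, 2], [3]])
suspicious_run = match_summary(10, [[0], [1]])   # 8 of 10 records merged
evolution = gr_id_evolution({"k1": "GR-1", "k2": "GR-2"},
                            {"k1": "GR-1", "k2": "GR-1", "k3": "GR-3"})
```

A sudden jump in the merge rate often points to an over-broad match rule (for example, a shared placeholder email), which is why the sketch alerts on the rate rather than on raw counts.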

Summary

Redpoint’s out-of-the-box profile unification implementation provides a configurable, transparent, enterprise-grade framework for creating the Golden Record without custom development. It forms the foundation of Redpoint’s data orchestration philosophy: disparate data sources are unified into trusted, actionable customer profiles that support personalization and analytics, shorten time-to-value, and keep customer data accurate and consistent across the enterprise.