
Data Modeling for Customer Data Platforms

Data modeling defines the schema and relationships that structure customer data inside a CDP — from identity graphs to behavioral events — enabling unified profiles and real-time activation.

CDP.com Staff

Data modeling is the practice of defining how customer data is structured, related, and stored so that a CDP can unify profiles, resolve identities, and activate audiences across channels.

Without a deliberate data model, customer data becomes a disorganized pile of tables and events that neither people nor AI systems can reliably act on. A well-designed data model turns raw ingestion streams into a coherent graph of customers, their attributes, their behaviors, and their relationships to products, accounts, and channels.

Why Data Modeling Matters for CDPs

CDPs ingest data from dozens of sources: websites, mobile apps, CRMs, point-of-sale systems, call centers, and third-party enrichment providers. Each source has its own schema, naming conventions, and granularity. Data modeling is the layer that reconciles these differences.

A strong data model enables:

  • Identity resolution — Linking anonymous cookies, email addresses, phone numbers, and device IDs to a single customer profile requires a well-defined identity graph schema.
  • Consistent segmentation — Marketers can build audiences using standardized attributes (e.g., lifetime_value, last_purchase_date) rather than hunting through raw tables.
  • Real-time activation — Streaming data pipelines need pre-defined schemas to process events as they arrive, not after batch transformation.
  • AI and machine learning — Predictive models require structured feature sets. A messy data model produces unreliable predictions.

Core Components of a CDP Data Model

1. Customer Entity (Profile)

The central entity. Every CDP revolves around a unified customer profile that consolidates attributes from all sources.

Attribute Type | Examples
Identifiers | email, phone, cookie ID, device ID, loyalty ID
Demographics | name, age, gender, location, language
Computed metrics | lifetime value, churn score, engagement score
Consent & preferences | opt-in status, channel preferences, privacy flags
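
As a rough illustration, the four attribute groups above could be expressed as a single typed record. This is a minimal sketch, not any vendor's schema: the class name `CustomerProfile` and every field name are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class CustomerProfile:
    """Hypothetical unified profile; all field names are illustrative."""

    # Identifiers
    profile_id: str
    emails: List[str] = field(default_factory=list)
    phones: List[str] = field(default_factory=list)
    device_ids: List[str] = field(default_factory=list)
    loyalty_id: Optional[str] = None
    # Demographics
    name: Optional[str] = None
    language: Optional[str] = None
    # Computed metrics
    lifetime_value: float = 0.0
    churn_score: Optional[float] = None
    last_purchase_date: Optional[date] = None
    # Consent and preferences
    email_opt_in: bool = False
    preferred_channels: List[str] = field(default_factory=list)


profile = CustomerProfile(
    profile_id="p-001",
    emails=["ana@example.com"],
    lifetime_value=420.0,
)
```

Identifier fields are lists because one person usually accumulates several emails and devices over time. Consent defaults to `False` so that a missing opt-in is never read as permission.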

2. Event (Behavioral) Data

Time-stamped actions that describe what a customer did. Events are the foundation for behavioral segmentation and journey orchestration.

Common event categories:

  • Digital engagement — page views, clicks, searches, video plays
  • Transactions — purchases, returns, subscriptions, renewals
  • Communication — email opens, push notification taps, SMS replies
  • Service — support tickets, chat sessions, NPS responses
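
One way to picture an event record is as an immutable, time-stamped object tied to a profile. The sketch below assumes a hypothetical `Event` class and category labels derived from the list above; real CDPs define their own event envelopes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

# Category labels mirror the list above; the exact strings are an assumption.
EVENT_CATEGORIES = {"digital_engagement", "transaction", "communication", "service"}


@dataclass(frozen=True)
class Event:
    """Hypothetical behavioral event attached to one profile."""

    profile_id: str
    name: str
    category: str
    occurred_at: datetime
    properties: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Reject events the model does not know how to classify.
        if self.category not in EVENT_CATEGORIES:
            raise ValueError(f"unknown event category: {self.category!r}")
        # Naive timestamps are ambiguous across sources, so require a timezone.
        if self.occurred_at.tzinfo is None:
            raise ValueError("occurred_at must be timezone-aware")


event = Event(
    profile_id="p-001",
    name="order_completed",
    category="transaction",
    occurred_at=datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc),
    properties={"order_total": 59.90, "currency": "USD"},
)
```

Validating the category and timestamp at construction time keeps malformed events out of downstream segmentation logic.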

3. Relationships and Hierarchies

Real-world customer data is rarely flat. A CDP data model must handle:

  • Account-to-contact — B2B scenarios where multiple people belong to one company
  • Household — B2C scenarios where family members share an address or loyalty account
  • Product catalog — Linking purchase events to product attributes (category, SKU, margin)
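
These relationships can be sketched with foreign-key style references. The example below is illustrative only: `Contact`, `Product`, `group_by`, and `enrich_purchase` are hypothetical names, and events are represented as plain dictionaries for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    contact_id: str
    account_id: Optional[str] = None    # B2B: the company this person belongs to
    household_id: Optional[str] = None  # B2C: shared address or loyalty account


@dataclass
class Product:
    sku: str
    category: str
    margin: float


def group_by(contacts, key):
    """Group contact IDs by a relationship key such as 'account_id'."""
    groups = defaultdict(list)
    for contact in contacts:
        value = getattr(contact, key)
        if value is not None:
            groups[value].append(contact.contact_id)
    return dict(groups)


def enrich_purchase(event, catalog):
    """Return a copy of a purchase event with product attributes attached."""
    product = catalog[event["sku"]]
    return {**event, "category": product.category, "margin": product.margin}


contacts = [
    Contact("c1", account_id="acme"),
    Contact("c2", account_id="acme"),
    Contact("c3", household_id="h9"),
]
catalog = {"SKU-1": Product("SKU-1", "shoes", 0.35)}

by_account = group_by(contacts, "account_id")
enriched = enrich_purchase({"name": "order_completed", "sku": "SKU-1"}, catalog)
```

Here `by_account` maps `"acme"` to both of its contacts, and `enriched` carries the product's category and margin alongside the original event fields.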

4. Identity Graph

The identity graph is a specialized data structure that maps all known identifiers for a single person. It supports both deterministic matching (same email across sources) and probabilistic matching (behavioral signals suggesting two profiles are the same person). See identity resolution for a deeper explanation.
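
The deterministic half of this idea can be sketched with a union-find (disjoint-set) structure: every time two identifiers are observed together, their groups are merged. This is a simplified illustration, not how any particular CDP implements its graph; the `IdentityGraph` class is hypothetical, and probabilistic matching is left out entirely.

```python
class IdentityGraph:
    """Minimal union-find over identifiers such as ("email", "ana@example.com")."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        # Register unseen identifiers as their own single-member group.
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving: point the node at its grandparent as we walk up.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, a, b):
        """Record a deterministic match, e.g. a login ties a cookie to an email."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a, b):
        return self._find(a) == self._find(b)

    def identifiers_for(self, node):
        """Return every identifier currently grouped with the given one."""
        root = self._find(node)
        return {n for n in list(self._parent) if self._find(n) == root}


graph = IdentityGraph()
cookie = ("cookie", "c-123")
email = ("email", "ana@example.com")
phone = ("phone", "+15550100")

graph.link(cookie, email)  # the cookie was seen with a logged-in email
graph.link(email, phone)   # a CRM record carries the same email and a phone
```

After these two links, the cookie and the phone number resolve to the same person even though they were never observed together directly.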

Schema Approaches: Fixed vs. Flexible

CDPs take different approaches to schema design:

Approach | How It Works | Trade-off
Fixed schema | The CDP defines a standard set of tables and fields. Data must conform on ingestion. | Faster queries, stronger governance, but less flexibility for unusual data.
Flexible schema | The CDP accepts arbitrary key-value attributes and nested objects. Schema evolves at ingestion time. | Maximum flexibility, but harder to enforce consistency and optimize performance.
Hybrid schema | Core entities (profiles, events) have fixed schemas. Custom attributes extend them without altering the base model. | Balances governance with adaptability; the approach many modern CDPs use.

Hybrid CDPs typically enforce a canonical profile schema while allowing teams to attach custom attributes and event types without a migration.
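
A hybrid schema can be pictured as a strictly validated core plus an open-ended extension area. The following is a minimal sketch under assumed names: `CORE_SCHEMA`, its three fields, and `validate_profile` are all hypothetical.

```python
# Hypothetical canonical fields and their required Python types.
CORE_SCHEMA = {"profile_id": str, "email": str, "lifetime_value": float}


def validate_profile(record):
    """Enforce the fixed core schema and collect everything else as custom."""
    core = {}
    for field_name, expected_type in CORE_SCHEMA.items():
        if field_name not in record:
            raise KeyError(f"missing core field: {field_name}")
        value = record[field_name]
        if not isinstance(value, expected_type):
            raise TypeError(f"{field_name} must be {expected_type.__name__}")
        core[field_name] = value
    # Unknown keys are accepted without a migration, but kept separate.
    custom = {k: v for k, v in record.items() if k not in CORE_SCHEMA}
    return {**core, "custom": custom}


validated = validate_profile(
    {
        "profile_id": "p-001",
        "email": "ana@example.com",
        "lifetime_value": 420.0,
        "favorite_color": "green",  # custom attribute, not in the core schema
    }
)
```

Keeping custom attributes under their own key means new fields never collide with, or silently redefine, a core field.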

Data Modeling in the AI Era

AI is changing how CDPs use data models in two important ways:

  1. Feature stores replace manual modeling — Instead of analysts hand-crafting every computed attribute, AI systems automatically generate features (e.g., purchase frequency trends, channel affinity scores) from raw event data. The data model must support this by providing clean, well-typed event streams.

  2. AI agents need structured access — As CDPs evolve from human-queried tools to foundations for AI agents, the data model becomes an API contract. An agent deciding whether to send a discount offer needs to read churn_score, last_purchase_date, and lifetime_value from a predictable schema — not parse unstructured JSON blobs.

A well-modeled CDP becomes the structured memory layer that AI agents rely on for real-time customer decisioning.
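
To make the "API contract" idea concrete, here is a small sketch of an agent-side decision that reads only typed, predictable fields. The `DecisionInput` type, the `should_send_discount` function, and all threshold values are assumptions made up for illustration; a real agent would use whatever policy the business defines.

```python
from datetime import date
from typing import NamedTuple


class DecisionInput(NamedTuple):
    """The fields an agent expects from the profile schema."""

    churn_score: float
    last_purchase_date: date
    lifetime_value: float


def should_send_discount(
    profile,
    today,
    churn_threshold=0.7,   # arbitrary example threshold
    min_lifetime_value=100.0,  # arbitrary example threshold
    min_inactive_days=30,  # arbitrary example threshold
):
    """Offer a discount only to valuable, at-risk, recently inactive customers."""
    days_inactive = (today - profile.last_purchase_date).days
    return (
        profile.churn_score >= churn_threshold
        and profile.lifetime_value >= min_lifetime_value
        and days_inactive >= min_inactive_days
    )


at_risk = DecisionInput(churn_score=0.8, last_purchase_date=date(2024, 1, 1), lifetime_value=250.0)
loyal = DecisionInput(churn_score=0.2, last_purchase_date=date(2024, 2, 20), lifetime_value=250.0)

send_to_at_risk = should_send_discount(at_risk, today=date(2024, 3, 1))
send_to_loyal = should_send_discount(loyal, today=date(2024, 3, 1))
```

Because the function depends on three named, typed fields, a schema change that renames or retypes any of them fails loudly instead of producing a silently wrong decision.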

Data Modeling vs. Data Warehousing

Data modeling in a CDP differs from traditional data warehouse modeling (star schemas, snowflake schemas) in several ways:

Dimension | Data Warehouse | CDP
Primary entity | Business transactions (facts) | Customer profiles
Optimization goal | Analytical query performance | Real-time profile lookup and segmentation
Schema evolution | Slow, managed by data engineering | Continuous, driven by new data sources
Identity | Assumed (single key per table) | Must be resolved across sources
Latency | Batch (hourly/daily) | Streaming and batch hybrid

Composable CDPs that run entirely on the data warehouse inherit its modeling paradigm — optimized for analytics but not always for the real-time profile access that activation demands. Hybrid CDPs maintain their own optimized customer data store alongside warehouse connectivity.

Best Practices

  1. Start with the customer entity — Define your canonical profile schema first. Everything else references it.
  2. Standardize event naming — Use a consistent taxonomy (e.g., product_viewed, order_completed) across all sources. Inconsistent event names are among the most common data modeling mistakes.
  3. Design for identity resolution — Include all known identifiers in your schema from day one. Retrofitting identity fields is painful.
  4. Separate raw from modeled — Keep raw ingestion data intact. Apply transformations into modeled tables so you can always reprocess.
  5. Version your schema — Track schema changes so downstream consumers (segments, models, activations) can adapt.
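
Several of these practices can be combined in one small transformation step. The sketch below normalizes event names to a snake_case taxonomy, keeps the raw record untouched, and stamps a schema version on the modeled output. The alias table, the `SCHEMA_VERSION` value, and the function names are hypothetical.

```python
import re

SCHEMA_VERSION = "1.0.0"  # hypothetical version label for the modeled schema

# Hypothetical aliases mapping source-specific names to the canonical taxonomy.
EVENT_ALIASES = {
    "viewed_product": "product_viewed",
    "purchase": "order_completed",
    "checkout_complete": "order_completed",
}


def to_snake_case(name):
    """Convert names like 'productViewed' or 'Product Viewed' to snake_case."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


def model_event(raw):
    """Build a modeled event while leaving the raw record unmodified."""
    canonical = to_snake_case(raw["event"])
    canonical = EVENT_ALIASES.get(canonical, canonical)
    return {"name": canonical, "schema_version": SCHEMA_VERSION, "raw": raw}


raw_record = {"event": "productViewed", "sku": "SKU-1"}
modeled = model_event(raw_record)
```

Storing the raw record next to the modeled fields means the alias table can be corrected later and every event reprocessed from its original form.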

FAQ

What is the difference between data modeling and data mapping?

Data modeling defines the target schema — what entities exist, what attributes they have, and how they relate to each other. Data mapping is the process of connecting source fields to that target schema. You model first, then map. In a CDP context, modeling determines that a unified profile has fields like email and lifetime_value, while mapping determines that Salesforce’s Contact.Email and Shopify’s customer.email both feed into that email field.
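
The split between modeling and mapping can be shown with a small lookup table. In this sketch the target field `email` is the model, and `FIELD_MAPPINGS` is the mapping. Treating `Contact.Email` and `customer.email` as dotted paths into nested dictionaries is an assumption for illustration and does not reflect the actual Salesforce or Shopify payload formats.

```python
# Mapping: source field path -> target field in the unified profile model.
FIELD_MAPPINGS = {
    "salesforce": {"Contact.Email": "email"},
    "shopify": {"customer.email": "email"},
}


def get_path(record, dotted_path):
    """Follow a dotted path such as 'customer.email' through nested dicts."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


def map_record(source, record):
    """Translate one source record into the target schema's field names."""
    mapping = FIELD_MAPPINGS[source]
    return {target: get_path(record, path) for path, target in mapping.items()}


from_salesforce = map_record("salesforce", {"Contact": {"Email": "ana@example.com"}})
from_shopify = map_record("shopify", {"customer": {"email": "ana@example.com"}})
```

Both calls yield the same `{"email": ...}` shape, which is what lets downstream segments ignore where the value originally came from.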

Do I need a data engineer to set up data modeling in a CDP?

It depends on the CDP. Composable CDPs built on data warehouses typically require data engineering skills to define and maintain dbt models, SQL transformations, and identity resolution logic. Hybrid CDPs with built-in data modeling tools let marketing technologists configure schemas through visual interfaces, though complex custom models still benefit from engineering support.

How does data modeling affect segmentation accuracy?

Directly. If your data model tracks purchase events inconsistently (for example, some sources record order_total in cents and others in dollars), segments based on purchase value will be wrong. A well-modeled CDP normalizes these differences at ingestion so that every segment query operates on clean, consistent data. Poor data modeling is a common root cause of segmentation errors.
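
A unit mismatch like this is typically handled by a normalization step at ingestion. The sketch below assumes two hypothetical sources, `pos` and `webstore`, and a hypothetical per-source unit registry; it converts every `order_total` to dollars using `Decimal` to avoid floating-point rounding.

```python
from decimal import Decimal

# Hypothetical registry of the unit each source uses for order_total.
ORDER_TOTAL_UNIT = {"pos": "cents", "webstore": "dollars"}


def normalize_order_total(source, order_total):
    """Return order_total in dollars regardless of the source's unit."""
    unit = ORDER_TOTAL_UNIT[source]
    amount = Decimal(str(order_total))
    if unit == "cents":
        return amount / Decimal(100)
    return amount


pos_total = normalize_order_total("pos", 1999)            # 1999 cents
web_total = normalize_order_total("webstore", "19.99")    # already dollars
```

Without this step, a segment such as "orders over 50" would match nearly every point-of-sale order, because 1999 cents would be compared as if it were 1999 dollars.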

Related Terms

  • Identity Resolution — Relies on well-defined identity graph schemas within the data model
  • Data Integration — Connects source systems whose data the model must reconcile
  • Data Activation — Consumes modeled profiles and segments for downstream delivery
  • Composable CDP — Requires manual data modeling via SQL and dbt transformations
  • Data Pipeline — Moves raw data into the modeled schema for processing
  • Data Governance — Enforces consistency and quality standards across data models

The CDP.com staff has collaborated to deliver the latest information and insights on the customer data platform industry.