Glossary

Identity Graph

An identity graph maps all customer identifiers — emails, device IDs, cookies — to a single unified profile. Learn how CDPs build and maintain identity graphs.

CDP.com Staff

An identity graph is a data structure that links all known identifiers for a single person (email addresses, phone numbers, device IDs, cookie IDs, loyalty numbers, and CRM records) into one unified profile. It serves as the map that connects anonymous browsing behavior to known customer identities, enabling organizations to recognize the same individual across channels, devices, and touchpoints. The identity graph is the core data structure behind identity resolution. In customer data platforms, it drives customer data unification and forms the basis for a true customer 360 view.

How Identity Graphs Work

An identity graph continuously ingests identifiers from multiple sources and evaluates whether they belong to an existing person or represent a new individual.

Identifier collection is the starting point. Every customer interaction generates identifiers: a website visit produces a cookie ID and potentially a device fingerprint, a purchase generates a transaction ID linked to an email address, a mobile app session creates a device ID, and a support call logs a phone number. The identity graph collects these identifiers from every available source and stores them as nodes in a graph structure.

Link creation connects identifiers that belong to the same person. When a customer logs into a website from their phone, the identity graph can link the mobile device ID to the email address used for authentication. When that same customer later makes a purchase using the same email, the transaction data is linked to the existing profile. Over time, the graph accumulates a rich web of connections between identifiers, building a comprehensive view of each individual.
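As a rough illustration of identifier collection and link creation, the sketch below stores identifiers as nodes and records a link whenever two identifiers appear in the same event. The `IdentityGraph` class, its method names, and the sample values are assumptions made for this example, not the API of any particular CDP.

```python
from collections import defaultdict

# An identifier is modeled as a (kind, value) pair, e.g. ("email", "user@example.com").
Identifier = tuple[str, str]


class IdentityGraph:
    """Identifiers are nodes; an edge records that two identifiers were seen together."""

    def __init__(self) -> None:
        self.edges: dict[Identifier, set[Identifier]] = defaultdict(set)

    def add_identifier(self, identifier: Identifier) -> None:
        # Touching the defaultdict creates the node with no edges yet.
        self.edges[identifier]

    def link(self, a: Identifier, b: Identifier) -> None:
        # Edges are undirected, so the link is stored in both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)


graph = IdentityGraph()
graph.add_identifier(("cookie", "ABC"))  # anonymous web visit, no links yet
# Login on a phone: the device ID and the email appear in the same event.
graph.link(("device", "XYZ"), ("email", "user@example.com"))
# A later purchase with the same email attaches the transaction to that node.
graph.link(("transaction", "T-1001"), ("email", "user@example.com"))

neighbors = graph.edges[("email", "user@example.com")]
```

Here the cookie stays unlinked until some later event ties it to a known identifier, which mirrors how anonymous identifiers wait in the graph for an authenticated interaction.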

Profile unification merges the data associated with linked identifiers into a single customer profile. Once the graph determines that cookie ABC, email user@example.com, and device ID XYZ all represent the same person, all behavioral data — browsing history, purchase records, email engagement, app usage — is consolidated into one unified profile.
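One common way to group linked identifiers is a union-find (disjoint-set) structure, sketched below. This is an illustrative technique rather than a description of how any specific CDP unifies profiles, and the links and events are made-up sample data.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Unknown identifiers start as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


# Hypothetical links already established by the graph.
links = [("cookie:ABC", "email:user@example.com"),
         ("device:XYZ", "email:user@example.com")]

# Hypothetical behavioral events, each keyed by the identifier that produced it.
events = [("cookie:ABC", "viewed /pricing"),
          ("device:XYZ", "opened app"),
          ("email:user@example.com", "purchased order 1001"),
          ("cookie:QRS", "viewed /blog")]

uf = UnionFind()
for a, b in links:
    uf.union(a, b)

# Consolidate events under the root identifier of each connected group.
profiles = defaultdict(list)
for identifier, event in events:
    profiles[uf.find(identifier)].append(event)
```

Union-find makes merging cheap but does not support splitting a group, so a system that needs to undo merges would also have to keep the underlying links and rebuild groups from them.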

Graph maintenance keeps the identity graph accurate over time. This includes handling identifier changes (new email addresses, new devices), resolving conflicts (two profiles that were incorrectly merged), managing identifier expiration (cookies that expire, devices that are sold), and incorporating new data sources as they become available.
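Identifier expiration, one of the maintenance tasks above, can be sketched as a time-to-live check on each identifier's last sighting. The TTL values and the `prune_expired` helper are assumptions chosen for illustration; real retention windows depend on the business and on applicable regulation.

```python
from datetime import datetime, timedelta, timezone

# Assumed time-to-live per identifier kind; None means the identifier never expires.
# Kinds missing from this table are also treated as non-expiring.
TTL = {"cookie": timedelta(days=30), "device": timedelta(days=365), "email": None}


def prune_expired(last_seen, now):
    """Return only the identifiers whose last sighting is within their TTL."""
    kept = {}
    for (kind, value), seen_at in last_seen.items():
        ttl = TTL.get(kind)
        if ttl is None or now - seen_at <= ttl:
            kept[(kind, value)] = seen_at
    return kept


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
last_seen = {
    ("cookie", "ABC"): now - timedelta(days=45),   # older than the cookie TTL
    ("device", "XYZ"): now - timedelta(days=10),
    ("email", "user@example.com"): now - timedelta(days=400),
}
active = prune_expired(last_seen, now)
```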

Deterministic vs Probabilistic Matching

Identity graphs use two fundamental approaches to determine whether different identifiers belong to the same person.

Deterministic matching links identifiers based on exact, known connections. When a customer logs in with email user@example.com on both their laptop and phone, the graph can link both device IDs to that email address with high confidence. Deterministic matches are based on authenticated events: logins, form submissions, and purchases with verified contact information. Accuracy is very high, though not perfect, because shared logins and shared devices can still produce false links. Deterministic matching alone also cannot connect anonymous identifiers to known profiles until an authentication event occurs.
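A minimal sketch of deterministic matching follows: authenticated login events are grouped by an exact match on a normalized email address. The sample events and the `normalize_email` helper are hypothetical, and lowercasing the whole address is a common simplification rather than a universal rule.

```python
from collections import defaultdict


def normalize_email(raw: str) -> str:
    # Trim and lowercase so "User@Example.com " and "user@example.com" match exactly.
    return raw.strip().lower()


# Hypothetical authenticated login events: (device_id, email as typed).
logins = [
    ("laptop-1", "User@Example.com"),
    ("phone-7", " user@example.com"),
    ("tablet-3", "other@example.com"),
]

# Devices that logged in with the same normalized email are linked to it.
devices_by_email = defaultdict(set)
for device_id, email in logins:
    devices_by_email[normalize_email(email)].add(device_id)
```

Normalization matters because an exact-match rule only works if equivalent values are stored in one canonical form.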

Probabilistic matching uses statistical models and machine learning to infer connections between identifiers when no definitive link exists. These models analyze signals like IP addresses, browser configurations, location patterns, behavioral similarities, and timing correlations to estimate the probability that two identifiers belong to the same person. For example, if a laptop and a phone consistently appear on the same IP address at the same times of day and show similar browsing patterns, a probabilistic model might assign a high confidence score to the link, even without a shared login event.
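To make the idea of a confidence score concrete, the sketch below combines a few boolean signals with a logistic function. The signal names, weights, and bias are invented for illustration; a production model would learn its parameters from labeled identifier pairs and would likely use richer features.

```python
import math

# Illustrative weights, not trained values.
WEIGHTS = {
    "same_ip": 2.0,
    "same_city": 0.8,
    "overlapping_hours": 1.2,
    "same_browser_family": 0.5,
}
BIAS = -3.0  # assumed prior: two arbitrary identifiers are unlikely to match


def match_probability(signals: dict[str, bool]) -> float:
    """Logistic score: sigmoid of the bias plus the weights of the signals that fired."""
    z = BIAS + sum(w for name, w in WEIGHTS.items() if signals.get(name))
    return 1.0 / (1.0 + math.exp(-z))


strong = match_probability({"same_ip": True, "same_city": True, "overlapping_hours": True})
weak = match_probability({"same_city": True})
```

With these assumed weights, several agreeing signals push the score well above what a single weak signal produces, which is the behavior the paragraph above describes.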

Most production identity graphs combine both approaches. Deterministic matches form the high-confidence backbone of the graph, while probabilistic matching extends reach by connecting identifiers that lack direct authenticated links. The key is maintaining clear confidence scores so downstream systems can make appropriate decisions based on match quality.
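One way to keep confidence visible to downstream systems is to store the match method and score on every link and let each use case choose its own threshold. The `Link` type, the scores, and the thresholds below are assumptions for this sketch, including the choice to record authenticated links with a score of 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    a: str
    b: str
    method: str        # "deterministic" or "probabilistic"
    confidence: float  # assumed 1.0 for authenticated links, model score otherwise


links = [
    Link("email:user@example.com", "device:XYZ", "deterministic", 1.0),
    Link("device:XYZ", "cookie:ABC", "probabilistic", 0.82),
    Link("cookie:ABC", "cookie:DEF", "probabilistic", 0.55),
]


def usable_links(links, min_confidence):
    """Keep only the links that meet the caller's confidence threshold."""
    return [link for link in links if link.confidence >= min_confidence]


# Assumed thresholds: account emails tolerate little error, ad targeting tolerates more.
for_email = usable_links(links, min_confidence=0.95)
for_ads = usable_links(links, min_confidence=0.50)
```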

Identity Graphs and First-Party Data

Browser restrictions on third-party cookies and tightening privacy regulations have elevated the importance of first-party data in building identity graphs. Organizations that can encourage authenticated interactions, such as account creation, loyalty program enrollment, and app downloads, generate the deterministic signals that produce the highest-quality identity graphs. Collecting zero-party data through preference centers and surveys further strengthens identity graphs with voluntarily shared customer attributes.

First-party identity graphs built on consented, authenticated data are more durable, more accurate, and more compliant than graphs relying on third-party data or cross-site tracking. They also create a competitive advantage: organizations with richer first-party identity graphs can deliver more personalized experiences while respecting consumer privacy expectations.

Protecting personally identifiable information (PII) within identity graphs is critical. Every identifier linkage potentially creates a more complete profile of an individual, making identity graphs high-value targets for data breaches and subject to strict regulatory requirements under GDPR, CCPA, and other privacy frameworks.
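One common safeguard is to store a keyed hash of each identifier instead of the raw value. The sketch below uses HMAC-SHA256 from the Python standard library; the key, the `pseudonymize` helper, and the normalization step are assumptions for illustration. Pseudonymized identifiers generally still count as personal data under GDPR, so this reduces exposure but does not remove regulatory obligations.

```python
import hashlib
import hmac

# Placeholder key for illustration; a real deployment would load it from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(kind: str, value: str) -> str:
    """Return a keyed SHA-256 digest so the graph does not store the raw identifier."""
    # Lowercasing every kind of identifier is a simplification made for this sketch.
    message = f"{kind}:{value.strip().lower()}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


node_id = pseudonymize("email", "User@Example.com")
```

Because the same input always yields the same digest under a given key, matching still works on the hashed values, while anyone without the key cannot recompute digests from guessed email addresses.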

How CDPs Build Identity Graphs

Customer data platforms use identity graphs as a core architectural component, and the sophistication of a CDP’s identity graph directly determines the quality of its customer profiles.

Real-time graph updates allow CDPs to incorporate new identifiers and connections as events occur. When a customer clicks an email link and lands on the website, the CDP can immediately link the email engagement to the subsequent browsing session — updating the profile in real time rather than waiting for a batch processing cycle.

Cross-channel stitching connects identifiers across marketing channels, customer service systems, point-of-sale terminals, mobile apps, and web properties. A CDP’s identity graph might link a customer’s in-store loyalty card, online account, mobile app profile, and email address into a single view — enabling consistent personalization regardless of which channel the customer uses.

Conflict resolution handles situations where the graph suggests two distinct profiles should be merged, or where a previously merged profile should be split. For example, if two family members share a device, the graph must distinguish between them rather than incorrectly merging their profiles. Advanced CDPs use machine learning to evaluate merge confidence and flag ambiguous cases for review.
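The shared-device case can be illustrated with a simple guard: a device seen with more than one authenticated account is treated as shared, and links through it are not used to merge profiles. The threshold and sample logins are assumptions, and a rule like this is only a stand-in for the machine-learning evaluation mentioned above.

```python
from collections import defaultdict

MAX_ACCOUNTS_PER_DEVICE = 1  # assumed rule: more than one account marks a device as shared

# Hypothetical authenticated logins: (device_id, email).
logins = [
    ("tablet-1", "parent@example.com"),
    ("tablet-1", "teen@example.com"),
    ("phone-9", "parent@example.com"),
]

emails_by_device = defaultdict(set)
for device_id, email in logins:
    emails_by_device[device_id].add(email)

shared_devices = {
    device_id
    for device_id, emails in emails_by_device.items()
    if len(emails) > MAX_ACCOUNTS_PER_DEVICE
}

# Only links through non-shared devices are allowed to merge profiles.
mergeable_links = [(d, e) for d, e in logins if d not in shared_devices]
```

In this sample, the family tablet is excluded from merging, so the two family members keep separate profiles while the phone remains linked to its single owner.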

Scalability is essential for enterprise identity graphs that may contain billions of identifier nodes and trillions of edges. CDPs must maintain sub-second query performance against these massive graph structures to support real-time personalization and activation use cases.

Identity Graphs in the AI Era

As AI-driven personalization and agentic marketing become central to customer engagement, identity graphs take on new importance. AI agents need accurate, real-time identity graphs to understand who they are interacting with, what that person has done across all channels, and how to personalize the next interaction.

An incomplete or inaccurate identity graph means the AI agent sees a fragmented view of the customer — leading to irrelevant recommendations, redundant outreach, or missed opportunities. The quality of the identity graph directly constrains the quality of AI-driven customer experiences.

For CDPs that maintain identity graphs within the same platform as their activation and AI decisioning layers, the graph can be queried and updated in real time as AI agents interact with customers — creating the closed feedback loops that enable continuous learning and optimization.

FAQ

What is the difference between an identity graph and identity resolution?

An identity graph is the data structure — the map of identifiers and their connections that represents each customer’s unified profile. Identity resolution is the process of building and maintaining that graph. Identity resolution encompasses the matching algorithms (deterministic and probabilistic), the rules for merging and splitting profiles, and the ongoing maintenance of identifier linkages. Think of the identity graph as the product and identity resolution as the process that creates and updates it.

What is the difference between deterministic and probabilistic identity matching?

Deterministic matching links identifiers based on exact, verified connections, such as the same email address used to log in on two different devices. It offers near-100% accuracy but limited reach, since it requires an authenticated event to create a link. Probabilistic matching uses statistical models to infer connections based on signals like IP address, device characteristics, location, and behavioral patterns. It extends reach significantly but introduces uncertainty, expressed as a confidence score for each link (typically between 60% and 95%, depending on the model and the signals available). Most identity graphs combine both approaches, using deterministic links as the high-confidence backbone and probabilistic links to extend coverage.

How do CDPs build and maintain identity graphs?

CDPs build identity graphs by continuously ingesting identifiers from all connected data sources — websites, mobile apps, CRM systems, email platforms, point-of-sale systems, and more. Each new identifier is evaluated against the existing graph using deterministic and probabilistic matching to determine whether it belongs to an existing profile or represents a new individual. The CDP then links connected identifiers, merges associated data into unified profiles, and resolves conflicts when ambiguous matches arise. This process runs continuously, updating the graph in real time as new events occur and maintaining accuracy through ongoing deduplication, conflict resolution, and identifier expiration management.

Related Terms

  • Entity Resolution: The broader matching discipline that identity graphs apply specifically to customer identifiers
  • Golden Record: The single authoritative profile created when an identity graph merges all matched records
  • Data Governance: Policies and controls that ensure identity graph data is accurate, compliant, and properly managed
  • Real-Time CDP: CDP architecture that updates identity graphs in milliseconds as new events arrive
  • Consent Management: Systems that track user consent, determining which identifiers can be linked within the graph

Written by CDP.com Staff

The CDP.com staff has collaborated to deliver the latest information and insights on the customer data platform industry.