On June 16, 2026, Databricks answered a question this site has covered since April: Is Databricks a CDP? The answer was “no” — until it wasn’t. At the Data + AI Summit, Databricks launched CustomerLake, a purpose-built Agentic CDP embedded natively in the data lakehouse. The platform that data engineers already trust for analytics and ML is now making a direct play for the customer engagement layer that CDPs have owned for a decade.
This guide covers what CustomerLake is, how it works, where it fits in the CDP category’s evolution, and what remains unproven.
What Is CustomerLake?
CustomerLake is Databricks’ first marketing product — an Agentic CDP that unifies customer data, AI models, identity resolution, audience building, and activation on the same lakehouse where organizations already run analytics and ML workloads. It is currently in Private Preview.
The core concept is what Databricks calls Infinity campaigns — a shift from static, one-off campaigns to continuous agentic loops where AI agents constantly analyze customer signals, decide next-best actions, and optimize engagement in real time. As CEO Ali Ghodsi framed it: “Marketing stops being a series of campaigns and becomes a continuous loop — agents that constantly analyze, decide, and act on every customer in real time.”
CustomerLake is built on three design principles:
- Embedded — built directly on existing Databricks data foundations. No data duplication into a separate CDP
- Democratized — marketers operate through agent-first interfaces on trusted data, reducing ad hoc requests to data teams
- Autonomous — agents analyze customer signals, recommend next-best actions, and optimize engagement around business goals at 1:1 scale
Profile Agents
Profile Agents handle the data engineering side of CDP functionality — transforming raw customer data into business-ready Customer 360 profiles directly inside Databricks.
What Profile Agents automate:
- Bronze-to-gold data transformation — raw event data is cleaned, enriched, and structured into unified customer profiles. The agent identifies data quality issues, recommends fixes, and supports third-party data enrichment
- Agentic Identity Resolution (AIR) — a multi-method approach that combines deterministic matching, probabilistic matching, and agentic workflows (in which the AI flags edge cases and improves match quality iteratively through continuous feedback). Teams can integrate their existing identity rules, models, and third-party enrichment alongside AIR
- Identity Provider marketplace — pre-built integrations with Acxiom, Epsilon, LiveRamp, TransUnion, and Adstra enable one-click access to third-party identity graphs for enrichment and resolution
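Databricks has not published how Agentic Identity Resolution matches records. The following is only a minimal Python sketch of the deterministic and probabilistic layers described above, and it omits the agentic feedback loop. It works in three steps:

1. It normalizes raw records, a toy version of the bronze-to-gold cleanup.
2. It links records that share an exact email or phone number.
3. It links records whose names are similar within the same postal code.

The `RawRecord` fields, the 0.85 threshold, and the use of `difflib`’s similarity ratio are all hypothetical. The similarity ratio stands in for a true probabilistic model such as Fellegi-Sunter scoring.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional


@dataclass
class RawRecord:
    # Hypothetical schema for illustration; not a CustomerLake table.
    record_id: str
    email: Optional[str]
    phone: Optional[str]
    name: str
    postal_code: str


class UnionFind:
    """Tracks which source records have been merged into one customer."""

    def __init__(self, ids):
        self.parent = {i: i for i in ids}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def normalize(rec):
    """Cleanup step: trim and lowercase text, keep only digits in phone numbers."""
    email = rec.email.strip().lower() if rec.email else None
    phone = "".join(ch for ch in rec.phone if ch.isdigit()) if rec.phone else None
    return RawRecord(rec.record_id, email, phone or None,
                     rec.name.strip().lower(), rec.postal_code.strip())


def resolve(records, name_threshold=0.85):
    """Group record IDs into customers; returns sorted lists of record IDs."""
    recs = [normalize(r) for r in records]
    uf = UnionFind(r.record_id for r in recs)

    # Deterministic pass: an exact email or phone match links two records.
    first_seen = {}
    for r in recs:
        for key in (("email", r.email), ("phone", r.phone)):
            if key[1] is None:
                continue
            if key in first_seen:
                uf.union(first_seen[key], r.record_id)
            else:
                first_seen[key] = r.record_id

    # Fuzzy pass (stand-in for probabilistic matching): compare names only
    # within the same postal code ("blocking") instead of across all pairs.
    blocks = defaultdict(list)
    for r in recs:
        blocks[r.postal_code].append(r)
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if SequenceMatcher(None, a.name, b.name).ratio() >= name_threshold:
                uf.union(a.record_id, b.record_id)

    clusters = defaultdict(list)
    for r in recs:
        clusters[uf.find(r.record_id)].append(r.record_id)
    return sorted(sorted(ids) for ids in clusters.values())
```

Union-find keeps matching transitive. If record A shares an email with B, and B shares a phone number with C, all three resolve to one customer even though A and C share nothing directly. Restricting the fuzzy pass to postal-code blocks avoids comparing every pair of records.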
For data engineers, Profile Agents aim to reduce the manual pipeline work that typically consumes weeks when building Customer 360 views from scratch on a lakehouse.
Campaign Agents
Campaign Agents handle the activation side — moving from audience creation to channel delivery with AI assistance at every step.
What Campaign Agents provide:
- Goal-driven campaign generation — agents use business goals and customer context to recommend audiences, messaging, timing, and channel selection
- Genie-powered interface — marketers interact through Databricks’ natural language interface (Genie) to build audiences, define campaign briefs, and set guardrails — without writing SQL
- Pre-launch simulation — campaigns can be simulated against a sample of customer profiles before going live, providing visibility into how decisions will play out
- Built-in guardrails — marketing opt-out enforcement, suppression for customers with unresolved support tickets, and frequency capping are native to the decisioning layer
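The guardrails listed above map naturally onto pre-send filters. The sketch below checks, in order:

- marketing opt-out
- open support tickets
- a frequency cap

It then runs those checks over a sample of profiles and counts the outcomes without sending anything. This is a rough analogue of pre-launch simulation. The `Profile` shape and the cap of three messages per seven days are hypothetical. This is not CustomerLake’s decisioning logic, which Databricks has not documented.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Profile:
    # Hypothetical profile shape for illustration only.
    customer_id: str
    marketing_opt_out: bool
    open_support_tickets: int
    recent_sends: list = field(default_factory=list)  # datetimes of past messages


def guardrail_decision(p, now, max_sends=3, window=timedelta(days=7)):
    """Return 'send' or the first guardrail that blocks this profile."""
    if p.marketing_opt_out:
        return "blocked:opt_out"
    if p.open_support_tickets > 0:
        return "blocked:open_ticket"
    sends_in_window = sum(1 for t in p.recent_sends if now - t <= window)
    if sends_in_window >= max_sends:
        return "blocked:frequency_cap"
    return "send"


def simulate(sample, now):
    """Dry run over a sample: tally decisions without sending any messages."""
    return Counter(guardrail_decision(p, now) for p in sample)
```

Because the checks run in a fixed order, each blocked profile is attributed to the first guardrail it fails. As a result, the simulation counts always sum to the sample size.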
Campaign Agents represent Databricks’ answer to the “marketer self-service” gap that has historically separated lakehouses from CDPs. Whether the agent-first interface achieves the usability of mature CDP journey builders remains to be validated at scale — CustomerLake is still in Private Preview.
Architecture
CustomerLake’s architecture reflects a fundamental bet: the data lakehouse should be the system of record for marketing, not a data source that feeds a separate CDP.
Key architectural components:
| Component | Role |
|---|---|
| Lakehouse + Unity Catalog | Foundation layer — all customer data stays governed in one place. No data copies to external CDP systems |
| Lakehouse Federation | Query customer data across Snowflake, BigQuery, cloud storage, and operational databases without copying it into Databricks |
| Real-Time Profile API | Sub-100ms profile lookups for in-session personalization and real-time decisioning |
| Genie | Natural language interface enabling marketers to build audiences and campaigns without SQL |
| Lakeflow | Data ingestion layer for streaming and batch customer data |
| Bi-directional reverse ETL | Native pipelines to marketing tools, advertising platforms, identity providers, and engagement channels |
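Databricks has not published how its reverse ETL pipelines sync audiences to destinations. Many reverse ETL tools use diff-based syncing, which works as follows:

- Compare the current audience snapshot with the last snapshot successfully sent.
- Push only the additions and removals, in batches sized for the destination.

The sketch below shows that general pattern, not CustomerLake’s implementation. The function names and default batch size are hypothetical, and no real destination API is called.

```python
def audience_diff(previous_ids, current_ids):
    """Compare the last synced snapshot with the current audience."""
    previous, current = set(previous_ids), set(current_ids)
    to_add = sorted(current - previous)      # newly qualified members
    to_remove = sorted(previous - current)   # members who no longer qualify
    return to_add, to_remove


def chunk(ids, size):
    """Split IDs into batches sized for a destination's upload limits."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def plan_sync(previous_ids, current_ids, batch_size=1000):
    """Build the (operation, batch) calls a sync job would make (hypothetical)."""
    to_add, to_remove = audience_diff(previous_ids, current_ids)
    plan = [("add", batch) for batch in chunk(to_add, batch_size)]
    plan += [("remove", batch) for batch in chunk(to_remove, batch_size)]
    return plan
```

Only changed members are transmitted. Once the destination confirms the upload, the current snapshot would become the new baseline for the next diff.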
Activation partners at launch include Adobe, Meta (Audience and Conversions API), The Trade Desk, Braze, Bloomreach, Iterable, Snapchat, Magnite, Twilio, IAS, and Unity.
Pricing model: CustomerLake is consumption-based, and the product layer itself carries no separate platform fee. Databricks monetizes through the underlying compute and storage. This mirrors the approach Databricks took with Lakewatch (its security product): ingest is free, compute is billed. As Martech Therapy’s Matthew Niederberger noted: “You cannot win a price war against a company that does not need your product to make money.”
CDP Evolution: Where CustomerLake Fits
The CDP category has evolved through three stages, each driven by pressure to close the Customer Intelligence Loop (COLLECT → UNIFY → UNDERSTAND → DECIDE → ENGAGE) faster:
| Stage | Era | Architecture | Data model |
|---|---|---|---|
| Packaged CDP | 2016–2020 | All-in-one platform with proprietary data store | Data copied into the CDP |
| Composable CDP | 2020–2025 | Warehouse/lakehouse as source of truth + activation layer on top | Data stays in the warehouse; activation tools read via reverse ETL or zero-copy |
| Agentic CDP | 2024+ | AI agents embedded in the data platform, closing the loop autonomously | Data never leaves the governed platform |
CustomerLake is a Stage 3 entry — but it arrives from a different direction than most Agentic CDPs. Where traditional CDP vendors are adding AI agents to existing platforms, Databricks is adding CDP capabilities to an existing data platform. The result is architecturally similar but organizationally different: CustomerLake’s buyer is likely the data team (who already owns Databricks), not the marketing team (who typically owns CDP procurement).
CDP Capabilities: What CustomerLake Covers and What It Does Not
A customer data platform is defined by five core capabilities. Before CustomerLake, a Databricks lakehouse covered none of them natively. Here is where CustomerLake stands:
| Capability | What a CDP requires | What CustomerLake provides | Status |
|---|---|---|---|
| Identity resolution | ML-powered stitching of anonymous and known profiles | Agentic Identity Resolution — deterministic + probabilistic + agentic matching, plus identity provider marketplace | Addressed |
| Real-time profile serving | Sub-100ms API lookups for in-session personalization | Real-Time Profile API with sub-100ms lookups (claimed) | Addressed (unverified at scale) |
| Marketer self-service | Visual segmentation and journey building without SQL | Genie-powered natural language interface + Campaign Agents | Addressed (UX maturity TBD) |
| Customer Intelligence Loop | COLLECT → UNIFY → UNDERSTAND → DECIDE → ENGAGE in seconds | Infinity campaigns — agents close the loop continuously | Addressed in design (Private Preview) |
| Native messaging | Built-in email, SMS, or push delivery (many CDPs instead rely on a separate ESP) | None — activation via partner integrations (Braze, Iterable, Twilio, etc.) | Not addressed |
The most significant shift is identity resolution: Agentic Identity Resolution’s multi-method approach (deterministic + probabilistic + agentic matching) with a built-in identity provider marketplace directly addresses what was previously Databricks’ weakest CDP gap. The most notable absence is native messaging — CustomerLake activates through partner channels, not built-in email, SMS, or push. This is a deliberate architectural choice: Databricks is building the intelligence and decisioning layer, not the delivery infrastructure.
Competitive Impact
CustomerLake’s entry reshapes competitive dynamics across the CDP landscape:
Composable CDP vendors (Hightouch, Census, GrowthLoop) face structural pressure. These companies built their value proposition on being the activation layer between the warehouse and marketing tools — the position CustomerLake now occupies natively. As Niederberger observes, composable vendors “convinced the market to pull everything into warehouses, creating the on-ramp” for warehouse-native CDPs. That said, composable vendors have mature integrations, established customer bases, and multi-warehouse support (Snowflake, BigQuery, Redshift) — switching costs are real, and CustomerLake’s Private Preview status limits near-term displacement.
Packaged CDP vendors (Segment, mParticle, Lytics) face a different challenge. CustomerLake does not replicate their full feature sets (particularly native messaging and event SDKs), but it undermines the data-copy model that packaged CDPs require. For organizations already on Databricks, the value proposition of copying customer data into a separate platform weakens when the lakehouse itself offers identity resolution, segmentation, and activation.
Enterprise suite CDPs (Adobe AEP, Salesforce Data Cloud) are less directly threatened but not immune. Their CDP capabilities are embedded within broader marketing clouds that include content management, journey orchestration, and cross-channel delivery — layers CustomerLake does not attempt to replace. However, CustomerLake’s Lakehouse Federation (which can query Snowflake, BigQuery, and cloud storage without copying data) challenges the data gravity that suite vendors rely on. Notably, Adobe is a CustomerLake launch partner — suggesting coexistence rather than displacement in the near term.
Adweek characterized CustomerLake as “an agentic challenger to traditional CDPs” — framing it as a category disruptor rather than a feature addition.
What Remains Unproven
CustomerLake is in Private Preview. Several critical questions will only be answered as the product matures:
- Marketer UX maturity — agent-first interfaces are novel. Whether Genie + Campaign Agents can match the usability of purpose-built CDP journey builders (which have had years of iteration with marketing teams) is the largest open question
- Event collection SDKs — CustomerLake’s ingestion relies on Lakeflow and existing Databricks pipelines. Whether it will offer client-side SDKs for web and mobile event collection (comparable to Segment’s analytics.js or mParticle’s SDK) has not been announced
- Real-time latency at scale — sub-100ms profile API lookups are claimed but not independently verified under production load
- General availability timeline — no GA date has been announced. Enterprise procurement cycles may stall on Private Preview status
- Team skill requirements — CustomerLake assumes a mature Databricks practice. Organizations without existing Databricks platform engineers may struggle to operationalize the product, even with agent-first interfaces
- Cost predictability — consumption-based pricing means costs scale with agent compute. When Campaign Agents run autonomously and continuously (Infinity campaigns), forecasting monthly spend becomes harder than with traditional per-seat or per-profile licensing
- Migration complexity — switching identity resolution providers or moving from an existing CDP’s identity graph to Agentic Identity Resolution is a multi-month effort. Mid-contract transitions are operationally risky
- Multi-cloud neutrality — CustomerLake runs on Databricks. Organizations with customer data split across Snowflake and Databricks can use Lakehouse Federation, but the control plane, governance, and billing flow through Databricks. As Niederberger notes, federation “still runs across Databricks’ control plane, its governance, its bill, and its roadmap”
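The cost-predictability concern above is easiest to see as arithmetic. Databricks bills compute in Databricks Units (DBUs). CustomerLake-specific rates have not been published, so every number in this sketch is a placeholder. The sketch also assumes that agent compute scales with the number of customer signals processed. Under that assumption, monthly spend is:

events per day × days × compute seconds per event ÷ 3600 × DBUs per compute hour × price per DBU

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    label: str
    events_per_day: float            # customer signals the agents act on
    compute_seconds_per_event: float # assumption: compute scales per signal
    dbu_per_compute_hour: float      # placeholder rate
    usd_per_dbu: float               # placeholder price, not a published rate


def monthly_cost(s, days=30):
    """Consumption cost: compute hours x DBUs per hour x price per DBU."""
    compute_hours = s.events_per_day * days * s.compute_seconds_per_event / 3600
    return compute_hours * s.dbu_per_compute_hour * s.usd_per_dbu


def forecast_range(scenarios, days=30):
    """Map each scenario label to its estimated monthly cost in USD."""
    return {s.label: monthly_cost(s, days) for s in scenarios}
```

Consider placeholder values of 72,000 signals per day, 0.5 compute seconds per signal, 2 DBUs per compute hour, and $0.50 per DBU. The estimate comes to 300 compute hours and $300 per month. If agents end up reacting to four times as many signals, the estimate rises to $1,200. The formula is linear, so any uncertainty about how often always-on agents fire passes straight through to the bill.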
FAQ
Is Databricks a CDP now?
Yes — with the launch of CustomerLake, Databricks is now a CDP, specifically an Agentic CDP. CustomerLake adds identity resolution, audience segmentation, campaign automation, and multi-channel activation natively on the lakehouse. The key distinction from traditional CDPs: CustomerLake embeds these capabilities in the data platform rather than operating as a separate system that ingests copies of customer data. It is currently in Private Preview.
How does CustomerLake compare to packaged CDPs?
CustomerLake addresses most core CDP capabilities but takes a fundamentally different architectural approach. Packaged CDPs copy customer data into a proprietary store, then build identity, segmentation, and activation on top. CustomerLake operates directly on the lakehouse — data stays in place, governed by Unity Catalog. It provides identity resolution (Agentic Identity Resolution), marketer self-service (Genie), and partner-based activation, but does not include native messaging (email, SMS, push). Organizations that need built-in message delivery will still require a messaging platform alongside CustomerLake.
What is an Infinity campaign?
An Infinity campaign is Databricks’ term for a continuously running, agent-driven engagement loop that replaces static one-off campaigns. Instead of the traditional workflow — define objective, request data, build segment, launch, measure — Campaign Agents continuously analyze customer signals, recommend next-best actions, and optimize engagement in real time. The “infinity” framing reflects the shift from discrete campaign cycles to always-on 1:1 personalization. The concept is central to CustomerLake’s value proposition but remains in Private Preview as of June 2026.
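Databricks has not said how Campaign Agents pick next-best actions. A multi-armed bandit is one common way to “decide and optimize continuously.” The sketch below uses epsilon-greedy selection as a stand-in to show the shape of an always-on loop:

- Every incoming signal triggers a decision.
- The observed outcome updates that action’s estimated value.
- The loop never reaches a campaign end date.

The action names, the reward definition, and `epsilon=0.1` are hypothetical.

```python
import random


class EpsilonGreedyNBA:
    """Hypothetical next-best-action chooser that keeps learning from outcomes."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}  # running mean reward

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore a random action
        return max(self.actions, key=lambda a: self.values[a])  # exploit best so far

    def update(self, action, reward):
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n  # incremental mean


def run_loop(agent, signals, respond):
    """Always-on loop: decide, act, observe, and learn for each incoming signal."""
    for signal in signals:
        action = agent.choose()
        reward = respond(signal, action)  # e.g. 1.0 if the customer converted
        agent.update(action, reward)
    return agent
```

A production 1:1 system would condition the choice on each customer’s profile (a contextual bandit) and apply guardrails before acting. This version learns one global policy to stay short.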
Should I replace my CDP with Databricks CustomerLake?
It depends on your current architecture and where your data lives. If your organization already runs Databricks as its primary data platform and uses a composable CDP stack (warehouse + reverse ETL + activation tools), CustomerLake may consolidate that stack into fewer moving parts. If you rely on a packaged CDP with native messaging, journey orchestration, and mature marketer UX, CustomerLake does not yet replicate those capabilities end-to-end. Evaluate based on your specific Customer Intelligence Loop requirements — particularly whether native messaging and client-side event SDKs are critical to your use cases. For evaluation criteria, see How to Evaluate a CDP in the AI Era.
Sources: Databricks press release (June 16, 2026), Databricks blog, Adweek, Martech Therapy (Matthew Niederberger).