Feature Store

A feature store is a centralized repository for storing, serving, and reusing ML-ready features. Learn how CDPs feed feature stores for real-time AI decisioning.

CDP.com Staff

A feature store is a centralized platform for defining, storing, and serving machine learning features — pre-computed, reusable data attributes such as “average order value over 30 days” or “email engagement score” — that ensures consistency between model training and real-time inference.

In machine learning, a feature is any measurable input to a model: a customer’s purchase frequency, the time since their last login, their preferred channel, or a derived metric like lifetime value decile. Feature engineering, the process of transforming raw data into these model-ready inputs, is often estimated to take 60-80% of ML development time. Feature stores reduce that cost by centralizing feature definitions so that multiple teams and models can reuse the same computed attributes without duplicating effort.

Pioneered by Uber (Michelangelo, 2017) and later offered by tools and platforms such as Feast, Tecton, and Databricks, feature stores have become a standard component in production ML architectures. For marketing AI, they provide the bridge between raw customer data in a Customer Data Platform (CDP) and the models that drive AI personalization and AI decisioning.

How Feature Stores Work

1. Feature Definition and Registration

Data scientists define features using declarative specifications: the source data, the transformation logic, the aggregation window, and the freshness requirements. For example, a feature called avg_order_value_30d specifies: source = orders table, transformation = mean(order_total), window = 30 days, freshness = updated hourly. These definitions are registered in a central catalog so other teams can discover and reuse them.
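A declarative definition like the one described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular feature store: the FeatureDefinition fields and the FeatureRegistry class are hypothetical names chosen to mirror the source, transformation, window, and freshness elements of the avg_order_value_30d example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FeatureDefinition:
    """Declarative spec for one feature (hypothetical schema, not a vendor API)."""
    name: str
    source: str            # table or stream the raw data comes from
    column: str            # raw column the transformation reads
    aggregation: str       # e.g. "mean", "sum", "count"
    window: timedelta      # how far back the aggregation looks
    freshness: timedelta   # maximum acceptable age of the computed value


class FeatureRegistry:
    """In-memory stand-in for a central feature catalog."""

    def __init__(self):
        self._definitions = {}

    def register(self, definition):
        # Refuse silent redefinition so every consumer sees the same logic.
        if definition.name in self._definitions:
            raise ValueError(f"feature already registered: {definition.name}")
        self._definitions[definition.name] = definition

    def get(self, name):
        return self._definitions[name]


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    name="avg_order_value_30d",
    source="orders",
    column="order_total",
    aggregation="mean",
    window=timedelta(days=30),
    freshness=timedelta(hours=1),
))
```

Making the definition immutable (frozen=True) and rejecting duplicate names is one way to keep a registered feature from changing underneath the models that depend on it.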

2. Batch and Real-Time Computation

Feature stores compute features through two pathways. Batch pipelines process historical data at scheduled intervals (hourly, daily) and store results in an offline store for model training. Streaming pipelines process events in real time and update an online store for low-latency inference. Both pathways use the same feature definition, eliminating training-serving skew — the dangerous mismatch that occurs when a model is trained on features computed differently than those served in production.
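The idea of one definition driving both pathways can be sketched as a single transformation function called from a batch job and from an event handler. The function and variable names here are illustrative assumptions, and the streaming path simply recomputes from stored events; a production system would more likely maintain incremental state.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)


def avg_order_value(orders, as_of, window=WINDOW):
    """Mean order_total over the window ending at as_of; None if no orders.

    orders is an iterable of (timestamp, order_total) pairs.
    """
    totals = [total for ts, total in orders if as_of - window < ts <= as_of]
    return sum(totals) / len(totals) if totals else None


def batch_compute(orders_by_customer, as_of):
    """Batch path: recompute every customer's value for the offline store."""
    return {cid: avg_order_value(orders, as_of)
            for cid, orders in orders_by_customer.items()}


def on_order_event(orders_by_customer, online_store, customer_id, ts, total):
    """Streaming path: record one event, then refresh the online store."""
    orders_by_customer.setdefault(customer_id, []).append((ts, total))
    online_store[customer_id] = avg_order_value(
        orders_by_customer[customer_id], as_of=ts)


now = datetime(2024, 6, 30)
# One order inside the 30-day window, one well outside it.
history = {"c1": [(datetime(2024, 6, 1), 40.0), (datetime(2024, 4, 1), 500.0)]}
online = {}
on_order_event(history, online, "c1", now, 60.0)
offline = batch_compute(history, as_of=now)
```

Because both paths call avg_order_value, the offline and online values for a customer agree whenever they are computed over the same events and the same as_of time.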

3. Storage and Serving

The offline store (typically a data warehouse or data lake) holds historical feature values for training datasets. The online store (Redis, DynamoDB, or a purpose-built key-value store) serves the latest feature values with single-digit millisecond latency for real-time model inference. When an AI agent needs to decide the next best action for a customer, it queries the online store for that customer’s current features.
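The split between the two stores can be illustrated with a toy in-memory class. This is a sketch under simplifying assumptions: a Python list stands in for the warehouse-backed offline store, a dict stands in for a key-value online store such as Redis or DynamoDB, and the method names are hypothetical.

```python
from datetime import datetime


class FeatureStore:
    """Toy store: append-only offline history plus latest-value online lookup."""

    def __init__(self):
        self.offline = []   # (entity_id, feature, timestamp, value) rows
        self.online = {}    # (entity_id, feature) -> (timestamp, value)

    def write(self, entity_id, feature, timestamp, value):
        # Every value is kept in the offline history for training.
        self.offline.append((entity_id, feature, timestamp, value))
        # The online store keeps only the most recent value per key.
        key = (entity_id, feature)
        current = self.online.get(key)
        if current is None or timestamp >= current[0]:
            self.online[key] = (timestamp, value)

    def get_online_features(self, entity_id, features):
        """Return the latest value for each requested feature (None if absent)."""
        result = {}
        for name in features:
            entry = self.online.get((entity_id, name))
            result[name] = entry[1] if entry else None
        return result


store = FeatureStore()
store.write("c1", "avg_order_value_30d", datetime(2024, 6, 1), 42.0)
store.write("c1", "avg_order_value_30d", datetime(2024, 6, 2), 50.0)
latest = store.get_online_features(
    "c1", ["avg_order_value_30d", "email_open_rate_7d"])
```

The online lookup is a single key read per feature, which is what makes low-latency serving feasible, while the offline list retains both writes for later training use.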

4. Feature Discovery and Reuse

A feature catalog allows data scientists across the organization to search for existing features before building new ones. If one team has already computed email_open_rate_7d, another team building a churn model can reuse it directly. This reduces duplicated effort and ensures consistency across models.
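Catalog search can be as simple as filtering registered metadata by keyword or tag. The sketch below assumes a hypothetical catalog layout (name, owner, description, tags); the owner names and tags are invented for illustration.

```python
# Hypothetical catalog entries; real catalogs carry richer metadata.
CATALOG = [
    {"name": "email_open_rate_7d", "owner": "lifecycle-marketing",
     "description": "Share of delivered emails opened in the last 7 days",
     "tags": {"email", "engagement"}},
    {"name": "avg_order_value_30d", "owner": "commerce-analytics",
     "description": "Mean order total over the last 30 days",
     "tags": {"orders", "monetary"}},
    {"name": "days_since_last_login", "owner": "product-growth",
     "description": "Days elapsed since the most recent login",
     "tags": {"engagement", "recency"}},
]


def search_features(catalog, keyword=None, tag=None):
    """Return sorted names of features matching a keyword and/or a tag."""
    keyword = keyword.lower() if keyword else None
    matches = []
    for entry in catalog:
        text = f"{entry['name']} {entry['description']}".lower()
        if keyword and keyword not in text:
            continue
        if tag and tag not in entry["tags"]:
            continue
        matches.append(entry["name"])
    return sorted(matches)


# A churn-model team checks for existing engagement features first.
engagement_features = search_features(CATALOG, tag="engagement")
```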

The CDP Connection

A CDP is a primary data source for marketing-focused feature stores. The CDP’s identity-resolved customer profiles, behavioral event streams, and unified customer 360 records provide the raw material from which features are computed. Without a CDP feeding clean, deduplicated customer data into the feature store, features would be computed on fragmented, inconsistent data — producing unreliable model inputs.

In some architectures, the CDP itself functions as a feature store for customer-centric features. CDPs that support computed attributes, real-time aggregations, and API-accessible profile fields effectively serve the same role as a feature store for marketing AI use cases.

Feature Store vs. Data Warehouse

| Dimension | Feature Store | Data Warehouse |
| --- | --- | --- |
| Purpose | Serve ML-ready features for training and inference | Store and query structured data for analytics |
| Latency | Online store: sub-10ms; offline store: batch | Typically seconds to minutes |
| Primary Consumer | ML models and AI agents | Analysts and BI tools |
| Schema | Feature definitions with transformation logic | Table schemas with raw and aggregated data |
| Consistency | Ensures training-serving consistency | No built-in training-serving alignment |
| Time Travel | Point-in-time feature retrieval for training | Historical queries via SQL |
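The "Time Travel" row deserves a concrete illustration. Point-in-time retrieval means each training example sees only the feature value that was known at the moment its label was observed, so later values cannot leak into training. The following is a minimal sketch with hypothetical function names and an in-memory history; real feature stores run this as a join against the offline store.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_value(history, as_of):
    """Latest feature value with timestamp <= as_of, or None if none exists.

    history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


def build_training_rows(labels, feature_history):
    """Attach to each label the feature value known at the label's timestamp."""
    rows = []
    for customer_id, label_time, label in labels:
        value = point_in_time_value(
            feature_history.get(customer_id, []), label_time)
        rows.append((customer_id, label_time, value, label))
    return rows


feature_history = {
    "c1": [(datetime(2024, 5, 1), 40.0), (datetime(2024, 6, 1), 55.0)],
}
labels = [
    ("c1", datetime(2024, 5, 15), 0),   # sees the May value, not the later one
    ("c1", datetime(2024, 6, 10), 1),   # sees the June value
    ("c1", datetime(2024, 4, 1), 0),    # no value existed yet
]
rows = build_training_rows(labels, feature_history)
```

A plain SQL query for "the current value" would return 55.0 for all three labels, which is exactly the leakage that point-in-time retrieval is meant to prevent.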

Practical Guidance

Feed features from your CDP. Route customer behavioral and transactional data through your CDP’s data pipeline into the feature store. The CDP handles identity resolution and data quality; the feature store handles transformation and serving. This separation of concerns keeps both systems focused on what they do best.

Eliminate training-serving skew. Use the same feature definitions for both batch training and real-time serving. Feature stores automate this, but only if both pathways are configured from the same registered definitions. Audit regularly to ensure consistency.
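One way to run such an audit is to sample entities, read the same feature from both stores, and flag disagreements. The sketch below assumes both stores can be read into plain dicts keyed by entity id; the function name and tolerance are illustrative choices.

```python
import math


def find_skew(offline_values, online_values, rel_tol=1e-6):
    """Return entity ids whose offline and online feature values disagree."""
    mismatched = []
    for entity_id in sorted(offline_values.keys() | online_values.keys()):
        a = offline_values.get(entity_id)
        b = online_values.get(entity_id)
        if a is None or b is None:
            # Missing or null in only one of the two stores counts as skew.
            if a is not b:
                mismatched.append(entity_id)
        elif not math.isclose(a, b, rel_tol=rel_tol):
            mismatched.append(entity_id)
    return mismatched


offline_sample = {"c1": 50.0, "c2": 10.0, "c3": 1.0}
online_sample = {"c1": 50.0, "c2": 12.5}
skewed = find_skew(offline_sample, online_sample)
```

Here c2 is flagged because the values differ and c3 because it is absent from the online store. For a fair comparison, both samples should reflect the same point in time.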

Start with high-impact features. Begin with features that multiple models need: recency, frequency, monetary value, channel preferences, engagement scores. These foundational customer features, typically derived from CDP data, provide immediate reuse value.
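As an illustration of such foundational features, recency, frequency, and monetary value can be derived from an order history in a few lines. This sketch assumes orders arrive as (timestamp, order_total) pairs; the function and key names are hypothetical.

```python
from datetime import datetime


def rfm_features(orders, as_of):
    """Recency (days), frequency (count), monetary (sum) for one customer.

    orders is a list of (timestamp, order_total) pairs; only orders at or
    before as_of are counted.
    """
    past = [(ts, total) for ts, total in orders if ts <= as_of]
    if not past:
        return {"recency_days": None, "frequency": 0, "monetary": 0.0}
    last_order = max(ts for ts, _ in past)
    return {
        "recency_days": (as_of - last_order).days,
        "frequency": len(past),
        "monetary": sum(total for _, total in past),
    }


orders = [(datetime(2024, 6, 1), 40.0), (datetime(2024, 6, 20), 60.0)]
features = rfm_features(orders, as_of=datetime(2024, 6, 30))
```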

Monitor feature freshness. Stale features degrade model performance silently. Implement data observability monitors on feature computation pipelines to alert when features fall behind their freshness SLAs.
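A freshness check can be sketched as a comparison between each feature's last update time and its SLA. The data structures below are assumptions made for illustration; in practice the timestamps would come from pipeline metadata and the result would feed an alerting system.

```python
from datetime import datetime, timedelta


def stale_features(last_updated, freshness_slas, now):
    """Return sorted names of features whose last update is older than the SLA."""
    stale = []
    for name, sla in freshness_slas.items():
        updated_at = last_updated.get(name)
        # A feature that has never been computed counts as stale.
        if updated_at is None or now - updated_at > sla:
            stale.append(name)
    return sorted(stale)


now = datetime(2024, 6, 30, 12, 0)
freshness_slas = {
    "avg_order_value_30d": timedelta(hours=1),
    "email_open_rate_7d": timedelta(days=1),
    "days_since_last_login": timedelta(hours=6),
}
last_updated = {
    "avg_order_value_30d": datetime(2024, 6, 30, 9, 0),   # 3 hours old
    "email_open_rate_7d": datetime(2024, 6, 30, 0, 0),    # 12 hours old
}
alerts = stale_features(last_updated, freshness_slas, now)
```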

FAQ

What is the difference between a feature store and a database?

A database stores raw or aggregated data for general-purpose querying. A feature store is purpose-built for machine learning: it stores pre-computed, versioned feature values with both an offline store for model training and an online store for low-latency inference. The critical differentiator is training-serving consistency: a feature store is designed to ensure that the features used to train a model are computed identically to those served in production, preventing skew that degrades model accuracy.

Do I need a feature store if I have a CDP?

It depends on your AI maturity. If your CDP supports computed attributes and real-time profile APIs, it may serve as a sufficient feature store for marketing AI use cases. However, if you run multiple ML models across different teams, need single-digit millisecond serving latency, or require point-in-time feature retrieval for model training, a dedicated feature store adds capabilities that most CDPs do not natively provide.

How does a feature store improve marketing AI?

A feature store improves marketing AI in three ways. First, it ensures that models are trained and served with identically computed features, eliminating the training-serving skew that causes production models to underperform. Second, it enables feature reuse — a customer engagement score computed once can feed churn models, recommendation engines, and personalization systems simultaneously. Third, it provides low-latency feature serving so that real-time AI decisioning can access the freshest customer context in milliseconds.

Related Terms

  • Data Enrichment — Enhancing customer profiles with additional computed or third-party attributes
  • ETL and ELT — Data transformation patterns that feed feature computation pipelines
  • Data Ingestion — The process of collecting raw data from source systems into data infrastructure
  • Real-Time CDP — A CDP that processes and serves customer data with minimal latency