Unified Pipeline

Unified Pipeline is a case study on the design and implementation of a unified machine learning pipeline in a banking environment. The goal was to accelerate model deployment, increase model stability over time, and create a single experimental framework across teams and use cases.


Project Overview

Design and implementation of a unified machine learning (ML) pipeline for a banking institution, aimed at accelerating model deployment, increasing out-of-time stability (robustness outside the training period), and standardizing the experimental workflow across teams. The pipeline enables consistent development, validation, and deployment of models for more than 100 products and their variants.


The Challenge

The client faced a fragmented environment for ML model development:

  • Inconsistent modeling approaches and workflows across teams
  • Long model deployment times (typically 7–10+ days)
  • Overestimation of model quality due to improper validation (data leakage, random splits)
  • Low model stability over time (poor performance on out-of-time data)
  • Limited reproducibility of experiments and a weak audit trail

The Solution

Architecture

  • Databricks as the main execution platform (distributed processing, orchestration)
  • MLflow for experiment tracking and model registry
  • Optuna for hyperparameter optimization with a focus on efficient search strategies
  • Spark (PySpark DataFrames) for scalable feature processing
  • Migration from the original Hadoop-based solution to a more modern Databricks-based architecture

Key Design Principles

  • Time-aware validation (a key differentiator from common practice)
    • Use of walk-forward validation instead of random K-Fold cross-validation.
    • Simulation of real-world model deployment over time.
    • Elimination of data leakage.
    • Significant reduction in the gap between training and production performance.
  • Unified training framework
    • A single pipeline for both classification and regression.
    • Shared data preprocessing and feature engineering steps.
    • Parameterization of the pipeline for different use cases.
  • Advanced hyperparameter tuning (Optuna)
    • Combination of Bayesian optimization and QMCSampler (Sobol sequences) for better coverage of the search space.
    • Optimization with respect to time-stability metrics, not just in-sample performance.
    • Managing the trade-off between performance and training time.
  • Model stability over raw performance
    • Optimization for metrics such as Lift (within the top decile), F1 score, and stability of R² / accuracy over time.
    • Emphasis on robustness across time periods.
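
The walk-forward idea above can be reduced to a small splitter. As an assumption for this sketch, the data arrives as monthly snapshots keyed by a period label; each fold trains on everything up to a cutoff and validates on the next period, so no future rows can leak into training.

```python
# Minimal walk-forward splitter (illustrative; real folds would carry the
# underlying rows, not just the period labels).
def walk_forward_splits(periods, min_train=3):
    """Yield (train_periods, validation_period) pairs in chronological order."""
    periods = sorted(periods)
    for cutoff in range(min_train, len(periods)):
        # Train only on periods strictly before the validation period.
        yield periods[:cutoff], periods[cutoff]

months = [f"2024-{m:02d}" for m in range(1, 7)]  # 2024-01 .. 2024-06
for train, valid in walk_forward_splits(months):
    print(train[-1], "->", valid)  # last training month -> validation month
```

Unlike random K-Fold, every fold here respects time order, which is what makes the validation score a realistic preview of out-of-time performance.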

Key Features

  • Automated feature preparation pipeline
    • Scalable data transformation using Spark.
    • Data quality checks, for example min/max validation instead of expensive operations such as countDistinct.
  • Centralized experiment tracking (MLflow)
    • Complete audit trail of experiments, model versioning, and associated parameters.
  • Model registry and standardized deployment
    • A unified interface for model deployment and support for rapid rollout of new versions.
  • Framework for intertemporal evaluation
    • Systematic evaluation of models over time and identification of degradation patterns.
  • Flexible pipeline orchestration
    • Support for multiple model types within a single pipeline through a modular design.

The Results

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Model deployment time | 7–10+ days | 1–2 days | -6 to -8 days |
| Model stability (OOT) | Low | Significantly higher | Major improvement |
| Model lift (top decile) | Baseline | +5 to +30 % | +5 to +30 % |
| Pipeline execution time | ~4 hours | ~2–3 hours | -30 to -40 % |
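
For reference, top-decile lift is the positive rate among the highest-scoring 10 % of cases divided by the overall positive rate. A plain-Python sketch (the function name and toy data are ours, not from the pipeline):

```python
# Top-decile lift: how much more concentrated the positives are in the top 10 %
# of model scores compared to the population baseline.
def top_decile_lift(scores, labels):
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)                      # size of the top decile
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Toy data: 100 cases, 10 positives overall, 5 of them in the top decile,
# so lift = 0.5 / 0.1 = 5.0.
scores = list(range(100))
labels = [1 if i in {0, 1, 2, 3, 4, 90, 91, 92, 93, 94} else 0 for i in range(100)]
print(top_decile_lift(scores, labels))
```

A lift of 1.0 means the model ranks no better than random; the reported +5 to +30 % improvement is relative to the baseline model's lift, not to random.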

Technology Stack

  • Python, PySpark (Spark DataFrames)
  • Databricks (distributed computing, orchestration)
  • MLflow (experiment tracking and model registry)
  • Optuna (hyperparameter optimization)
  • CatBoost (classifier and regressor)

Unified Pipeline Series

This case study is followed by a five-part series that explores the architecture, validation logic, and practical decisions behind the solution in more detail:

  1. Unified Pipeline – Part 1: Why Was the Unified Pipeline Created? – why improving the model alone was not enough and the real problem was system fragmentation.
  2. Unified Pipeline – Part 2: From Experiments to a System – how architecture and configuration replaced improvisation.
  3. Unified Pipeline – Part 3: Time as the Enemy of the Model – why validation without time discipline fails in production.
  4. Unified Pipeline – Part 4: MLOps Without the Buzzwords – what truly delivered value and what only added complexity.
  5. Unified Pipeline – Part 5: What I Would Do Differently Today – lessons learned, dead ends, and transferable principles.

© 2026 Michael Princ. All rights reserved.
