Series: Unified Pipeline – Experiences from Building a Production ML System
Series Goal:
To show how theoretical data science differs from production reality and why infrastructure, process, and governance are often more important than the model itself.
Planned Parts
- Why the Unified Pipeline Was Created in the First Place – a problem that couldn’t be solved with a better model
- From Experiments to a System – architectural principles and decisions
- Time as the Enemy of the Model – time-aware validation, stability, and the reality of operations
- MLOps Without the Buzzwords – what actually increased speed and quality
- What I Would Do Differently Today – lessons learned, dead ends, and transferable principles
Part 1: Why the Unified Pipeline Was Created in the First Place
When a Better Model Isn’t Enough
At a certain stage of data science work, further model improvements stop delivering proportional value.
Not because the models are "good enough," but because the problem is no longer statistical.
It was at this exact point that the idea for the Unified Pipeline was born.
At first glance, everything was fine:
- predictive models existed,
- the results were not bad,
- the data was available.
Yet, development was slow, changes were risky, and knowledge transfer was difficult. Every new use-case meant:
- re-solving data preparation,
- re-solving validation,
- re-solving deployment,
- and often, re-discovering the same mistakes.
This is not a failure of people.
This is a failure of the work architecture.
The Hidden Debt: Fragmentation
The fundamental problem was not in the individual models, but in the fact that each one:
- was created slightly differently,
- validated differently,
- handled time differently,
- was deployed differently.
The result was fragmentation:
- fragmentation of code,
- fragmentation of responsibility,
- fragmentation of knowledge.
And most importantly: no change was cheap.
One Pipeline ≠ One Model
The Unified Pipeline was not an attempt to create "one universal model."
It was an effort to create one universal way of thinking about how models are built, tested, and operated.
The basic idea was simple:
If two models solve a different problem, but run at the same time, on the same data, and in the same production environment,
they should share the maximum amount of infrastructure and the minimum amount of variability.
In other words:
variability should be explicit,
not hidden in ad-hoc scripts.
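The principle of explicit variability can be sketched in code. This is a hypothetical illustration, not the author's actual implementation: every name here (`ModelSpec`, `run_pipeline`, `split_by_time`) is invented for the example. The idea is that everything model-specific lives in one declared spec object, while the step sequence itself is shared by all models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical sketch: all model-to-model variability is declared in one
# place, instead of being scattered across ad-hoc scripts.
@dataclass
class ModelSpec:
    name: str
    target: str
    train_fn: Callable[[List[Dict[str, Any]], str], Callable]  # the only custom code
    validation_days: int = 30  # time handling is an explicit, declared choice

def split_by_time(rows: List[Dict[str, Any]], validation_days: int):
    """Shared step: the last `validation_days` of data become the holdout."""
    cutoff = max(r["day"] for r in rows) - validation_days
    train = [r for r in rows if r["day"] <= cutoff]
    holdout = [r for r in rows if r["day"] > cutoff]
    return train, holdout

def run_pipeline(spec: ModelSpec, rows: List[Dict[str, Any]]):
    """Shared infrastructure: the same sequence runs for every model."""
    train, holdout = split_by_time(rows, spec.validation_days)
    model = spec.train_fn(train, spec.target)
    # Shared validation: mean absolute error on the time-based holdout.
    errors = [abs(model(r) - r[spec.target]) for r in holdout]
    return model, sum(errors) / len(errors)

# Toy usage: a baseline "model" that predicts the training mean of the target.
def mean_model(train, target):
    mean = sum(r[target] for r in train) / len(train)
    return lambda row: mean

rows = [{"day": d, "sales": float(d % 7)} for d in range(100)]
spec = ModelSpec("baseline", "sales", mean_model, validation_days=10)
model, mae = run_pipeline(spec, rows)
print(f"{spec.name}: holdout MAE = {mae:.2f}")
```

Two models solving different problems would differ only in their `ModelSpec`; the splitting, validation, and orchestration code stays identical, which is what makes changes cheap and auditable.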
Speed as a Consequence, Not a Goal
There is often talk of "speeding up development."
But the Unified Pipeline was not created to be fast.
It was created to be:
- predictable,
- auditable,
- repeatable.
Speed came as a consequence:
- less ad-hoc decision making,
- less re-inventing the wheel,
- fewer "heroic" interventions.
And this is what made it possible to:
- deploy new models significantly faster,
- test more variants without chaos,
- and focus more on the purpose of the model than on its surroundings.
Why "Unified"
The word Unified was not for marketing.
It was chosen intentionally.
The Pipeline unified:
- the way of working with time,
- the method of validation,
- the versioning method,
- the deployment method,
- and even the way of thinking about models.
And that is perhaps its greatest contribution:
it unified the team’s mental model, not just the code.
What’s Next
In the next part, I will look at:
- why it was necessary to abandon a purely experimental approach,
- which architectural decisions were key,
- and where it turned out that "best practices from blogs" often don’t work in real operation.