  • Unified Pipeline – Part 1: Why the Unified Pipeline Was Created

    Series: Unified Pipeline – Experiences from Building a Production ML System

    Series Goal:
    To show how theoretical data science differs from production reality and why infrastructure, process, and governance are often more important than the model itself.


    Planned Parts

    1. Why the Unified Pipeline Was Created in the First Place – a problem that couldn’t be solved with a better model
    2. From Experiments to a System – architectural principles and decisions
    3. Time as the Enemy of the Model – time-aware validation, stability, and the reality of operations
    4. MLOps Without the Buzzwords – what actually increased speed and quality
    5. What I Would Do Differently Today – lessons learned, dead ends, and transferable principles

    Part 1: Why the Unified Pipeline Was Created in the First Place

    When a Better Model Isn’t Enough

    At a certain stage of data science work, one reaches a point where further model improvements no longer deliver proportional value.
    Not because the models are "good enough," but because the problem is no longer statistical.

    It was at this exact point that the idea for the Unified Pipeline was born.

    At first glance, everything was fine:

    • predictive models existed,
    • the results were not bad,
    • the data was available.

    Yet development was slow, changes were risky, and knowledge transfer was difficult. Every new use case meant:

    • re-solving data preparation,
    • re-solving validation,
    • re-solving deployment,
    • and often, re-discovering the same mistakes.

    This is not a failure of people.
    This is a failure of the work architecture.


    The Hidden Debt: Fragmentation

    The fundamental problem was not in the individual models, but in the fact that:

    • each was created slightly differently,
    • had a different validation approach,
    • handled time differently,
    • was deployed differently.

    The result was fragmentation:

    • fragmentation of code,
    • fragmentation of responsibility,
    • fragmentation of knowledge.

    And most importantly: no change was cheap.


    One Pipeline ≠ One Model

    The Unified Pipeline was not an attempt to create "one universal model."
    It was an effort to create one universal way of thinking about how models are built, tested, and operated.

    The basic idea was simple:

    If two models solve different problems, but run at the same time, on the same data, and in the same production environment,
    they should share the maximum amount of infrastructure and the minimum amount of variability.

    In other words:

    variability should be explicit,
    not hidden in ad-hoc scripts.
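    As a minimal sketch of that idea (all names here are illustrative, not the project's actual API): two models share one pipeline function, and everything that legitimately differs between them lives in a small, explicit config object rather than in per-model scripts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    """Everything that may differ between models, stated explicitly."""
    name: str
    target: str
    features: tuple[str, ...]
    train_window_days: int

def run_pipeline(config: ModelConfig, data):
    """One shared path: the same preparation, validation, and
    deployment steps for every model; only `config` varies."""
    print(f"{config.name}: predicting {config.target} "
          f"from {len(config.features)} features")
    # ... shared data prep, validation, and deployment would go here ...

# Two different business problems, one infrastructure:
churn = ModelConfig("churn", "churned", ("tenure", "usage"), 365)
upsell = ModelConfig("upsell", "bought_addon", ("usage", "spend"), 180)

for cfg in (churn, upsell):
    run_pipeline(cfg, data=None)
```

    The variability between the two models is now a reviewable diff of two small objects, not a diff of two scripts.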


    Speed as a Consequence, Not a Goal

    There is often talk of "speeding up development."
    But the Unified Pipeline was not created to be fast.

    It was created to be:

    • predictable,
    • auditable,
    • repeatable.

    Speed came as a consequence:

    • less ad-hoc decision making,
    • less re-inventing the wheel,
    • fewer "heroic" interventions.

    And this is what made it possible to:

    • deploy new models significantly faster,
    • test more variants without chaos,
    • and focus more on the model's purpose than on everything around it.

    Why "Unified"

    The word Unified was not for marketing.
    It was chosen intentionally.

    The Pipeline unified:

    • the way of working with time,
    • the method of validation,
    • the versioning method,
    • the deployment method,
    • and even the way of thinking about models.

    And that is perhaps its greatest contribution:
    it unified the team’s mental model, not just the code.


    What’s Next

    In the next part, I will look at:

    • why it was necessary to abandon a purely experimental approach,
    • which architectural decisions were key,
    • and where it turned out that "best practices from blogs" often don’t work in real operation.
  • Unified Pipeline – Part 2: From Experiments to a System

    Part 2: From Experiments to a System

    An Experiment is a Great Servant, but a Bad Master

    Most data science teams start correctly:
    rapid experiments, notebooks, iterations, searching for a signal in the data.

    The problem arises when:

    • an experiment outlives its purpose,
    • and gradually becomes production.

    A notebook that was supposed to answer the question "does this make sense?"
    quietly transforms into:

    • a source of truth,
    • a reference implementation,
    • and eventually, a critical dependency.

    The Unified Pipeline was created at the moment when it became clear that:

    The experimental approach was already holding back the system as a whole.

    Not because the experiments were bad.
    But because they are not meant to bear long-term responsibility.


    The Often Overlooked Transition Point

    There is a moment when a team should consciously ask:

    "Is this model still an experiment, or is it a system now?"

    This transition point is often ignored because:

    • the model "works,"
    • the metric looks good,
    • the business is satisfied.

    But it is at this moment that technical and methodological debt begins to accumulate:

    • unclear validation logic,
    • implicit assumptions about the data,
    • fragile deployment,
    • knowledge locked in the minds of individuals.

    The Unified Pipeline was a reaction to this silent transition into production without a change in mindset.


    Architecture as a Tool of Discipline

    One of the key decisions was to understand architecture not as:

    "a technical solution"

    but as:

    a tool for enforcing the right decisions.

    The Pipeline was designed so that:

    • validation could not be easily bypassed,
    • training could not be done without a clear time context,
    • a model could not be deployed without versioning and metadata.

    Not because the team was incapable of discipline.
    But because the system should be stronger than individual will.


    Configuration Instead of Improvisation

    A fundamental shift occurred when:

    decision-making moved from code to configuration.

    This had several consequences:

    • the differences between models were explicit,
    • the pipeline was readable even without being run,
    • and it was possible to compare models systematically, not based on feelings.

    Instead of the question:

    "What does this script actually do?"

    the team could ask:

    "What type of decision does this model represent?"

    And that is a huge difference.
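    When the differences between models are explicit configuration, they can be compared mechanically. A small sketch (the configuration keys are illustrative, not the project's actual schema): diffing two configs yields exactly the decision that separates two models.

```python
def config_diff(a: dict, b: dict) -> dict:
    """Return the fields where two model configurations differ —
    the explicit, reviewable delta between two models."""
    keys = a.keys() | b.keys()
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

base = {"target": "churn", "validation": "expanding_window",
        "horizon_days": 90, "features": "v3"}
variant = {"target": "churn", "validation": "expanding_window",
           "horizon_days": 30, "features": "v3"}

print(config_diff(base, variant))  # {'horizon_days': (90, 30)}
```

    The answer to "what type of decision does this model represent?" is the diff itself: the variant is the same model aimed at a shorter horizon.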


    Time as a First-Class Problem

    One of the strongest architectural decisions was:

    to treat time as the central axis of the entire system.

    Not as a detail of validation, but as:

    the basic structure of the pipeline.

    This meant that:

    • every training had a clear time context,
    • validation respected the reality of deployment,
    • and the results were interpretable even in retrospect.

    The Unified Pipeline thus stopped optimizing for "statistical truth"
    and began to optimize for decision-making in time.
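    One way to make that time context structural rather than optional is to encode it as data the pipeline refuses to run without. A hedged sketch (the class and field names are mine, not the project's): a frozen value object that rejects any training run whose evaluation period does not strictly follow its training period.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class TimeContext:
    """Explicit time axis for one training run: what the model may see,
    what it is evaluated on, and that the two never overlap."""
    train_start: date
    train_end: date    # last day the model is allowed to see
    eval_start: date   # evaluation begins strictly after the training data
    eval_end: date

    def __post_init__(self):
        if not (self.train_start < self.train_end
                < self.eval_start <= self.eval_end):
            raise ValueError("time context must move strictly forward")

ctx = TimeContext(date(2023, 1, 1), date(2023, 12, 31),
                  date(2024, 1, 1), date(2024, 3, 31))
```

    Because the check lives in the object itself, "training without a clear time context" is not a process violation someone must catch — it is simply unrepresentable.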


    From "the Best Model" to "the Best Process"

    Perhaps the most important change was mental:

    The goal was no longer to have the best model.
    The goal was to have the best process that consistently creates good models.

    This meant:

    • fewer heroic solutions,
    • more reproducible procedures,
    • less dependence on individuals,
    • more shared understanding.

    The Unified Pipeline thus became more of a:

    production philosophy
    than just a technical artifact.


    What’s Next

    In the next part, I will focus on a topic that is often underestimated yet crucial:

    the temporal stability of models
    – why standard cross-validation fails,
    – how "a good model today" differs from "a good model in six months,"
    – and why time is often more important than feature engineering.

  • Unified Pipeline – Part 3: Time as the Enemy of the Model

    Part 3: Time as the Enemy of the Model

    When Validation Lies Without Meaning To

    One of the most unpleasant experiences in applied data science is this:

    A model has great validation metrics –
    and yet it fails in production.

    Not dramatically.
    Not immediately.
    But systematically.

    The predictions are "somehow worse," stability fluctuates, and trust in the model gradually fades. And yet:

    • the pipeline is running,
    • the data is flowing,
    • the code hasn’t changed.

    The problem is not in the implementation.
    The problem is in time.


    The Illusion of Randomness

    Standard validation approaches implicitly assume that:

    • the data is randomly shuffled,
    • the distribution is stable,
    • the future is statistically similar to the past.

    These are reasonable assumptions for textbooks.
    But not for decision-making systems running in time.

    As soon as a model:

    • influences real decisions,
    • works with human behavior,
    • reacts to external conditions,

    then time becomes an active player, not just an index.


    Why Random Data Splitting Fails

    When randomly splitting training and validation data:

    • the model sees future patterns,
    • it learns relationships that do not exist in real time,
    • and the metrics look better than reality.

    This is not a flaw in the methodology.
    It is a mismatch between the question and the tool.

    The question in production is:

    "How will the model behave on data that does not yet exist?"

    But random validation answers a different question:

    "How well does the model interpolate within a known distribution?"
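    A toy simulation (synthetic data, nothing from the actual system) makes the mismatch between the two questions visible: when the relationship between feature and target drifts over time, a random split lets future rows leak into training and reports a flattering error, while a chronological split reports the error the model would actually face.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
x = rng.normal(size=n)
slope = 1.0 + t / n                  # the x → y relationship drifts over time
y = slope * x + rng.normal(scale=0.1, size=n)

def fit_eval(train_idx, test_idx):
    """Fit a one-feature linear model, return mean squared error on test."""
    coeffs = np.polyfit(x[train_idx], y[train_idx], 1)
    pred = np.polyval(coeffs, x[test_idx])
    return float(np.mean((pred - y[test_idx]) ** 2))

perm = rng.permutation(n)
mse_random = fit_eval(perm[:1500], perm[1500:])  # future leaks into training
mse_time = fit_eval(t[:1500], t[1500:])          # train strictly on the past

print(f"random split MSE: {mse_random:.3f}, "
      f"chronological MSE: {mse_time:.3f}")      # random looks far better
```

    Both numbers come from the same model and the same data; only the question being answered differs.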


    The Unified Pipeline and Time Discipline

    The Unified Pipeline placed time at the center of the entire process:

    • training,
    • validation,
    • and interpretation of results.

    Each model was:

    • placed in a specific time context,
    • tested on data that actually followed,
    • and evaluated not only by performance but also by its stability over time.

    Validation ceased to be a single number
    and became a time trajectory.
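    A trajectory of this kind can be produced by walking forward through the data and keeping every fold's score instead of averaging them away. A minimal sketch (the helper and the toy "model" are illustrative, not the project's implementation):

```python
import numpy as np

def walk_forward(series, fit, score, n_splits=4):
    """Evaluate across consecutive time folds: train on the past,
    score on the block that actually followed, keep every fold's score."""
    fold = len(series) // (n_splits + 1)
    scores = []
    for k in range(1, n_splits + 1):
        train = series[: k * fold]                 # everything up to the cut
        test = series[k * fold : (k + 1) * fold]   # the period that followed
        model = fit(train)
        scores.append(score(model, test))
    return scores                                  # a trajectory, not one number

# Toy example: the "model" is just the training mean, scored by MSE.
rng = np.random.default_rng(1)
series = rng.normal(loc=np.linspace(0, 1, 500), scale=0.2)  # slow drift
fit = lambda train: train.mean()
score = lambda m, test: float(np.mean((test - m) ** 2))
print([round(s, 3) for s in walk_forward(series, fit, score)])
```

    The list of per-period scores is the object of interest: a model whose scores degrade fold by fold tells a very different story than one whose single averaged metric happens to be identical.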


    Stability as a Quality Metric

    It gradually became clear that:

    • the highest validation metric is not necessarily the best choice,
    • a model with slightly worse performance but higher stability is often more valuable in production.

    This led to a shift in thinking:

    • from maximizing a point metric,
    • to evaluating the model’s behavior across periods.

    In other words:

    A model is not evaluated on how good it was,
    but on how reliable it tends to be.
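    One simple way to encode that preference (a sketch with made-up numbers, not the project's actual selection rule) is to rank models by mean performance minus a penalty for variance across periods:

```python
import statistics

def stability_adjusted(scores: list[float], penalty: float = 1.0) -> float:
    """Mean performance across periods, penalized by its spread —
    a numeric proxy for 'how reliable the model tends to be'."""
    return statistics.mean(scores) - penalty * statistics.stdev(scores)

# Per-period AUC for two hypothetical models (illustrative numbers):
candidates = {
    "sharp_but_erratic": [0.82, 0.64, 0.85, 0.61],  # higher peaks, unstable
    "modest_but_steady": [0.74, 0.73, 0.75, 0.72],  # lower peaks, stable
}
best = max(candidates, key=lambda name: stability_adjusted(candidates[name]))
print(best)  # modest_but_steady
```

    The erratic model wins on its best single period, yet the steady one wins once variability is priced in — which is exactly the shift from a point metric to behavior across periods.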


    Time Reveals True Overfitting

    Overfitting is often understood as:

    • a model that is too complex,
    • too many parameters,
    • too little regularization.

    But time reveals a different type of overfitting:

    the model is perfectly adapted to the past world,
    but fragile to change.

    The Unified Pipeline, therefore, did not just address:

    whether the model is overfit,

    but mainly:

    what it is overfit to.


    The Unpleasant Truth

    One of the most important findings was this:

    If a model cannot fail predictably,
    it cannot be trustworthy.

    Time-aware validation often:

    • lowered metrics,
    • complicated comparisons,
    • and forced the team to make unpleasant decisions.

    But it was precisely because of this that:

    • false certainty disappeared,
    • and trust in what the model can actually do grew.

    What’s Next

    In the next part, I will move from methodology to practice:

    MLOps without the buzzwords
    – what actually accelerated development,
    – what, on the other hand, added complexity without value,
    – and why "the right infrastructure" often means fewer, not more, tools.

  • Unified Pipeline – Part 4: MLOps Without the Buzzwords

    Part 4: MLOps Without the Buzzwords

    When Tools Become the Goal

    At a certain stage of a project, MLOps starts to behave strangely:

    • tools multiply,
    • processes multiply,
    • but certainty and speed do not.

    Instead of the infrastructure simplifying the work of data scientists,
    it starts to require:

    • synchronization,
    • workarounds,
    • explanations,
    • and sometimes even manual interventions "to get it through."

    The Unified Pipeline was created with a conscious goal:

    MLOps should reduce cognitive load, not just shift it elsewhere.


    What We Considered a Real Benefit

    It gradually became clear that most of the real value did not come from "big MLOps concepts," but from a few inconspicuous principles:

    Unambiguous Input → Unambiguous Output

    Every model run had to have:

    • a clearly defined data slice,
    • an explicit configuration,
    • a traceable result.

    Metadata is Not a Bonus, but a Foundation

    Without metadata:

    • you cannot compare models,
    • you cannot explain decisions,
    • you cannot go back in time.
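    The minimum viable version of this does not need an MLOps platform. A hedged sketch (field names and structure are illustrative, not the project's schema): one record per run, capturing the data slice, the configuration with a content hash, and the results.

```python
import hashlib
import json
from datetime import datetime, timezone

def run_record(config: dict, data_slice: dict, metrics: dict) -> dict:
    """Minimal metadata captured for every run: enough to compare models,
    explain a decision, and identify the run later."""
    config_blob = json.dumps(config, sort_keys=True)
    return {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "config": config,
        "config_hash": hashlib.sha256(config_blob.encode()).hexdigest()[:12],
        "data_slice": data_slice,  # which table and period were actually used
        "metrics": metrics,
    }

record = run_record(
    config={"model": "churn", "features": "v3", "horizon_days": 90},
    data_slice={"table": "events", "from": "2023-01-01", "to": "2023-12-31"},
    metrics={"auc": 0.74},
)
```

    The content hash matters more than it looks: two runs with the same hash are comparable by construction, and a metric change with an unchanged hash points at the data, not the model.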

    Automation Only After Stabilization

    Everything that was automated too early
    only accelerated the chaos.


    What, on the Other Hand, Did Not Bring the Expected Value

    The Unified Pipeline was not immune to dead ends. Some things looked good in presentations but failed in practice:

    Overly Fine-Grained Orchestration

    Each micro-step being managed separately led to:

    • fragility,
    • difficult debugging,
    • and a loss of overview.

    A Universal Solution Without Context

    The attempt to have "one pipeline for everything"
    ended either in:

    • an explosion of conditions,
    • or implicit exceptions.

    Complex Monitoring Without an Interpretation Layer

    Graphs without context do not create understanding.
    Just more noise.


    MLOps as a Sociotechnical System

    An important shift occurred when MLOps was no longer viewed purely technically.

    The pipeline, in fact:

    • shapes the way of working,
    • influences decision-making,
    • and determines what is "normal" and what is an "exception."

    The Unified Pipeline thus functioned as:

    • unwritten documentation of good practice,
    • protection against hasty shortcuts,
    • and a common reference frame for the team.

    Speed Returns – This Time, Sustainably

    Only when:

    • the pipeline boundaries were clear,
    • the inputs and outputs were stable,
    • and the process was understandable even without its author,

    did speed begin to reappear.

    But a different kind of speed than at the beginning of the project:

    • less dramatic,
    • less visible,
    • but reliable in the long term.

    Recap: What to Take Away When Designing a Similar Framework

    Finally, a few practical, transferable tips for anyone considering their own "unified" approach.

    1. Don’t Start with Tools, Start with Questions

    Ask yourself:

    • What decisions should the system support?
    • What errors are still acceptable?
    • What must be traceable even a year from now?

    Only then choose the technology.

    2. Time Belongs in the Architecture, Not Just in Validation

    If the pipeline:

    • doesn’t know when the model was created,
    • on what period it was tested,
    • and for what time it is intended,

    then it is not production-ready – it just runs in production.

    3. Configuration is a Communication Tool

    A good configuration:

    • explains decisions,
    • allows for comparison,
    • and forces explicitness.

    If the configuration cannot be read without running the code,
    it is not good enough.

    4. Optimize for Stability, Not for the Maximum

    The model with the highest metric:

    • is often the most fragile.

    The model that behaves predictably over time:

    • is often the most valuable.

    5. The Pipeline Should Protect the Team – Even from Itself

    A well-designed framework:

    • prevents impulsive shortcuts,
    • reduces dependence on individuals,
    • and increases confidence in the results.

    That is its true role.


    What’s Next

    In the final part, I will look back:

    What I would do differently today
    – where the Unified Pipeline was unnecessarily ambitious,
    – where, on the contrary, it could have gone further,
    – and which principles I would take with me to any other project.

  • Unified Pipeline – Part 5: What I Would Do Differently Today

    Part 5: What I Would Do Differently Today

    Experience as a Filter

    The Unified Pipeline was not born as an academic project.
    It was created under the pressure of reality: time, operations, and responsibility.

    With hindsight, however, it is clear that:

    • some decisions were right,
    • some were necessary,
    • and some were more a reaction to a specific situation than a generally optimal solution.

    This part is not a critique of the project.
    It is an attempt to separate the principles that will endure from the solutions that were conditioned by their time.


    1. Less Abstraction at the Beginning

    One of the things I would change today is the pace of abstraction.

    From the beginning, the Unified Pipeline was designed as:

    • a general framework,
    • usable for multiple types of models,
    • with a high degree of configurability.

    This brought flexibility, but also a cost:

    • longer onboarding,
    • a more complex mental model,
    • and sometimes the need to "understand the system before solving the problem."

    Today I would:

    • start with a narrower scope,
    • let abstractions arise from repetition,
    • and sacrifice some "elegance" for the sake of readability.

    2. An Even Stricter Separation of Experiment and Production

    Although the Unified Pipeline clearly distinguished between experiment and production, in practice:

    • some transitions remained too fluid,
    • and the experimental mindset sometimes seeped into places where it no longer belonged.

    Today I would:

    • isolate the experimental phase even more,
    • "lock down" the production pipeline more,
    • and make the transition between them a conscious decision, not a gradual evolution.

    Not for the sake of control, but to protect both worlds.


    3. More Investment in Interpretation, Less in Optimization

    The Unified Pipeline was very good at:

    • training,
    • validating,
    • and comparing models.

    Looking back, I see that:

    even more value would have been brought by a stronger interpretation layer.

    Not in the sense of:

    "explainability for an audit,"

    but in the sense of:

    • what type of behavior the model represents,
    • when to trust it and when not to,
    • how to read its failures.

    Today I would:

    shift some of the optimization energy to this area.


    4. Less Implicit Expertise in the Design

    The Unified Pipeline carried a lot of:

    • domain knowledge,
    • methodological assumptions,
    • and "silent" decisions.

    For an experienced team, this worked great.
    For newcomers, not so much.

    From today’s perspective, I would:

    • externalize more of these assumptions,
    • name them more,
    • and rely less on the fact that "it’s obvious."

    A pipeline should be readable even without its author in the room.


    5. What I Would Take to Every Future Project

    Despite all the points above, there are principles that I would use again today – without change.

    • Time as the fundamental axis of the system
    • Stability over the maximum
    • The process is more important than the individual model
    • The pipeline as a carrier of culture, not just code
    • Constraints as a tool for quality, not a brake

    These principles proved to be:

    • technologically agnostic,
    • transferable,
    • and sustainable in the long term.

    The Unified Pipeline as a Milestone, Not a Goal

    Today, I no longer see the Unified Pipeline as:

    "a finished solution,"
    nor as a universal blueprint.

    I see it as:

    a milestone in thinking about what it means to do data science responsibly over time.

    And that, perhaps, is its greatest value.


    In Conclusion

    If I had to summarize the entire series in one sentence, it would be this:

    Production data science is not about how smart the model is,
    but about how well the system handles the reality in which the model lives.

© 2026 Michael Princ. All rights reserved.