
  • Unified Pipeline – Part 1: Why the Unified Pipeline Was Created

    Series: Unified Pipeline – Experiences from Building a Production ML System

    Series Goal:
    To show how theoretical data science differs from production reality and why infrastructure, process, and governance are often more important than the model itself.


    Planned Parts

    1. Why the Unified Pipeline Was Created in the First Place – a problem that couldn’t be solved with a better model
    2. From Experiments to a System – architectural principles and decisions
    3. Time as the Enemy of the Model – time-aware validation, stability, and the reality of operations
    4. MLOps Without the Buzzwords – what actually increased speed and quality
    5. What I Would Do Differently Today – lessons learned, dead ends, and transferable principles

    Part 1: Why the Unified Pipeline Was Created in the First Place

    When a Better Model Isn’t Enough

    At a certain stage in data science work, you reach a point where further model improvements no longer deliver proportional value.
    Not because the models are "good enough," but because the problem is no longer statistical.

    It was at this exact point that the idea for the Unified Pipeline was born.

    At first glance, everything was fine:

    • predictive models existed,
    • the results were not bad,
    • the data was available.

    Yet development was slow, changes were risky, and knowledge transfer was difficult. Every new use case meant:

    • re-solving data preparation,
    • re-solving validation,
    • re-solving deployment,
    • and often, re-discovering the same mistakes.

    This is not a failure of people.
    This is a failure of the work architecture.


    The Hidden Debt: Fragmentation

    The fundamental problem was not in the individual models, but in the fact that:

    • each was created slightly differently,
    • had a different validation approach,
    • handled time differently,
    • was deployed differently.

    The result was fragmentation:

    • fragmentation of code,
    • fragmentation of responsibility,
    • fragmentation of knowledge.

    And most importantly: no change was cheap.


    One Pipeline ≠ One Model

    The Unified Pipeline was not an attempt to create "one universal model."
    It was an effort to create one universal way of thinking about how models are built, tested, and operated.

    The basic idea was simple:

    If two models solve different problems but run at the same time, on the same data, and in the same production environment,
    they should share the maximum amount of infrastructure and the minimum amount of variability.

    In other words:

    variability should be explicit,
    not hidden in ad-hoc scripts.


    Speed as a Consequence, Not a Goal

    There is often talk of "speeding up development."
    But the Unified Pipeline was not created to be fast.

    It was created to be:

    • predictable,
    • auditable,
    • repeatable.

    Speed came as a consequence:

    • less ad-hoc decision making,
    • less re-inventing the wheel,
    • fewer "heroic" interventions.

    And this is what made it possible to:

    • deploy new models significantly faster,
    • test more variants without chaos,
    • and focus more on the purpose of the model than on its surroundings.

    Why "Unified"

    The word "Unified" was not a marketing label.
    It was chosen intentionally.

    The Pipeline unified:

    • the way of working with time,
    • the method of validation,
    • the versioning method,
    • the deployment method,
    • and even the way of thinking about models.

    And that is perhaps its greatest contribution:
    it unified the team’s mental model, not just the code.


    What’s Next

    In the next part, I will look at:

    • why it was necessary to abandon a purely experimental approach,
    • which architectural decisions were key,
    • and where it turned out that "best practices from blogs" often don’t work in real operation.
  • Unified Pipeline – Part 2: From Experiments to a System

    Part 2: From Experiments to a System

    An Experiment Is a Great Servant but a Bad Master

    Most data science teams start correctly:
    rapid experiments, notebooks, iterations, searching for a signal in the data.

    The problem arises when:

    • an experiment outlives its purpose,
    • and gradually becomes production.

    A notebook that was supposed to answer the question "does this make sense?"
    quietly transforms into:

    • a source of truth,
    • a reference implementation,
    • and eventually, a critical dependency.

    The Unified Pipeline was created at the moment when it became clear that:

    The experimental approach was already holding back the system as a whole.

    Not because the experiments were bad,
    but because experiments are not meant to bear long-term responsibility.


    The Often Overlooked Transition Point

    There is a moment when a team should consciously ask:

    "Is this model still an experiment, or is it a system now?"

    This transition point is often ignored because:

    • the model "works,"
    • the metric looks good,
    • the business is satisfied.

    But it is at this moment that technical and methodological debt begins to accumulate:

    • unclear validation logic,
    • implicit assumptions about the data,
    • fragile deployment,
    • knowledge locked in the minds of individuals.

    The Unified Pipeline was a reaction to this silent transition into production without a change in mindset.


    Architecture as a Tool of Discipline

    One of the key decisions was to understand architecture not as:

    "a technical solution"

    but as:

    a tool for enforcing the right decisions.

    The Pipeline was designed so that:

    • validation could not be easily bypassed,
    • training could not be done without a clear time context,
    • a model could not be deployed without versioning and metadata.

    Not because the team was incapable of discipline.
    But because the system should be stronger than individual will.
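
    As a minimal sketch of what such enforcement could look like (not the actual pipeline code), the only convenient path to a deployable artifact can be made to run through functions that demand a time context, a validation step, a version, and metadata. Every name below (TimeContext, ModelArtifact, train, deploy, the registry dict) is a hypothetical illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class TimeContext:
    """Hypothetical: the period a model is allowed to learn from."""
    train_start: date
    train_end: date

    def __post_init__(self) -> None:
        if self.train_start >= self.train_end:
            raise ValueError("train_start must be earlier than train_end")


@dataclass(frozen=True)
class ModelArtifact:
    """Hypothetical: a trained model record with what is needed to audit it."""
    name: str
    version: str
    time_context: TimeContext
    validation_score: float
    metadata: dict


def train(
    name: str,
    version: str,
    time_context: TimeContext,
    validate: Callable[[TimeContext], float],
    metadata: dict,
) -> ModelArtifact:
    """Produce an artifact; there is deliberately no flag to skip validation."""
    if not isinstance(time_context, TimeContext):
        raise TypeError("training requires an explicit TimeContext")
    if not version:
        raise ValueError("a version is required")
    if not metadata:
        raise ValueError("metadata is required")
    # Validation always runs as part of training (assumed callback).
    score = validate(time_context)
    return ModelArtifact(name, version, time_context, score, dict(metadata))


def deploy(artifact: ModelArtifact, registry: dict) -> None:
    """Register an artifact; anything else is rejected."""
    if not isinstance(artifact, ModelArtifact):
        raise TypeError("only a ModelArtifact can be deployed")
    key = (artifact.name, artifact.version)
    if key in registry:
        raise ValueError("this version is already deployed; bump the version")
    registry[key] = artifact
```

    Nothing in Python stops someone from constructing a ModelArtifact by hand, so this only makes the disciplined path the easy one. In a sketch like this, skipping a step requires a visible, deliberate act rather than a forgotten line in a script.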


    Configuration Instead of Improvisation

    A fundamental shift occurred when:

    decision-making moved from code to configuration.

    This had several consequences:

    • the differences between models were explicit,
    • the pipeline was readable even without being run,
    • and it was possible to compare models systematically, not based on feelings.

    Instead of the question:

    "What does this script actually do?"

    the team could ask:

    "What type of decision does this model represent?"

    And that is a huge difference.
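
    To make the idea concrete, here is a small sketch under stated assumptions: each model is described by a configuration object, and the difference between two models is whatever differs between their configurations. The ModelConfig fields and the diff_configs helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical configuration; the fields are illustrative only."""
    name: str
    target: str
    train_window_days: int
    validation_scheme: str
    algorithm: str


def diff_configs(a: ModelConfig, b: ModelConfig) -> dict:
    """Return {field_name: (value_in_a, value_in_b)} for fields that differ."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


churn = ModelConfig(
    name="churn",
    target="churned_30d",
    train_window_days=365,
    validation_scheme="rolling_origin",
    algorithm="gradient_boosting",
)
upsell = ModelConfig(
    name="upsell",
    target="bought_upgrade_30d",
    train_window_days=365,
    validation_scheme="rolling_origin",
    algorithm="gradient_boosting",
)

# Only the name and the target differ; everything else is shared.
differences = diff_configs(churn, upsell)
```

    With this shape, the two models can be compared by reading the differences, without opening or running any training script.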


    Time as a First-Class Problem

    One of the strongest architectural decisions was:

    to treat time as the central axis of the entire system.

    Not as a detail of validation, but as:

    the basic structure of the pipeline.

    This meant that:

    • every training run had a clear time context,
    • validation respected the reality of deployment,
    • and the results were interpretable even in retrospect.

    The Unified Pipeline thus stopped optimizing for "statistical truth"
    and began to optimize for decision-making in time.
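
    One way to read "validation respected the reality of deployment" is that a model is only ever evaluated on data from after the point where its training data ends. The sketch below shows a single such split in plain Python. The Row type, the split_by_time function, and the horizon_days parameter are assumptions made for this example, not the pipeline's actual interface.

```python
from datetime import date, timedelta
from typing import NamedTuple, Sequence


class Row(NamedTuple):
    """Hypothetical observation with the date it became known."""
    observed_on: date
    value: float


def split_by_time(
    rows: Sequence[Row], cutoff: date, horizon_days: int
) -> tuple[list[Row], list[Row]]:
    """Train on rows strictly before the cutoff, validate on the window after it."""
    horizon_end = cutoff + timedelta(days=horizon_days)
    train = [r for r in rows if r.observed_on < cutoff]
    valid = [r for r in rows if cutoff <= r.observed_on < horizon_end]
    return train, valid


rows = [
    Row(date(2024, 1, 1), 1.0),
    Row(date(2024, 2, 1), 2.0),
    Row(date(2024, 3, 1), 3.0),
    Row(date(2024, 4, 1), 4.0),
]

# Everything before 1 March is training data; the following 31 days are validation.
train_rows, valid_rows = split_by_time(rows, cutoff=date(2024, 3, 1), horizon_days=31)
```

    Because the cutoff is an explicit argument, the same split can be reproduced later, which is what makes a result interpretable in retrospect.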


    From "the Best Model" to "the Best Process"

    Perhaps the most important change was mental:

    The goal was no longer to have the best model.
    The goal was to have the best process that consistently creates good models.

    This meant:

    • fewer heroic solutions,
    • more reproducible procedures,
    • less dependence on individuals,
    • more shared understanding.

    The Unified Pipeline thus became more of a:

    production philosophy
    than just a technical artifact.


    What’s Next

    In the next part, I will focus on a topic that is often underestimated yet crucial:

    the temporal stability of models:

    • why standard cross-validation fails,
    • how "a good model today" differs from "a good model in six months,"
    • and why time is often more important than feature engineering.

© 2026 Michael Princ. All rights reserved.
