Tag: validation

  • PENB Label Approximation – Part 2: Turning Regular Consumption Data into Valid Input

    A model is only as good as its input

    In projects working with operational data, the biggest mistake is often assuming the main value lies in the algorithm itself. In reality, the quality of the outcome is often determined before any calculation happens.

    For PENB approximation, it’s especially critical that the application correctly understands:

    • what consumption data is available,
    • which period it covers,
    • when the user is heating and when not,
    • which part of the energy likely relates to heating and which to hot water or regular use.

    What the application actually needs from the user

    The practical input is intentionally kept fairly simple:

    • location,
    • apartment area and ceiling height,
    • type of heating,
    • temperature regime,
    • consumption time series,
    • selection of non-heating months,
    • method for hot water approximation.

    This is an important compromise. If the application asked for too many details, most users wouldn’t finish. If it asked for too little, the result would lose its grounding in reality.
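
    To make this concrete, the whole input can be thought of as one small record. The sketch below is only illustrative; the field names, units, and types are my assumptions, not the application’s actual schema.

        from dataclasses import dataclass
        from datetime import date

        @dataclass
        class PenbInput:
            """User-supplied input for the PENB approximation (illustrative field names)."""
            location: str                            # used to pick nearby weather data
            floor_area_m2: float                     # apartment area
            ceiling_height_m: float
            heating_type: str                        # e.g. "gas", "electric", "district"
            temperature_regime: str                  # e.g. "continuous", "night setback"
            consumption: list[tuple[date, float]]    # monthly readings: (month, kWh)
            non_heating_months: set[int]             # months marked as heating-off, e.g. {6, 7, 8}
            hot_water_method: str                    # how the hot water share is approximated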


    Why uploading a CSV isn’t enough

    Uploading a file is technically easy, but on its own it doesn’t solve the data problem. A consumption series alone doesn’t tell you:

    • whether it’s heating or another component,
    • whether there are gaps in the data,
    • whether the observations match the heating season,
    • whether the measurement period is sufficient for the chosen calculation mode.

    That’s why the workflow includes selecting non-heating months and splitting consumption into a heating-related part and a hot water / regular-use part.
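
    One simple way to do that split, assuming the user-selected non-heating months are a reasonable proxy for the base load (hot water and regular use), might look like the sketch below; the actual application may use a different rule.

        import pandas as pd

        def split_heating_and_base(monthly_kwh: pd.Series, non_heating_months: set[int]) -> pd.DataFrame:
            """Split monthly consumption into a base part and a heating part.

            Assumes a monthly DatetimeIndex and treats the average consumption of the
            user-selected non-heating months as the base load (hot water, regular use).
            """
            base_load = monthly_kwh[monthly_kwh.index.month.isin(non_heating_months)].mean()
            base = monthly_kwh.clip(upper=base_load)            # base load, capped by what the month actually used
            heating = (monthly_kwh - base_load).clip(lower=0)   # anything above the base is attributed to heating
            return pd.DataFrame({"base_kwh": base, "heating_kwh": heating})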


    Validation isn’t about restricting the user

    Good validation doesn’t feel like a barrier. It’s a way to prevent the app from returning a confident result based on inconsistent data.

    In this project, validation covers, for example:

    • minimum data length based on calculation mode,
    • input field logic for heating type,
    • consistency of temperature regime,
    • presence of expected columns in the input file.

    From a product perspective, this matters because users get feedback early—not after several minutes of calculation.
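
    As a rough illustration, a few of those checks might look like the sketch below. The mode names, thresholds, and column names are assumptions made for the example, not the application’s actual rules.

        import pandas as pd

        # Assumed mode names, thresholds, and column names -- illustrative only.
        MIN_MONTHS_BY_MODE = {"seasonal": 12, "calibrated": 24}
        REQUIRED_COLUMNS = {"month", "consumption_kwh"}

        def validate_upload(df: pd.DataFrame, mode: str) -> list[str]:
            """Return human-readable problems; an empty list means the upload is usable."""
            problems = []
            missing = REQUIRED_COLUMNS - set(df.columns)
            if missing:
                return [f"Missing columns: {sorted(missing)}"]   # later checks need these columns
            if pd.to_datetime(df["month"], errors="coerce").isna().any():
                problems.append("Some rows have an unreadable month value.")
            if df["consumption_kwh"].isna().any():
                problems.append("There are gaps in the consumption series.")
            min_months = MIN_MONTHS_BY_MODE.get(mode, 12)
            if len(df) < min_months:
                problems.append(f"Mode '{mode}' needs at least {min_months} months of data.")
            return problems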


    Why this is interesting for data science

    A workflow like this shows that data science in production isn’t just about modeling. It’s also about designing how data enters the system so results are repeatable and interpretable.

    This is exactly where:

    • data quality,
    • domain logic,
    • form UX,
    • and the operational reality of everyday users meet.

    What’s next

    In the next part, I’ll look at the core of the estimation: how weather data enters the app, why it’s important to distinguish the heating season, and the role of a simplified RC model in calibrating the apartment’s energy behavior.

  • Unified Pipeline – Part 3: Time as the Enemy of the Model

    When Validation Lies Without Meaning To

    One of the most unpleasant experiences in applied data science is this:

    A model has great validation metrics –
    and yet it fails in production.

    Not dramatically.
    Not immediately.
    But systematically.

    The predictions are "somehow worse," stability fluctuates, and trust in the model gradually fades. And yet:

    • the pipeline is running,
    • the data is flowing,
    • the code hasn’t changed.

    The problem is not in the implementation.
    The problem is in time.


    The Illusion of Randomness

    Standard validation approaches implicitly assume that:

    • the data is randomly shuffled,
    • the distribution is stable,
    • the future is statistically similar to the past.

    These are reasonable assumptions for textbooks.
    But not for decision-making systems running in time.

    As soon as a model:

    • influences real decisions,
    • works with human behavior,
    • reacts to external conditions,

    then time becomes an active player, not just an index.


    Why Random Data Splitting Fails

    When randomly splitting training and validation data:

    • the model sees future patterns,
    • it learns relationships that would not be available at prediction time,
    • and the metrics look better than reality.

    This is not a flaw in the methodology.
    It is a mismatch between the question and the tool.

    The question in production is:

    "How will the model behave on data that does not yet exist?"

    But random validation answers a different question:

    "How well does the model interpolate within a known distribution?"


    The Unified Pipeline and Time Discipline

    The Unified Pipeline placed time at the center of the entire process:

    • training,
    • validation,
    • and interpretation of results.

    Each model was:

    • placed in a specific time context,
    • tested on data that actually followed,
    • and evaluated not only by performance but also by its stability over time.

    Validation ceased to be a single number
    and became a time trajectory.
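
    In code, that shift can be as small as collecting one score per forward-looking fold instead of a single number. A minimal sketch, not the Unified Pipeline itself (Ridge is just a placeholder model):

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.metrics import mean_absolute_error
        from sklearn.model_selection import TimeSeriesSplit

        def metric_trajectory(X, y, n_splits: int = 6) -> np.ndarray:
            """Train on the past, score on what actually followed, once per fold.

            The result is one error per period -- a trajectory over time --
            rather than a single validation number.
            """
            scores = []
            for train_idx, val_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
                model = Ridge().fit(X[train_idx], y[train_idx])
                scores.append(mean_absolute_error(y[val_idx], model.predict(X[val_idx])))
            return np.array(scores)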


    Stability as a Quality Metric

    It gradually became clear that:

    • the model with the highest validation metric is not necessarily the best choice,
    • a model with slightly worse performance but higher stability is often more valuable in production.

    This led to a shift in thinking:

    • from maximizing a point metric,
    • to evaluating the model’s behavior across periods.

    In other words:

    A model is not evaluated on how good it was,
    but on how reliable it tends to be.
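
    One way to encode that preference is to penalize the spread of the per-period errors when comparing models. The weighting below is an illustrative choice, not a rule taken from the project:

        import numpy as np

        def stability_aware_score(fold_errors: np.ndarray, penalty: float = 1.0) -> float:
            """Lower is better: mean error plus a penalty for fluctuating across periods."""
            return fold_errors.mean() + penalty * fold_errors.std()

        # Model A is slightly worse on average but far more stable than model B.
        model_a = np.array([0.32, 0.34, 0.33, 0.35, 0.33])
        model_b = np.array([0.22, 0.45, 0.25, 0.48, 0.24])
        print(stability_aware_score(model_a))  # ~0.34 -> the steadier model wins
        print(stability_aware_score(model_b))  # ~0.44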


    Time Reveals True Overfitting

    Overfitting is often understood as:

    • a model that is too complex,
    • too many parameters,
    • too little regularization.

    But time reveals a different type of overfitting:

    the model is perfectly adapted to the past world,
    but fragile to change.

    The Unified Pipeline, therefore, did not just address:

    whether the model is overfit,

    but mainly:

    what it is overfit to.


    The Unpleasant Truth

    One of the most important findings was this:

    If a model cannot fail predictably,
    it cannot be trustworthy.

    Time-aware validation often:

    • lowered metrics,
    • complicated comparisons,
    • and forced the team to make unpleasant decisions.

    But it was precisely because of this that:

    • false certainty disappeared,
    • and trust in what the model can actually do grew.

    What’s Next

    In the next part, I will move from methodology to practice:

    MLOps without the buzzwords
    – what actually accelerated development,
    – what, on the other hand, added complexity without value,
    – and why "the right infrastructure" often means fewer, not more, tools.
