PENB Label Approximation – Part 2: Turning Regular Consumption Data into Valid Input

A model is only as good as its input

In projects that work with operational data, a common mistake is to assume that the main value lies in the algorithm itself. In practice, the quality of the outcome is largely decided before any calculation happens.

For PENB approximation, it’s especially critical that the application correctly understands:

  • what consumption data is available,
  • which period it covers,
  • when the user is heating and when they are not,
  • which part of the energy likely relates to heating and which to hot water or regular use.

What the application actually needs from the user

The practical input is intentionally kept fairly simple:

  • location,
  • apartment area and ceiling height,
  • type of heating,
  • temperature regime,
  • consumption time series,
  • selection of non-heating months,
  • method for hot water approximation.
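
To make the list above concrete, here is a minimal sketch of how these inputs could be held in one structure. The class name, field names, units, the two-level temperature regime and the "summer_baseline" option are all my assumptions for illustration, not the application's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ApartmentInput:
    """Hypothetical container for the user-supplied inputs listed above."""

    location: str                       # free-form place name (assumption)
    floor_area_m2: float                # apartment area
    ceiling_height_m: float
    heating_type: str                   # assumed label, e.g. "gas boiler"
    setpoint_day_c: float               # assumed two-level temperature regime
    setpoint_night_c: float
    consumption_kwh: dict[date, float]  # period start -> energy used in kWh
    non_heating_months: set[int] = field(default_factory=set)  # months 1..12
    hot_water_method: str = "summer_baseline"  # assumed option name

    @property
    def volume_m3(self) -> float:
        # Heated air volume derived from area and ceiling height.
        return self.floor_area_m2 * self.ceiling_height_m
```

Keeping everything in one typed object means later steps can rely on a single, already-checked source of truth instead of re-reading form fields.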

This is an important compromise. If the application asked for too many details, most users wouldn’t finish. If it asked for too little, the result would lose its grounding in reality.


Why uploading a CSV isn’t enough

Uploading a file is technically easy, but the file alone doesn’t carry enough information. Raw consumption figures don’t tell you:

  • whether it’s heating or another component,
  • whether there are gaps in the data,
  • whether the observations match the heating season,
  • whether the measurement period is sufficient for the chosen calculation mode.

That’s why the workflow includes selecting non-heating months and splitting energy into heating-related and hot water or regular usage parts.
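
One simple way to picture that split is to treat the average consumption in the selected non-heating months as a baseline for hot water and regular use, and to attribute whatever exceeds it in the other months to heating. The sketch below assumes monthly totals in kWh and a constant baseline; the function name and the method itself are illustrative assumptions, not necessarily what the application does.

```python
from statistics import mean


def split_heating_and_baseline(monthly_kwh, non_heating_months):
    """Split monthly consumption into heating and baseline parts.

    monthly_kwh: dict mapping month number (1..12) to energy in kWh.
    non_heating_months: set of months the user marked as non-heating.
    Baseline here means hot water plus regular use (assumed constant).
    """
    reference = [monthly_kwh[m] for m in non_heating_months if m in monthly_kwh]
    if not reference:
        raise ValueError("no consumption data for the selected non-heating months")
    baseline = mean(reference)

    split = {}
    for month, total in monthly_kwh.items():
        if month in non_heating_months:
            heating = 0.0  # by definition, nothing is attributed to heating
        else:
            heating = max(total - baseline, 0.0)  # never report negative heating
        split[month] = {"heating": heating, "baseline": total - heating}
    return split
```

With this approach the two parts always add up to the measured total for each month, which makes the result easy to sanity-check.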


Validation isn’t about restricting the user

Good validation doesn’t feel like a barrier. It’s a way to prevent the app from returning a confident result based on inconsistent data.

In this project, validation covers, for example:

  • minimum data length based on calculation mode,
  • input field logic for heating type,
  • consistency of temperature regime,
  • presence of expected columns in the input file.
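
Two of these checks, expected columns and minimum data length per calculation mode, can be sketched in a few lines. The column names, mode names and row thresholds below are placeholders I made up for the example; the real values depend on the application.

```python
import csv
import io

# Assumed column names and minimum lengths; the real application may differ.
EXPECTED_COLUMNS = {"date", "consumption_kwh"}
MIN_ROWS_BY_MODE = {"quick": 7, "standard": 28}


def validate_consumption_csv(text, mode):
    """Return a list of readable problems; an empty list means the file passed."""
    if mode not in MIN_ROWS_BY_MODE:
        return [f"unknown calculation mode: {mode!r}"]

    reader = csv.DictReader(io.StringIO(text))
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        # Without the expected columns no further check makes sense.
        return [f"missing columns: {', '.join(sorted(missing))}"]

    problems = []
    rows = list(reader)
    minimum = MIN_ROWS_BY_MODE[mode]
    if len(rows) < minimum:
        problems.append(
            f"need at least {minimum} rows for mode {mode!r}, got {len(rows)}"
        )
    # Data starts on line 2 of the file, right after the header.
    for line_no, row in enumerate(rows, start=2):
        try:
            value = float(row["consumption_kwh"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: consumption is not a number")
            continue
        if value < 0:
            problems.append(f"line {line_no}: consumption is negative")
    return problems
```

Returning a list of messages instead of raising on the first error lets the form show all problems at once, before any calculation starts.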

From a product perspective, this matters because users get feedback early, not after several minutes of calculation.


Why this is interesting for data science

A workflow like this shows that data science in production isn’t just about modeling. It’s also about designing how data enters the system so results are repeatable and interpretable.

This is exactly where the following meet:

  • data quality,
  • domain logic,
  • form UX,
  • and the operational reality of everyday users.

What’s next

In the next part, I’ll look at the core of the estimation: how weather data enters the app, why it’s important to distinguish the heating season, and the role of a simplified RC model in calibrating the apartment’s energy behavior.

© 2026 Michael Princ. All rights reserved.
