Time Series Building Energy Systems Data Imputation

Authors

  • Adrien Lucbert, Epitech | France
  • Juliën van der Niet, The Hague University of Applied Sciences | the Netherlands
  • Albert Corson, Epitech | France
  • Michael Weij, The Hague University of Applied Sciences | the Netherlands
  • Ramon Isaac van der Elst, The Hague University of Applied Sciences | the Netherlands
  • Jesús Mª Martínez de Juan, Francisco de Vitoria University | Spain
  • Tadeo Baldiri Salcedo Rahola, The Hague University of Applied Sciences | the Netherlands

DOI:

https://doi.org/10.34641/clima.2022.302

Keywords:

Building Management System time series data, Imputation, KNN, RNN, Hot Deck, trend

Abstract

Complete data is vital for decision making and forecasting in Building Management Systems (BMS), as missing data can bias decisions down the line. This study creates a guideline for imputing gaps in BMS datasets by comparing four methods: the K-Nearest Neighbour algorithm (KNN), a Recurrent Neural Network (RNN), Hot Deck (HD) and Last Observation Carried Forward (LOCF). The guideline gives the best method per gap size and scale of measurement. The four selected methods come from various backgrounds and are tested on a real BMS and meteorological dataset. The focus of this paper is not to impute every cell as accurately as possible but to impute trends back into the missing data. Performance is characterised by two criteria so that users can choose the imputation method best suited to their needs: Variance Error (VE) and Root Mean Squared Error (RMSE). VE is given more weight because it evaluates the imputed trend better than RMSE does. From preliminary results, it was concluded that the best K-values for KNN are 5 for the smallest gap and 100 for the larger gaps. Using a genetic algorithm, the best RNN architecture for the purpose of this paper was determined to be Gated Recurrent Units (GRU). The comparison was performed using a training dataset different from the imputation dataset. The results show no consistent link between the difference in kurtosis or skewness and imputation performance. The experiment showed that RNN is best for interval data and HD is best for both nominal and ratio data. No single method was best for all gap sizes, as the best method depended on the data to be imputed.
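
As a rough illustration of why a trend-oriented criterion can be more informative than RMSE alone, the sketch below applies LOCF to a toy series and scores the filled gap with both criteria. The toy values, the helper names and, in particular, the definition of VE (taken here as the absolute difference between the variance of the true and the imputed gap values) are assumptions made for this example; the abstract does not state the exact formula used in the paper.

```python
import math
from statistics import pvariance


def locf(series):
    """Last Observation Carried Forward: replace each None with the
    most recent observed value (leading gaps stay None)."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def rmse(truth, imputed):
    """Root Mean Squared Error over paired values."""
    return math.sqrt(
        sum((t - i) ** 2 for t, i in zip(truth, imputed)) / len(truth)
    )


def variance_error(truth, imputed):
    """ASSUMED definition of Variance Error: absolute difference between
    the population variances of the true and the imputed values."""
    return abs(pvariance(truth) - pvariance(imputed))


# Hypothetical temperature readings; indices 2..4 are removed to form a gap.
truth = [20.0, 21.0, 23.0, 22.0, 20.0, 19.0]
gap = [2, 3, 4]
observed = [None if k in gap else v for k, v in enumerate(truth)]

filled = locf(observed)

# Score only the positions that were actually imputed.
true_gap = [truth[k] for k in gap]
imputed_gap = [filled[k] for k in gap]
gap_rmse = rmse(true_gap, imputed_gap)
gap_ve = variance_error(true_gap, imputed_gap)

print(filled)
print(round(gap_rmse, 3), round(gap_ve, 3))
```

LOCF fills the gap with a flat line, so the imputed values have zero variance. Under the assumed definition, VE then equals the full variance of the true gap values, which flags the lost trend directly, whereas RMSE only reports the average point-wise distance.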

Published

2022-05-20

How to Cite

Lucbert, A., van der Niet, J., Corson, A., Weij, M., van der Elst, R. I., Martínez de Juan, J. M., & Salcedo Rahola, T. B. (2022). Time Series Building Energy Systems Data Imputation. CLIMA 2022 Conference. https://doi.org/10.34641/clima.2022.302

Section

Digitization