TY  - JOUR
AU  - Lucbert, Adrien
AU  - van der Niet, Juliën
AU  - Corson, Albert
AU  - Weij, Michael
AU  - van der Elst, Ramon Isaac
AU  - Martínez de Juan, Jesús Mª
AU  - Salcedo Rahola, Tadeo Baldiri
PY  - 2022/05/20
Y2  - 2024/03/29
TI  - Time Series Building Energy Systems Data Imputation
JF  - CLIMA 2022 conference
JA  - CLIMA
VL  - 
IS  - 
SE  - Digitization
DO  - 10.34641/clima.2022.302
UR  - https://proceedings.open.tudelft.nl/clima2022/article/view/302
SP  - 
AB  - <p>Complete data is vital for decision making and forecasting in Building Management Systems (BMS), as missing data can bias decisions down the line. This study creates a guideline for imputing the gaps in BMS datasets by comparing four methods: the K Nearest Neighbour algorithm (KNN), Recurrent Neural Network (RNN), Hot Deck (HD) and Last Observation Carried Forward (LOCF). The guideline contains the best method per gap size and scale of measurement. The four selected methods come from various backgrounds and are tested on a real BMS and meteorological dataset. The focus of this paper is not to impute every cell as accurately as possible but to impute trends back into the missing data. Performance is characterised by two criteria so that users can choose the imputation method best suited to their needs: Variance Error (VE) and Root Mean Squared Error (RMSE). VE is given more weight because it evaluates the imputed trend better than RMSE does. From preliminary results, it was concluded that the best K-values for KNN are 5 for the smallest gap and 100 for the larger gaps. Using a genetic algorithm, the best RNN architecture for the purpose of this paper was determined to be Gated Recurrent Units (GRU). The comparison was performed using a different training dataset than the imputation dataset. The results show no consistent link between the difference in Kurtosis or Skewness and imputation performance.
The experiment showed that RNN is best for interval data and HD is best for both nominal and ratio data. No single method was best for all gap sizes, as the best choice depended on the data to be imputed.</p>
ER  - 