Abstract

With the incorporation of emerging technologies in buildings, including solar photovoltaics, electric vehicles, battery energy storage, smart devices, Internet-of-Things devices, and sensors, desirable control objectives are becoming increasingly complex, calling for advanced control approaches. Reinforcement learning (RL) is a powerful method for this purpose: an RL agent can adapt and learn from interaction with its environment, but it can take a long time to learn and can be unstable early in training due to limited environmental knowledge. In this research, we propose an online RL approach for buildings in which data-driven surrogate models guide the RL agent during its early exploratory training stage, helping the controller learn a near-optimal policy faster and with more stable training progress than a traditional direct plug-and-learn online RL approach. The agent's learning and action selection are assisted with information gained from the surrogate models, which generate multiple artificial trajectories starting from the current state. An exploration of several surrogate model-assisted training methods revealed that methods generating artificial trajectories around rule-based controls yielded the most stable performance, while methods employing random exploration with a one-step look-ahead approach demonstrated superior overall performance.
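
To make the idea concrete, below is a minimal Python sketch, not the authors' implementation, of how a data-driven surrogate model might guide an RL agent early in training. Every name and interface here (SurrogateModel, one_step_lookahead, artificial_trajectory, and the toy dynamics and reward) is an illustrative assumption, not the paper's actual model or code.

import numpy as np

# Illustrative stand-in for a data-driven surrogate (e.g., a model fitted to
# historical building operation data); dynamics and reward below are toys.
class SurrogateModel:
    def predict(self, state, action):
        next_state = state + 0.1 * (action - state)   # toy building dynamics
        reward = -np.abs(next_state).sum()            # toy energy/comfort cost
        return next_state, reward

def one_step_lookahead(surrogate, state, candidate_actions):
    # Random exploration with one-step look-ahead: score each candidate
    # action by its surrogate-predicted reward and act greedily on that.
    rewards = [surrogate.predict(state, a)[1] for a in candidate_actions]
    return candidate_actions[int(np.argmax(rewards))]

def artificial_trajectory(surrogate, policy, state, horizon=24):
    # Roll the surrogate forward from the current state under a policy
    # (e.g., a rule-based controller) to produce synthetic (s, a, r, s')
    # experience for the agent.
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = surrogate.predict(state, action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory

rng = np.random.default_rng(0)
state = rng.normal(size=4)                          # e.g., zone and outdoor temps
candidates = rng.uniform(-1.0, 1.0, size=(10, 4))   # random exploratory actions
surrogate = SurrogateModel()
action = one_step_lookahead(surrogate, state, candidates)
experience = artificial_trajectory(surrogate, lambda s: np.clip(-s, -1, 1), state)

Synthetic tuples like these could seed a replay buffer so the agent begins from informed rather than random behavior; the abstract reports that trajectories generated around rule-based controls gave the most stable training, while the random-exploration one-step look-ahead variant performed best overall.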

Graphical Abstract Figure
