Checkpointing Support for Distributed Containerized CPS Co-Simulations
Li, Ziqi
0000-0001-5212-5903
2021-07-13
Abstract
Co-simulations allow heterogeneous simulators to be composed into a distributed system so that a large-scale system comprising heterogeneous subsystems can be simulated. Cyber-Physical Systems (CPS) are one domain that can benefit from co-simulations, which can be used as Digital Twins at run time to control the CPS. However, any distributed system is prone to failures, and a failure may force a co-simulation to restart from the beginning unless appropriate distributed checkpointing and rollback capabilities are provided. With the increasing trend toward designing co-simulations as containerized distributed systems, realizing a coordinated, distributed checkpointing and rollback capability remains an unresolved problem for several reasons: the lack of checkpointing support in the underlying container technology, the difficulty of coordinating checkpoints effectively across the distributed containers, and the need for resource management that keeps the impact of checkpointing on simulation response time minimal. To address these issues, this research designs and implements a novel, systematic approach to checkpointing and restoring distributed containerized co-simulations, based on Docker’s experimental checkpointing feature. First, it addresses fundamental technical issues with CRIU (Checkpoint/Restore In Userspace), such as checkpointing JGroups multicast and the automatic removal of group members triggered by heartbeat detection. Second, it provides an application-level pause mechanism that pauses all containers before they are checkpointed. Third, it extends EXPPO, an existing co-simulation-as-a-service middleware, with the newly designed checkpointing and rollback capabilities so that a co-simulation can be recovered from a failure. The research outcomes are demonstrated and validated using a co-simulation constructed from applications in the PARSEC benchmarking suite on the Chameleon cloud computing research platform.
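The pause-then-checkpoint ordering described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation developed in this research: the function names, the `pause` callback (standing in for the application-level pause mechanism), and the `run` callback are hypothetical and introduced only for illustration. The sketch assumes a Docker daemon with experimental features enabled, so that the `docker checkpoint create` and `docker start --checkpoint` commands are available.

```python
from typing import Callable, List, Sequence

Command = List[str]


def checkpoint_command(container: str, checkpoint: str) -> Command:
    """Build the experimental Docker CLI call that checkpoints one container."""
    return ["docker", "checkpoint", "create", container, checkpoint]


def restore_command(container: str, checkpoint: str) -> Command:
    """Build the Docker CLI call that restarts a container from a checkpoint."""
    return ["docker", "start", "--checkpoint", checkpoint, container]


def coordinated_checkpoint(
    containers: Sequence[str],
    checkpoint: str,
    pause: Callable[[str], None],
    run: Callable[[Command], None],
) -> List[Command]:
    """Pause every container first, then checkpoint each one.

    `pause` is a hypothetical hook for an application-level pause;
    `run` executes a command (for example via subprocess).
    Returns the checkpoint commands in the order they were issued.
    """
    # Phase 1: quiesce the whole co-simulation before any state is saved.
    for name in containers:
        pause(name)

    # Phase 2: checkpoint each container only after all are paused.
    issued: List[Command] = []
    for name in containers:
        command = checkpoint_command(name, checkpoint)
        run(command)
        issued.append(command)
    return issued
```

On a real host, `run` could be `lambda cmd: subprocess.run(cmd, check=True)`. Passing the pause and run steps in as callbacks keeps the ordering logic separate from the mechanism, so the same coordination can be exercised without a Docker daemon.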