Published July 4, 2011 | Version v1
Conference paper

Adaptive Fault Tolerance in Real Time Cloud Computing

Other:
Active objects, semantics, Internet and security (OASIS) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-COMmunications, Réseaux, systèmes Embarqués et Distribués (Laboratoire I3S - COMRED) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)

Description

With the increasing demand and benefits of cloud computing infrastructure, real time computing can be performed on cloud infrastructure. A real time system can take advantage of intensive computing capabilities and scalable virtualized environment of cloud computing to execute real time tasks. In most of the real time cloud applications, processing is done on remote cloud computing nodes. So there are more chances of errors, due to the undetermined latency and loose control over computing node. On the other side, most of the real time systems are also safety critical and should be highly reliable. So there is an increased requirement for fault tolerance to achieve reliability for the real time computing on cloud infrastructure. In this paper, a fault tolerance model for real time cloud computing is proposed. In the proposed model, the system tolerates the faults and makes the decision on the basis of reliability of the processing nodes, i.e. virtual machines. The reliability of the virtual machines is adaptive, which changes after every computing cycle. If a virtual machine manages to produce a correct result within the time limit, its reliability increases. And if it fails to produce the result within time or correct result, its reliability decreases. A metric model is given for the reliability assessment. In the model, decrease in reliability is more than increase. If the node continues to fail, it is removed, and a new node is added. There is also a minimum reliability level. If any processing node does not achieve that level, the systems will perform backward recovery or safety measures. The proposed technique is based on the execution of design diverse variants on multiple virtual machines, and assigning reliability to the results produced by variants. The virtual machine instances can be of same type or of different types. The system provides both the forward and backward recovery mechanism, but main focus is on forward recovery. The main essence of the proposed technique is the adaptive behavior of the reliability weights assigned to each processing node and adding and removing of nodes on the basis of reliability.

Abstract

International audience

Additional details

Created:
December 3, 2022
Modified:
November 29, 2023