Modeling the latency on production grids with respect to the execution context
- Others:
- Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) / Equipe MODALIS ; Scalable and Pervasive softwARe and Knowledge Systems (Laboratoire I3S - SPARKS) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA) ; Centre National de la Recherche Scientifique (CNRS) ; Université Côte d'Azur (UCA)
- Institute of Informatics/Academic Medical Center ; University of Amsterdam [Amsterdam] (UvA)
Description
In this paper, we study the latency of grid jobs. Together with outliers, latency strongly affects application performance on production grids because of its order of magnitude and its large variations. It is particularly detrimental when estimating the expected duration of applications that handle a large number of jobs, and it makes outlier detection difficult. In previous work, a probabilistic model of the latency was used to estimate an optimal timeout value for a given distribution of job latencies. This timeout value is then used in a job resubmission strategy. The purpose of this paper is to evaluate to what extent updating this model with relevant contextual parameters can help refine the latency estimation. In the first part of the paper, we study the validity of the model parameters over several weeks. Experiments on the EGEE production grid show that performance can be improved by updating the model parameters. In the second part, we study the influence of the resource broker, the computing site, and the day of the week. We show experimentally that some of these parameters have a statistically significant influence on job latency. We exploit this contextual information with a view to improving job submission strategies.
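The idea of choosing a timeout from a latency distribution can be illustrated with a minimal sketch. It is not the model used in the paper: it assumes that each submission attempt draws its latency independently from the same empirical distribution, that outliers are recorded as infinite latencies, and that a job is cancelled and resubmitted as soon as the timeout expires. Under these assumptions, the expected total latency for a timeout t is E[min(L, t)] / P(L <= t). The function names `expected_latency` and `best_timeout` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def expected_latency(latencies: Sequence[float], timeout: float) -> float:
    """Expected total latency when a job is resubmitted after `timeout`.

    `latencies` are observed samples in seconds; outliers (jobs that never
    return) are encoded as math.inf. Assumes independent attempts (illustrative).
    """
    n = len(latencies)
    # Fraction of attempts that complete before the timeout expires.
    success_ratio = sum(1 for x in latencies if x <= timeout) / n
    if success_ratio == 0:
        return math.inf  # no attempt ever succeeds with this timeout
    # Mean time spent per attempt: the latency if it succeeds, else the timeout.
    mean_attempt_cost = sum(min(x, timeout) for x in latencies) / n
    # The number of attempts is geometric, with mean 1 / success_ratio.
    return mean_attempt_cost / success_ratio


def best_timeout(latencies: Sequence[float]) -> Tuple[float, float]:
    """Return (timeout, expected latency) minimising the expected latency.

    For an empirical distribution the minimum is reached at an observed
    finite latency, so only those values are tried as candidates.
    """
    candidates = sorted({x for x in latencies if math.isfinite(x)})
    if not candidates:
        raise ValueError("at least one finite latency sample is required")
    return min(
        ((t, expected_latency(latencies, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Made-up samples for illustration only: four completed jobs, one outlier.
    samples = [100.0, 120.0, 150.0, 900.0, math.inf]
    timeout, expected = best_timeout(samples)
    print(f"timeout={timeout:.0f}s expected latency={expected:.1f}s")
```

Recomputing `best_timeout` on samples restricted to a given week, resource broker, or computing site is one simple way to see how contextual parameters could shift the chosen timeout.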
Abstract
International audience
Additional details
- URL
- https://hal.archives-ouvertes.fr/hal-00459261
- URN
- urn:oai:HAL:hal-00459261v1
- Origin repository
- UNICA