Published November 2006 | Version v1
Report

Optimizing jobs timeouts on clusters and production grids


Description

This paper presents a method to optimize the timeout value of grid computing jobs. It relies on a model of the job execution time in which the latency of the job management system is described by a random variable. The model also accounts for a proportion of outliers, so that it covers both reliable clusters and production grids whose faults cause job losses. Job management systems are first studied under classical latency distributions; different behaviors are exhibited depending on the weight of the distribution tail and on the proportion of outliers. Experimental results are then presented, based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure. They show that using the optimal timeout value provided by our method reduces the impact of outliers and leads to a 1.36 speed-up for reliable systems without outliers.
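To give a rough sense of the kind of computation the description refers to (not the report's exact equations), the sketch below estimates the expected job turnaround time under a simple timeout-and-resubmit policy, for a latency distribution combined with an outlier fraction, and numerically picks the timeout that minimizes it. The log-normal latency parameters, the 10% outlier ratio, and the assumption of instantaneous, independent resubmissions are illustrative choices only, not values or formulas taken from the report.

```python
# Hedged sketch: expected turnaround under a timeout-and-resubmit policy.
# Assumptions (illustrative, not from the report): latency CDF F, outlier
# ratio rho (jobs that never return), attempts are independent, and a job
# cancelled at timeout t is resubmitted at no extra cost.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def expected_turnaround(t, latency_dist, rho):
    """Renewal-style estimate of the expected completion time:
         p_success = (1 - rho) * F(t)
         successful attempt costs E[L | L <= t]
         failed attempt costs t (wait until the timeout fires)
         E[T] = ((1 - rho) * E[L * 1{L <= t}] + (1 - p_success) * t) / p_success
    """
    p_success = (1.0 - rho) * latency_dist.cdf(t)
    if p_success <= 0.0:
        return np.inf
    # E[L * 1{L <= t}] by numerical integration of x * f(x) over [0, t]
    partial_mean, _ = quad(lambda x: x * latency_dist.pdf(x), 0.0, t)
    return ((1.0 - rho) * partial_mean + (1.0 - p_success) * t) / p_success

# Illustrative latency model: log-normal with a ~300 s median, 10% outliers.
latency = stats.lognorm(s=1.0, scale=300.0)
rho = 0.10

timeouts = np.linspace(60, 7200, 500)
costs = [expected_turnaround(t, latency, rho) for t in timeouts]
best = timeouts[int(np.argmin(costs))]
print(f"near-optimal timeout ~ {best:.0f} s, "
      f"expected turnaround ~ {min(costs):.0f} s")
```

Under this kind of model, a heavier-tailed latency distribution or a larger outlier ratio pushes the optimal timeout downward, which is the qualitative behavior the description alludes to.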

Abstract

I3S laboratory Research Report (I3S/RR-2006-35-FR), Sophia Antipolis, France

Additional details

Identifiers

URL
https://hal.archives-ouvertes.fr/hal-00691828
URN
urn:oai:HAL:hal-00691828v1

Origin repository

UNICA