Scalable epidemic message passing interface fault tolerance

Soma Sekhar Kolisetty; Battula Srinivasa Rao

doi:10.11591/eei.v11i2.3374

Abstract

Resilience and fault tolerance are challenging in high performance computing (HPC) and extreme-scale systems. Components fail more often in such systems, which results in application aborts. Adopting fault-tolerance techniques makes it possible to detect failures consistently and to continue the application's execution even when failures occur. The message passing interface (MPI), a prominent parallel programming specification, is used in this paper to implement a failure detection and consensus algorithm. Although MPI does not itself provide fault-tolerant behavior, this work presents a fault-tolerant, matrix-based failure detection and consensus algorithm. The proposed algorithm uses gossiping. To detect failures, randomised pinging is applied during the execution of the algorithm using piggybacked gossip messages. To achieve consensus on the failures in the system, information about the failed processes is sent to all alive processes using the same piggybacked gossip messages. The algorithm was implemented in the MPI framework and is completely fault tolerant. The results show that all MPI process failures were detected using randomised pinging and that global consensus was achieved on the failed MPI processes in the system.
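
The idea of randomised pinging with piggybacked gossip messages can be illustrated with a small single-machine simulation. The sketch below is not the authors' MPI implementation: the abstract does not give the matrix layout, the merge rule, or the consensus test, so all of these are assumptions made for illustration. In particular, the function name simulate_gossip, the boolean matrix view[p][i][j] ("p has learned that i detected j as failed"), the entry-wise OR merge, and the rule that consensus holds once every non-suspected process is recorded as having detected every failure are hypothetical choices.

```python
import random


def simulate_gossip(n, failed, seed=0, max_cycles=1000):
    """Simulate gossip-based failure detection among n processes.

    Returns the number of cycles until every alive process sees consensus
    on every failed process, or None if max_cycles is exhausted.
    """
    rng = random.Random(seed)
    alive = [p for p in range(n) if p not in failed]
    # Assumed layout: view[p][i][j] is True when process p has learned
    # that process i detected process j as failed.
    view = {p: [[False] * n for _ in range(n)] for p in alive}

    def merge(dst, src):
        # Assumed merge rule: a piggybacked matrix is combined entry-wise
        # with logical OR, so knowledge only ever grows.
        for i in range(n):
            for j in range(n):
                dst[i][j] = dst[i][j] or src[i][j]

    def has_consensus(p):
        m = view[p]
        suspected = {j for j in range(n) if m[p][j]}
        # Simulation-only check against the ground truth in `failed`.
        if suspected != set(failed):
            return False
        # Every process p does not suspect must have detected each failure.
        witnesses = [i for i in range(n) if i not in suspected]
        return all(m[i][j] for i in witnesses for j in suspected)

    for cycle in range(1, max_cycles + 1):
        for p in alive:
            # Randomised pinging: pick any other process uniformly.
            q = rng.choice([x for x in range(n) if x != p])
            if q in failed:
                # No reply to the ping: p records its own detection of q.
                view[p][p][q] = True
            else:
                # The ping and its reply each carry the sender's matrix.
                merge(view[q], view[p])
                merge(view[p], view[q])
        if all(has_consensus(p) for p in alive):
            return cycle
    return None
```

For example, simulate_gossip(8, {2, 5}, seed=1) returns the number of gossip cycles the simulated system needed before all six alive processes agreed that processes 2 and 5 had failed. Using one ping per process per cycle keeps the message load per process constant as n grows, which is the usual motivation for epidemic protocols.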

Keywords

Consensus; Epidemic protocols; Failure detection; Fault tolerance; MPI; Parallelism; Scalability

DOI: https://doi.org/10.11591/eei.v11i2.3374

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of Electrical Engineering and Informatics (BEEI)
ISSN: 2089-3191, e-ISSN: 2302-9285
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
