
Fault-Tolerant Adaptive Routing under an Unconstrained Set of Node and Link Failures for Many-Core Systems-on-Chip

Author(s) : M. Dimopoulos, Yi Gang, L. Anghel, M. Benabdenbi, N.-E. Zergainoh, M. Nicolaidis

Journal : Microprocessors and Microsystems

Volume : 38

Issue : 6

Pages : 620–635

DOI : 10.1016/j.micpro.2014.04.003

This work presents an online fault-tolerant routing algorithm for 2D mesh Networks-on-Chip. It combines adaptive routing with neighbor fault-awareness and a new traffic-balancing metric. To cope with runtime permanent and temporary failures that may cause message corruption, message loss, or deadlock, the routing algorithm is enhanced with packet retransmission and a new message recovery scheme. Simulation results for various network sizes and traffic patterns, under an unconstrained number of temporary and/or permanent node and link faults, demonstrate that the proposed algorithm scales well and efficiently tolerates the multiple failures likely to be encountered in deep submicron technologies. In the experiments, the proposed algorithm maintains a reliability of more than 97.68% for a 16 × 16 2D mesh network in the presence of 384 simultaneous link faults. For the same network, in the extreme scenario of 103 simultaneously faulty routers, the reliability remains above 93.40%.
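To put the reported fault counts in proportion, the short sketch below computes what share of a 16 × 16 mesh they represent. It is only an illustration: the function names are hypothetical, and the abstract does not say whether a "link" is a bidirectional connection between two routers or a single unidirectional channel, so both conventions are shown as assumptions.

```python
def mesh_link_count(width: int, height: int) -> int:
    """Bidirectional links between adjacent routers in a width x height 2D mesh."""
    horizontal = height * (width - 1)  # links along each row
    vertical = width * (height - 1)    # links along each column
    return horizontal + vertical


def fault_fraction(faulty: int, total: int) -> float:
    """Share of components that are faulty."""
    return faulty / total


routers = 16 * 16                 # 256 routers
links = mesh_link_count(16, 16)   # 480 bidirectional links
channels = 2 * links              # 960 unidirectional channels (assumed convention)

print(f"faulty routers: {fault_fraction(103, routers):.1%}")
print(f"faulty links, bidirectional count: {fault_fraction(384, links):.1%}")
print(f"faulty links, unidirectional count: {fault_fraction(384, channels):.1%}")
```

Under these assumptions, 103 faulty routers correspond to roughly 40% of the routers, while 384 faulty links correspond to 80% of the links if counted bidirectionally or 40% if counted as unidirectional channels.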