Adaptive Routing for Fault Tolerance and Congestion Avoidance for 2D Mesh and Torus NoCs in Many-Core Systems-on-Chip

Author(s): M. Benabdenbi, L. Anghel, M. Dimopoulos, Yi Gang

Doc. Source: Advances in Microelectronics: Reviews

Publisher: IFSA, International Frequency Sensor Association

Pages: 405-435

This work presents an online fault-tolerant routing algorithm for 2D mesh and torus Networks-on-Chip. It combines adaptive routing with neighbor fault-awareness and a new traffic-balancing metric. To cope with permanent and temporary runtime failures that may cause message corruption, message loss, or deadlocks, the routing algorithm is enhanced with packet retransmission and a new message recovery scheme. Simulation results for various network sizes and traffic patterns, under an unconstrained number of temporary and/or permanent node and link faults, demonstrate the scalability of the proposed algorithm and its efficiency in tolerating the multiple failures likely to be encountered in deep submicron technologies. In the experiments, the proposed algorithm maintains a reliability of more than 97.68% for a 16x16 2D mesh network with 384 simultaneous link faults out of 960 total links. For the same network, in the extreme scenario where 103 of the 256 routers are simultaneously faulty, the reliability is more than 93.40%. Applied to the torus topology, the CAFTA algorithm further improves latency and reliability. For the extreme scenario where 40% of the links are faulty, the packet delivery rate is more than 99.6%.
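
The fault counts quoted for the 16x16 mesh can be checked with a little arithmetic. The sketch below assumes that the 960 total links count each direction between neighboring routers separately; this is an inference from the numbers, not something the abstract states. The function names are hypothetical and serve only for illustration.

```python
def mesh_directed_links(rows: int, cols: int) -> int:
    """Count one-way links in a rows x cols 2D mesh.

    Assumption: every pair of adjacent routers is joined by two
    unidirectional channels, one per direction.
    """
    horizontal_pairs = rows * (cols - 1)  # neighbor pairs along each row
    vertical_pairs = cols * (rows - 1)    # neighbor pairs along each column
    return 2 * (horizontal_pairs + vertical_pairs)


def fault_fraction(faulty: int, total: int) -> float:
    """Share of components that are faulty."""
    return faulty / total


total_links = mesh_directed_links(16, 16)
link_fault_ratio = fault_fraction(384, total_links)
router_fault_ratio = fault_fraction(103, 16 * 16)

print(total_links)                  # 960
print(f"{link_fault_ratio:.1%}")    # 40.0%
print(f"{router_fault_ratio:.1%}")  # 40.2%
```

Under this assumption, both mesh scenarios correspond to roughly 40% of the links or routers being faulty, the same fault level as the 40% link-fault scenario quoted in the last sentence.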