

Network-on-Chip Fault Tolerance through Checkpoint and Rollback Recovery

Author(s): C. Rusu, C. Grecu, L. Anghel

Doc. Source: National Symposium on System-on-Chip - System-in-Package (GdR SoC-SiP’08)

This paper gives an overview of the failure recovery schemes we developed for Network-on-Chip (NoC) systems, all based on checkpointing and rollback. These schemes enhance the system's fault tolerance at the OS/application level. We analyze the particularities, effectiveness, and cost of these fault-tolerant approaches across different NoC sizes, application traffic loads, and expected failure rates.