
Early System Reliability Analysis for Cross-layer Soft Errors Resilience in Memory Arrays of Microprocessor Systems

Author(s): A. Vallero, A. Savino, A. Chatzidimitriou, M. Kaliorakis, M. Kooli, V. Riera, G. Di Natale, A. Bosio, R. Canal, D. Gizopoulos, S. Di Carlo

Journal: IEEE Transactions on Computers

Volume: 68

Issue: 5

Pages: 765-783

DOI: 10.1109/TC.2018.2887225

Cross-layer reliability is becoming the preferred solution when reliability is a concern in the design of a microprocessor-based system. Nevertheless, deciding how to distribute error management across the different layers of the system is a very complex task that requires the support of dedicated frameworks for cross-layer reliability analysis. This paper proposes SyRA, a system-level cross-layer early reliability analysis framework for radiation-induced soft errors in memory arrays of microprocessor-based systems. The framework exploits a multi-level hybrid Bayesian model to describe the target system and takes advantage of Bayesian inference to estimate different reliability metrics. SyRA implements several mechanisms and features to deal with the complexity of realistic models and implements a complete tool-chain that scales efficiently with the complexity of the system. The simulation time is significantly lower than that of micro-architecture level or RTL fault-injection experiments, with an accuracy high enough to make effective design decisions. To demonstrate the capability of SyRA, we analyzed the reliability of a set of microprocessor-based systems characterized by different microprocessor architectures (i.e., Intel x86, ARM Cortex-A15, ARM Cortex-A9) running either the Linux operating system or bare metal in the presence of single bit upsets caused by radiation-induced soft errors. Each system under analysis executes different software workloads, both from benchmark suites and from real applications.