25th IEEE International Symposium on On-Line Testing and Robust System Design
Hotel Rodos Palace, Rhodes Island, Greece
July 1-3, 2019

IOLTS 2019 Keynote Talk

Keynote Talk Title: "From Research to Product: RAS Features in EPYC and Radeon Instinct"

The explosive growth of cloud-scale computing is a defining feature of the modern era. The datacenters behind today's clouds contain hundreds of thousands of compute nodes, each built around highly integrated, powerful processors and compute accelerators. Provisioning adequate levels of reliability and availability for these nodes is a technical challenge, because even rare events occur frequently at such scale. Addressing this challenge requires significant investment in research and development to understand the events that occur at scale and to design an appropriate set of features to address them. This talk will discuss reliability research conducted by AMD and describe how the results of this research affected the architecture, design, and implementation of certain reliability, availability, and serviceability (RAS) features in AMD EPYC and Radeon Instinct products.

Keynote Speaker Bio

Vilas Sridharan is currently an AMD Fellow in the RAS (Reliability, Availability, and Serviceability) Architecture group at AMD, Inc., where he is the lead RAS architect for all of AMD’s products. His research focuses on the modeling of hardware faults and on architectural and micro-architectural approaches to reliability and fault tolerance in high-performance microprocessors.

Vilas received his Ph.D. and M.S.E. from the Department of Electrical and Computer Engineering at Northeastern University, and his B.S.E. in Computer Engineering from Princeton University in 2000. From 2000 to 2004, he worked in the SPARC server division at Sun Microsystems. He has been at AMD since 2010 in a RAS Architecture role.

