Design of a very low power Artificial Intelligence system (Tensor Processing Unit, TPU) based on in-memory computing.
Keywords: In-Memory Computing (IMC), Static Random Access Memory (SRAM), Tensor Processing Unit (TPU), Convolutional Neural Network (CNN), Non-Volatile Memory (NVM), Low power
Recent developments in our society have given rise to new fields of application, such as artificial intelligence (AI), that demand extremely high computing efficiency under tight resource constraints (e.g. energy). The hardware implementation of neural networks is a hot research topic and is now considered strategic by many companies. Indeed, the recent success of deep neural networks for image recognition has sparked renewed interest in neuromorphic engineering, and the deep learning sector is now dominated by a few industrial giants (Nvidia, Google, Intel, etc.). These companies typically rely on General-Purpose Graphics Processing Units (GPGPUs) for training, and on specialized hardware with proven energy efficiency for low-power inference on embedded targets. In this context, Google has developed the Tensor Processing Unit (TPU), an application-specific integrated circuit (ASIC) acting as an AI accelerator, designed specifically for machine learning with neural networks, in particular using the TensorFlow software.

The neural algorithms considered today essentially derive from two still-separate domains: machine learning (used, for example, for data analysis) and neuroscience (which seeks to model brain function more realistically). Leading projects in neuromorphic engineering that have brought these two fields together have led to powerful brain-inspired chips, such as TrueNorth or SpiNNaker. These technologies work well in centralized computing farms but exceed the power budget of embedded systems. The development of such applications is stagnating due to the limitations of current computing architectures. As a result, the computing paradigm has evolved towards dedicated accelerators with innovative architectures. The shift from compute-centric to data-centric computing, in which processing moves to where the data resides, is an emerging paradigm that has shown (academically) huge potential in terms of overall computing efficiency.
This computing paradigm is also known as computing in memory (CIM) or in-memory computing (IMC).
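To make the principle concrete, the following is a minimal sketch (not from the thesis text) of how a CIM memory array performs a matrix-vector multiply in place: weights are stored as cell conductances, inputs are applied as read voltages, and Ohm's and Kirchhoff's laws accumulate the products as bit-line currents. All variable names and numeric ranges here are illustrative assumptions.

```python
import numpy as np

# Illustrative CIM principle: each memory cell stores a weight as a
# conductance G (siemens); an input value is applied as a word-line
# voltage V (volts). Each cell then contributes a current I = G * V
# (Ohm's law), and the bit line sums the currents of its column
# (Kirchhoff's current law), so the column currents realise a full
# matrix-vector multiply inside the array -- no weight movement to a
# separate compute unit.

rng = np.random.default_rng(0)

G = rng.uniform(1e-6, 1e-5, size=(4, 3))  # 4x3 cell conductances (the weights)
V = rng.uniform(0.0, 0.2, size=4)         # read voltages (the input vector)

# Analog accumulation on the bit lines: one output current per column.
I_bitline = V @ G

# Same result computed conventionally, i.e. after moving the weights out
# of memory -- exactly the data transport that CIM avoids.
assert np.allclose(I_bitline, np.dot(V, G))
```

The digital result would then be recovered by analog-to-digital conversion of the bit-line currents; the sketch above only models the ideal arithmetic, not device non-idealities.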
The data-centric computing paradigm requires the design and implementation of dedicated hardware solutions for AI accelerators to cope with the large amount of data to be processed with minimal latency. Nevertheless, strict requirements on the non-functional characteristics of such implementations are necessary to obtain a usable and competitive product. More specifically, an AI accelerator core must meet the following needs: reusability and versatility, very low power consumption, high precision, and low silicon area, while allowing parallel operations.
Currently, all hardware AI accelerators in production (TPU, Nervana, DGX, Inferentia, etc.), as well as the IPs available for integration into specialized circuits and systems, use conventional digital CMOS technology and design techniques. This is because CMOS will remain the dominant hardware integration technology for some time, given its manufacturing cost, the maturity of its industrial production, and the many design tools available for it.
However, to reduce power consumption (by at least 2x) and latency (by at least 3x), a totally different approach needs to be explored. In this context, the in-memory computing paradigm is a promising technique that minimizes data transport, the main performance bottleneck and energy cost of most data-intensive applications. This thesis will leverage the proven advantages of CIM to design a very-low-power AI accelerator and will revolve around the following axes: description of the target application, mapping of the selected AI application onto the target CIM solution, parallelization of the solution, and implementation of the CIM-based accelerator.
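As a hint of what the mapping axis involves, the sketch below (an illustrative assumption, not the thesis method) shows the standard way a CNN convolution is reshaped into matrix-vector products, the operation a CIM array natively accelerates: patches of the input are unfolded into rows (often called im2col), and each output pixel becomes one dot product against the flattened kernel.

```python
import numpy as np

def im2col(x, k):
    """Unfold every k x k patch of a 2-D input into one row of a matrix."""
    h, w = x.shape
    rows = []
    for i in range(h - k + 1):          # valid (no-padding) convolution
        for j in range(w - k + 1):
            rows.append(x[i:i + k, j:j + k].ravel())
    return np.array(rows)

x = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input feature map
kernel = np.ones((3, 3)) / 9.0                # 3x3 averaging filter

# Each output pixel is a dot product: patch-row @ flattened kernel.
# In a CIM accelerator the kernel weights would sit in the memory array
# and the patch values would be applied as the array inputs.
y = im2col(x, 3) @ kernel.ravel()
out = y.reshape(3, 3)                         # 3x3 output feature map
```

With several kernels, the flattened filters form the columns of a weight matrix resident in the memory array, so a whole convolutional layer reduces to the matrix-vector operation sketched earlier.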
Updated 8 February 2022