The SLS team focuses on (a) highly efficient architectures for general-purpose computing and AI-dedicated algorithms, and (b) system-level modeling and design methodology: specification, simulation and verification of hardware/software systems-on-chip; design exploration and synthesis of hardware. The work of the team falls under the Laboratory themes “Hardware/software codesign” and “Simulation and verification of systems” described below.

Hardware/software codesign

Our research on high-performance general-purpose processors explores the use of value prediction in processor design. We have shown that simple value prediction can be implemented by reusing existing pipeline structures, increasing performance with little added overhead. We have also shown that predicting values enables the dynamic "reduction" of some instructions (e.g., transforming an add into a nop at runtime), which further improves performance. So far, we have considered value prediction only for out-of-order microarchitectures.
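The mechanism behind value prediction can be illustrated with a minimal sketch, not the team's actual microarchitecture: a last-value predictor with a saturating confidence counter, a structure commonly used in value-prediction studies. A prediction is used only when confidence saturates; a correct, confident prediction lets dependent instructions proceed early or be "reduced".

```python
# Illustrative last-value predictor (assumed structure, not the
# published design): one table entry per instruction address (pc),
# holding the last observed result and a saturating confidence counter.

class LastValuePredictor:
    def __init__(self, threshold=3):
        self.table = {}            # pc -> (last_value, confidence)
        self.threshold = threshold

    def predict(self, pc):
        """Return a value only when confidence has saturated."""
        value, conf = self.table.get(pc, (None, 0))
        return value if conf >= self.threshold else None

    def update(self, pc, actual):
        """Train on the committed result of the instruction at pc."""
        value, conf = self.table.get(pc, (None, 0))
        if value == actual:
            conf = min(conf + 1, self.threshold)
        else:
            value, conf = actual, 0   # mispredictions reset confidence
        self.table[pc] = (value, conf)

predictor = LastValuePredictor()
for _ in range(4):                    # the load at pc 0x40 keeps returning 7
    predictor.update(0x40, 7)
print(predictor.predict(0x40))        # → 7 (confident prediction)
print(predictor.predict(0x44))        # → None (never seen, no prediction)
```

The confidence threshold is the knob that trades prediction coverage against misprediction recovery cost.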
Multi-core and many-core architectures have evolved toward clustered organizations, in which each cluster integrates a set of cores, a cache and a local memory shared by all the cores of the cluster. We have worked on hardware methods for distributing memory-bank accesses in many-core architectures, with experiments on the Kalray MPPA processor. In addition, we have proposed innovative hardware support for synchronization locks: a decentralized solution that manages the dynamic re-homing of locks in a dedicated memory, close to the core most recently granted each lock.
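The re-homing idea can be sketched as a small software model, with names that are illustrative rather than taken from the hardware design: the lock's state lives in the memory of a "home" cluster, and the home migrates to the cluster of the core that most recently acquired the lock, so repeated acquisitions from that cluster stay local.

```python
# Hypothetical model of lock re-homing (an assumption for illustration,
# not the team's RTL): the home cluster follows the last granted core.

class RehomingLock:
    def __init__(self, home_cluster=0):
        self.home = home_cluster   # cluster whose memory holds lock state
        self.held_by = None        # core currently owning the lock

    def acquire(self, core, cluster):
        if self.held_by is not None:
            return False           # busy: the requester would spin/queue
        self.held_by = core
        self.home = cluster        # re-home near the winning core
        return True

    def release(self, core):
        assert self.held_by == core
        self.held_by = None

lock = RehomingLock()
lock.acquire(core=5, cluster=2)    # a core from cluster 2 wins the lock
lock.release(core=5)
print(lock.home)                   # → 2: later acquisitions from
                                   #   cluster 2 hit local memory
```

The benefit comes from locality: under contention dominated by one cluster, lock traffic no longer crosses the network-on-chip to a fixed home bank.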
On the dedicated-architecture side, the team continues to work on Artificial Intelligence. Following our work on high-throughput ternary neural networks, we have collaborated with the University of Salerno on the design of a tiny Binary Neural Network for human-recognition applications.
Through a collaboration with OVHcloud, we also worked on a dedicated IP used in their mitigation systems to counter Distributed Denial-of-Service (DDoS) attacks. In this domain, hardware development has to be agile. By introducing the Chisel hardware construction language into the hardware design flow, we showcased how Chisel unleashes the power of agile development methodologies through rapid development iterations. We have also shown, through a General Matrix Multiply (GEMM) implementation case study, that Chisel can be used to generate highly parametrizable circuits, bringing large benefits in design exploration, reuse and designer productivity.
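The value of a parametrizable generator can be illustrated in plain Python rather than Chisel/Scala; the sketch below is an assumption-laden analogy, not the case-study code. A single GEMM routine exposes a tile size as a parameter, and every value of that parameter yields a functionally identical design point, which is exactly what makes automated design-space exploration cheap.

```python
# Tiled GEMM where `tile` plays the role of a generator parameter:
# changing it reshapes the schedule (in hardware, the datapath)
# without touching the functional specification.

def tiled_gemm(A, B, tile):
    """C = A x B computed tile by tile; `tile` is the exploration knob."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
# Every tile size computes the same product; only the schedule changes.
print(tiled_gemm(A, B, tile=1) == tiled_gemm(A, B, tile=2))   # → True
```

In Chisel the same idea applies at elaboration time: one Scala description, parameterized by dimensions and data width, elaborates a family of circuits instead of one hand-written RTL file per variant.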
Along with the ever-increasing pace of hardware IP development, Systems-on-Chip (SoCs) now integrate a tremendous number of IPs, each of them specifying more and more interface registers. This exacerbates the hardware/software integration problem, which remains a challenge despite the decades that separate us from the first days of computers. We have proposed a generic interface to devices that uses message conduits instead of registers and interrupts, taking inspiration from USB and virtualization strategies. This strategy partitions the driver into a front-end, which depends on the operating system but not on the device, meaning that few distinct front-ends will be required in practice, and a back-end that is the responsibility of the device maker. Our prototypes demonstrate that the approach is suitable both for small systems with low-latency, low-throughput devices (FPGA IP integration) and for high-performance devices in the Linux kernel or in hypervised systems.
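A hypothetical model of this register-less interface, with all names invented for illustration: the driver front-end and the device back-end exchange messages over a pair of conduits (queues) instead of poking a device-specific register map, much as USB transfers or virtio rings do.

```python
# Sketch of the message-conduit split (assumed structure, not the
# team's prototype): the front-end is generic plumbing; only the
# back-end knows how the device interprets a request.

from collections import deque

class Conduit:
    def __init__(self):
        self.requests = deque()    # front-end -> device
        self.responses = deque()   # device -> front-end

class DeviceBackend:
    """Device-maker side: interprets messages; no register map exposed."""
    def service(self, conduit):
        while conduit.requests:
            op, payload = conduit.requests.popleft()
            if op == "read":
                # Dummy device behavior: respond with payload doubled.
                conduit.responses.append(("data", payload * 2))

class DriverFrontend:
    """OS side: device-independent message plumbing."""
    def read(self, conduit, device, addr):
        conduit.requests.append(("read", addr))
        device.service(conduit)    # stands in for a doorbell/interrupt
        return conduit.responses.popleft()

conduit, dev = Conduit(), DeviceBackend()
resp = DriverFrontend().read(conduit, dev, 21)
print(resp)                        # → ('data', 42)
```

Because the front-end never interprets message contents, swapping the device changes only the back-end, which is the property that keeps the number of distinct front-ends small.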

Simulation and verification of systems

Modeling and simulation of cyber-physical devices is challenging because of their heterogeneity: discrete-event simulation progresses by discrete timesteps, while continuous-time simulation does so over a time continuum. The SystemC AMS synchronization strategy is based on fixed timesteps and can generate inaccuracies that can be overcome only at the expense of simulation speed. We have proposed a new continuous-time/discrete-event synchronization algorithm on top of the SystemC framework and have proven its causality, completeness and liveness. In addition, we have proposed an adaptive algorithm that adjusts the synchronization step to provide near-optimal simulation speed. Results on various case studies demonstrate that our algorithm circumvents these challenges, attains high accuracy with respect to established tools, and improves simulation speed. This work aims at enlarging the modeling and simulation capabilities of SystemC as a heterogeneous design tool.
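A toy illustration of the synchronization problem, far simpler than the SystemC algorithm: a continuous-time solver for dx/dt = -x advances with a variable step, but is never allowed to step past the next discrete event, so the two domains stay causally aligned and the discrete side always observes the continuous state at exactly the event time.

```python
# Co-simulation sketch (illustrative only): explicit Euler for
# dx/dt = -x, with the step clipped to the next discrete-event time.
# A fixed-step scheme would either miss event instants or be forced
# to run everywhere at the finest step.

def co_simulate(x0, events, t_end, max_step=0.5):
    t, x, log = 0.0, x0, []
    pending = sorted(events)
    while t < t_end:
        next_event = pending[0] if pending else t_end
        step = min(max_step, next_event - t, t_end - t)
        if step > 0:
            x += step * (-x)               # Euler step of dx/dt = -x
            t += step
        if pending and abs(t - pending[0]) < 1e-12:
            log.append((t, x))             # DE side samples CT state
            pending.pop(0)
    return log

log = co_simulate(x0=1.0, events=[0.3, 1.1], t_end=2.0)
print([round(t, 2) for t, _ in log])       # → [0.3, 1.1]: events hit exactly
```

The adaptive algorithm of the paragraph above goes further, tuning the step size itself at runtime; this sketch only shows why the step must be negotiated between the two domains at all.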
Today’s SoCs require a complex design and verification process. In early design stages, high-level debugging of the SoC functionality is feasible on TLM (Transaction-Level Modeling) descriptions. To ease the debugging of such SoC models, Assertion-Based Verification (ABV) enables the runtime verification of temporal properties. In the last design stages, RTL (Register Transfer Level) descriptions of hardware blocks expose microarchitectural details. To gain confidence in the validity of system-level properties after this TLM-to-RTL synthesis, transaction-level assertions must be reverifiable on RTL models. To address this issue, we propose refinement rules for the automatic transformation of PSL assertions (Property Specification Language, IEEE Standard 1850) from the system level to the signal level.
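The reverification idea can be conveyed with a deliberately simplified sketch, unrelated to the actual PSL refinement rules: a transaction-level property ("every write is eventually acknowledged") is checked both on a TLM trace and on an RTL-style refinement of it, where one transaction expands into several per-cycle signal events. The refinement map below is invented for illustration.

```python
# Runtime monitor for an "every write is eventually acked" property,
# plus a toy TLM-to-RTL trace refinement. Both abstraction levels
# must satisfy the same property for the refinement to be sound.

def eventually_ack(trace):
    """True iff every 'write' in the trace is matched by a later 'ack'."""
    pending = 0
    for ev in trace:
        if ev == "write":
            pending += 1
        elif ev == "ack" and pending:
            pending -= 1
    return pending == 0

def refine(trace):
    """Expand each TLM event into per-cycle signal activity (illustrative)."""
    rtl = []
    for ev in trace:
        if ev == "write":
            rtl += ["req_hi", "addr_valid", "write"]   # protocol cycles
        else:
            rtl += ["wait", ev]
    return rtl

tlm_trace = ["write", "ack", "write", "ack"]
print(eventually_ack(tlm_trace))            # → True
print(eventually_ack(refine(tlm_trace)))    # → True: property preserved
```

The research problem is doing this transformation on the assertions themselves, automatically and for the full temporal operators of PSL, rather than on hand-instrumented monitors as here.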
Many scientific applications require higher accuracy than the 64-bit IEEE 754 floating-point format can represent, and to that end use dedicated arbitrary-precision software libraries such as MPFR. To reach a good performance/accuracy trade-off, developers use variable precision, requiring, e.g., more accuracy as the computation progresses. Hardware accelerators for this kind of computation do not exist yet, and independently of the actual quality of the underlying arithmetic, defining the right instruction set architecture, memory representations, etc., for them is a challenging task. We have investigated the support of arbitrary and variable precision arithmetic in a dynamic binary translator (implemented in QEMU), to help gain insight into the interface such an accelerator could provide to compilers, and thus to programmers. Through collaborations, we also worked on a floating-point representation supporting both static and dynamically variable precision, by designing its compilation flow to hardware floating-point instructions or software libraries, and by demonstrating its performance, far better than the Boost programming interface to the MPFR library on the PolyBench suite.
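The variable-precision programming pattern can be shown with Python's standard-library decimal module; the work above targets MPFR-class libraries and a hardware-oriented ISA, but the user-visible idea is the same: precision is a runtime knob that the program can raise as the computation demands.

```python
# Variable precision with the stdlib decimal module (an analogy, not
# the MPFR-based flow of the paragraph above): the same code runs at
# whatever precision the current context requests.

from decimal import Decimal, getcontext

def inv_with_precision(x, digits):
    """Compute 1/x with a caller-chosen number of significant digits."""
    getcontext().prec = digits     # per-computation precision setting
    return Decimal(1) / Decimal(x)

low  = inv_with_precision(3, 10)   # cheap, coarse result
high = inv_with_precision(3, 50)   # same code, 50 significant digits
print(low)                         # → 0.3333333333
print(high)
```

An accelerator for such computations must make this knob architectural: the instruction set and memory layout have to encode, per operation or per datum, how many digits are live, which is precisely the interface question the QEMU study explores.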