The performance of processors used for scientific computing and machine learning continues to improve. Exploiting parallelism is a key factor in this performance, but parallel execution is best suited to regularly organised data. Most physical computations, however, operate on sparse data, which is difficult to process efficiently in parallel. The problem is all the more critical because the data flow between the processor(s) and memory is often the limiting factor for performance. It is therefore essential to be able to optimise memory accesses for sparse data, ideally directly in hardware; the appearance of hardware acceleration in recent GPGPUs confirms this trend.

In this thesis, the candidate will first survey the state of the art of HPC nodes in terms of memory and cache subsystems, including an in-depth study of sparse matrix and sparse data representations. The candidate will then propose hardware architectural improvements and analyse their impact on a conventional memory hierarchy. These improvements may target structured-data acceleration, data compression, or synchronisation. The proposals will be evaluated, and the most promising ones will be implemented on high-end programmable components and integrated into a specialised processor under development.

The work will take place at the LSTA laboratory of CEA Grenoble, which has a long history of developing systems and software for high-performance computing.