Data Management for Scientific Simulations on Post-Petascale Supercomputers

Organized by: 

Arnaud Legrand


Matthieu Dorier

- In the Grand Amphi

Million-core supercomputers became a reality in 2012 with LLNL’s Sequoia supercomputer. Following Moore’s law, exascale machines are expected by 2018. Such immense computational power is used in many research areas, including earth sciences, biology, climate, and cosmology, where large-scale simulations are conducted to better understand the physical phenomena that surround us. But larger simulations on larger machines produce larger amounts of data, which must be stored and processed efficiently to yield scientific insights. This presentation focuses on Damaris, a new approach to data management for post-petascale supercomputers. Damaris leverages the multicore nature of recent machines to offload data management tasks to dedicated cores. We study in particular how Damaris can be used to hide the impact of I/O (Input/Output) and improve performance, and how it can provide in situ visualization capabilities to simulations without impacting them. We then dive further into the challenges of in situ visualization by presenting “smart” in situ visualization, which attempts to detect potentially interesting features in the datasets to improve performance. Finally, we evaluate the energy/performance tradeoff of different data management approaches in the context of Damaris.

This work was conducted in the context of the Joint Laboratory for Petascale Computing, a collaboration between Inria, the University of Illinois at Urbana-Champaign, and Argonne National Laboratory, and in the context of the Data@Exascale associate team between the KerData IRISA/Inria Rennes team and Argonne National Laboratory.