Our partner, Goethe University Frankfurt am Main, had a research poster accepted for the Research Poster session of ISC 2023 in Hamburg.
Given the complexity of modern HPC systems, achieving theoretical peak performance depends on a myriad of parameters and system configurations. To optimize system performance and use the underlying resources efficiently, various methods can be applied, including simulation, benchmarking, and monitoring. However, these methods and their tools are not compatible with one another, and each considers only a selection of performance factors, such as network, I/O, resource allocation, or parallel execution. Yet each of these approaches generates knowledge that can be applied to similar problems or system configurations. So that this knowledge is not collected for one-time use only and can also support other users, it must be easily accessible and available to the community. The MAWA-HPC (Modular and Automated Workload Analysis for HPC Systems) project aims to develop a generic workflow and tool suite that can be applied to different use cases and workloads from different science domains. Through its modular design, the workflow can support various community tools, increasing the compatibility of each tool and covering new use cases. In this poster, we present the high-level system design of MAWA-HPC and its current prototype implementation. By extending the prototype with support for additional monitoring and profiling tools, node-level performance engineering tools, network benchmarks, and microbenchmarks for different parallel programming models, we also introduce a multi-dimensional Roofline model. With time included as a third dimension, the Roofline model can provide insight into an application’s performance over time, enabling the identification and understanding of performance anomalies.
Authors: Zhaobin Zhu, Niklas Bartelheimer, Sarah Neuwirth
DOI: 10.13140/RG.2.2.10671.92325
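
To illustrate the idea of a Roofline model with time as a third dimension, here is a minimal Python sketch. It is not the MAWA-HPC implementation; the names `Sample`, `attainable`, and `roofline_over_time`, as well as the sample layout and the peak values, are assumptions made purely for illustration. The sketch uses the standard Roofline bound, attainable performance = min(peak compute, arithmetic intensity × peak memory bandwidth), and evaluates it per measurement interval, so each point carries a timestamp alongside intensity and performance.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement interval (hypothetical layout, for illustration only)."""
    t: float            # start time of the interval, in seconds
    flops: float        # floating-point operations executed in the interval
    bytes_moved: float  # bytes transferred to/from memory in the interval
    duration: float     # length of the interval, in seconds


def attainable(ai, peak_flops, peak_bw):
    """Classic Roofline bound: min(peak compute, intensity * peak bandwidth)."""
    return min(peak_flops, ai * peak_bw)


def roofline_over_time(samples, peak_flops, peak_bw):
    """Return (time, arithmetic intensity, achieved FLOP/s, fraction of roof)
    for every interval, i.e. Roofline points with time as a third axis."""
    points = []
    for s in samples:
        # Arithmetic intensity in FLOP/byte; treat "no traffic" as infinite.
        ai = s.flops / s.bytes_moved if s.bytes_moved > 0 else float("inf")
        achieved = s.flops / s.duration
        roof = attainable(ai, peak_flops, peak_bw)
        points.append((s.t, ai, achieved, achieved / roof))
    return points


if __name__ == "__main__":
    # Assumed machine peaks: 100 FLOP/s compute, 10 byte/s memory bandwidth.
    trace = [
        Sample(t=0.0, flops=50.0, bytes_moved=10.0, duration=1.0),
        Sample(t=1.0, flops=200.0, bytes_moved=10.0, duration=4.0),
    ]
    for t, ai, achieved, frac in roofline_over_time(trace, 100.0, 10.0):
        print(f"t={t:.1f}s  AI={ai:.1f}  perf={achieved:.1f}  roof fraction={frac:.2f}")
```

In this sketch, an interval whose fraction of the roof drops sharply relative to its neighbours would be a candidate performance anomaly worth inspecting; how anomalies are actually detected in MAWA-HPC is not described in the abstract.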