The aim is to provide a set of tools that help application developers and system operators optimise efficiency with respect to both performance and energy, i.e., to maximise system utilisation.
COUNTDOWN is a methodology and a tool that identifies communication and synchronization primitives and automatically reduces the frequency of the computing elements while they execute, in order to save energy. It transparently filters out phases in which a frequency reduction would harm the application's time to solution, without modifying the application code or requiring recompilation. Besides its primary use as an energy-saving framework, COUNTDOWN is also a powerful monitoring tool: it tracks and records the low-level calls of the parallel application, which enables fine-grained monitoring and performance analysis of the application running on specific hardware.
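The phase filtering described above can be illustrated with a timeout-based policy: the frequency is lowered only if a communication call is still running after a fixed delay, so short calls, where a frequency switch would cost more than it saves, are left untouched. The Python sketch below simulates this policy on recorded call durations. It is an illustration under stated assumptions, not COUNTDOWN's implementation: the 500 µs threshold, the `Phase` record and the `simulate` function are hypothetical, and the real library acts on live MPI calls rather than on a list of durations.

```python
from dataclasses import dataclass

# Hypothetical timeout: a call must still be running after this long
# before the frequency is lowered (assumed value, not a COUNTDOWN default).
TIMEOUT_S = 500e-6


@dataclass
class Phase:
    name: str        # communication primitive, e.g. "MPI_Allreduce"
    duration: float  # seconds spent inside the call


def simulate(phases, timeout=TIMEOUT_S):
    """Return (calls where frequency was lowered, seconds spent at reduced frequency)."""
    lowered = 0
    low_freq_time = 0.0
    for phase in phases:
        # A timer is armed on entry; it fires only if the call outlives `timeout`.
        if phase.duration > timeout:
            lowered += 1
            # Frequency stays low from timer expiry until the call returns,
            # at which point it is restored.
            low_freq_time += phase.duration - timeout
        # Shorter calls return before the timer fires, so no frequency
        # switch (and no switching overhead) is incurred.
    return lowered, low_freq_time


phases = [
    Phase("MPI_Send", 20e-6),      # short: filtered out
    Phase("MPI_Allreduce", 3e-3),  # long: frequency lowered
    Phase("MPI_Barrier", 1e-3),    # long: frequency lowered
]
print(simulate(phases))
```

With these inputs, two of the three calls trigger a frequency reduction, for a total of about 3 ms at reduced frequency, while the short `MPI_Send` never touches the frequency. Choosing the threshold trades energy savings against the risk of slowing down the application.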
In EUPEX, COUNTDOWN will have two principal uses:
Responsible partner: CINI – University of Bologna
The Score-P instrumentation and measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling and event tracing of HPC applications written in C/C++, Fortran or Python. It supports a wide range of HPC platforms and programming models (MPI, SHMEM, OpenMP, Pthreads, CUDA, HIP, OpenCL, Kokkos). Score-P provides core measurement services for a range of specialized analysis tools, such as Vampir, Scalasca, and TAU. For measurement, the instrumented program can be configured to record an event trace in OTF2 format or produce a call-path profile in CUBE4 format. Optionally, PAPI or Linux Perf hardware counters can be recorded. Filtering techniques allow precise control over the amount of data to be collected. Score-P is available under a BSD 3-Clause license.
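As an illustration of the filtering mentioned above, Score-P reads region-name rules from a plain-text filter file, typically passed to the measurement through the `SCOREP_FILTERING_FILE` environment variable. The Python sketch below emulates how such rules decide whether a region is recorded, assuming that rules are evaluated in file order, that a later matching rule overrides earlier ones, that patterns use shell-style wildcards, and that unmatched regions are recorded. It covers only the basic `EXCLUDE`/`INCLUDE` subset (no `MANGLED` keyword, no file-name block, no trailing comments), and `parse_region_rules` and `is_recorded` are hypothetical helpers, not part of Score-P.

```python
import fnmatch

# Example filter: record only `main` and the `solve_*` regions.
FILTER_TEXT = """
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE *
  INCLUDE main solve_*
SCOREP_REGION_NAMES_END
"""


def parse_region_rules(text):
    """Collect (action, pattern) pairs from the region-name block, in file order."""
    rules = []
    inside = False
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "SCOREP_REGION_NAMES_BEGIN":
            inside = True
        elif tokens[0] == "SCOREP_REGION_NAMES_END":
            inside = False
        elif inside and tokens[0] in ("EXCLUDE", "INCLUDE"):
            # One keyword may be followed by several whitespace-separated patterns.
            rules.extend((tokens[0], pattern) for pattern in tokens[1:])
    return rules


def is_recorded(region, rules):
    """Apply rules in order; the last matching rule wins, unmatched regions are kept."""
    recorded = True
    for action, pattern in rules:
        if fnmatch.fnmatchcase(region, pattern):
            recorded = action == "INCLUDE"
    return recorded


rules = parse_region_rules(FILTER_TEXT)
print(is_recorded("main", rules), is_recorded("tiny_helper", rules))  # True False
```

Excluding everything first and then re-including a few regions is a common way to keep traces small when frequently called helper functions would otherwise dominate the collected data.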
Score-P will be adapted to the EUPEX hardware and software platform; in particular, it will need to support the system's compiler versions and instrumentation interfaces, OpenMP versions and runtimes, MPI versions, and GPU programming interfaces. It will also be integrated into the overall EUPEX software stack.
Responsible partner: Jülich
Scalasca supports the performance optimization of parallel programs by measuring and analysing their runtime behaviour. The tool has been specifically designed for use on large-scale systems, but is also well suited for small- and medium-scale HPC platforms. The analysis identifies potential performance bottlenecks, in particular those concerning communication and synchronization, and offers guidance in exploring their causes. Scalasca is available under a BSD 3-Clause license. Users can choose between two analysis modes: (i) a performance overview on the call-path level via profiling, and (ii) an analysis of wait-state formation via event tracing. Wait states often occur in the wake of load imbalance and are serious obstacles to achieving satisfactory performance. The latest versions also include a scalable critical-path analysis and a root-cause analysis. Performance-analysis results are presented in an interactive explorer called Cube, which allows the performance behaviour to be investigated at different levels of granularity along the three dimensions of metric, call path, and process. For instrumentation and measurement, Scalasca leverages the community-driven instrumentation and measurement infrastructure Score-P.
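Two classic wait-state patterns targeted by this kind of trace analysis can be expressed as simple formulas over event timestamps. The sketch below assumes simplified definitions: a Late Sender wait is the time a blocking receive spends idle because it was posted before the matching send started, and a barrier wait is the time each process idles until the last process arrives. Scalasca derives such metrics, and many more, from complete event traces; the function names here are hypothetical and the inputs are plain timestamps in seconds rather than trace events.

```python
def late_sender(send_enter, recv_enter):
    """Waiting time of a blocking receive posted before its matching send started."""
    return max(0.0, send_enter - recv_enter)


def wait_at_barrier(enter_times):
    """Per-process waiting time: every process idles until the last one enters."""
    last_arrival = max(enter_times)
    return [last_arrival - t for t in enter_times]


# Receiver enters MPI_Recv at t=2.0 s, sender enters MPI_Send at t=5.0 s.
print(late_sender(send_enter=5.0, recv_enter=2.0))  # 3.0
# Three processes enter a barrier at different times; rank 1 arrives last.
print(wait_at_barrier([1.0, 4.0, 2.5]))  # [3.0, 0.0, 1.5]
```

In the barrier example, ranks 0 and 2 together wait 4.5 s purely because rank 1 arrives late, which is exactly the kind of load imbalance that the analysis is meant to expose and trace back to its cause.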
Scalasca will be adapted to the EUPEX hardware and software platform; in particular, it will need to support the system's compiler versions and instrumentation interfaces, OpenMP versions and runtimes, MPI versions, and GPU programming interfaces. It will also be integrated into the overall EUPEX software stack.
Responsible partner: Jülich
The open-source runtime system MERIC is designed to minimize the energy consumption of the HPC infrastructure executing a parallel application by dynamically tuning a wide range of hardware knobs. The idea of dynamic tuning comes from the Horizon 2020 project READEX, under which the development of MERIC started. MERIC supports tuning of CPUs and GPUs from various vendors, as well as several power-monitoring solutions. The library and its associated tools perform a detailed analysis of complex application behaviour, identify the optimal hardware settings with respect to energy consumption and runtime, and apply these settings dynamically while the application runs. The results of an application analysis can be visualized with the RADAR visualizer.
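The analyse-then-tune workflow described above can be sketched as two steps: pick the best hardware setting for each application region from measurements, then switch settings at region boundaries while the application runs. The Python sketch below is a minimal illustration under stated assumptions, not MERIC's API: the measurement values are made up, the selection criterion (lowest energy within a 5 % runtime slowdown budget) is our choice, and `set_frequencies` stands in for whatever mechanism actually writes the hardware knobs.

```python
# Hypothetical measurements per instrumented region:
# (core GHz, uncore GHz) -> (runtime in s, energy in J)
MEASUREMENTS = {
    "assemble": {
        (2.4, 2.0): (10.0, 900.0),
        (2.0, 2.0): (10.3, 820.0),
        (1.6, 1.6): (12.5, 780.0),
    },
    "solve": {
        (2.4, 2.0): (20.0, 2000.0),
        (2.0, 1.6): (20.4, 1700.0),
        (1.6, 1.2): (26.0, 1600.0),
    },
}


def best_config(samples, max_slowdown=0.05):
    """Lowest-energy setting whose runtime stays within max_slowdown of the fastest."""
    fastest = min(runtime for runtime, _ in samples.values())
    allowed = {cfg: m for cfg, m in samples.items()
               if m[0] <= fastest * (1 + max_slowdown)}
    return min(allowed, key=lambda cfg: allowed[cfg][1])


def tuning_plan(measurements, max_slowdown=0.05):
    """Static analysis step: one chosen setting per region."""
    return {region: best_config(s, max_slowdown) for region, s in measurements.items()}


def run(regions, plan, set_frequencies):
    """Dynamic step: apply each region's setting on entry, skipping redundant switches."""
    current = None
    for region in regions:
        cfg = plan[region]
        if cfg != current:
            set_frequencies(*cfg)  # hypothetical hook for the hardware knobs
            current = cfg


plan = tuning_plan(MEASUREMENTS)
print(plan)
run(["assemble", "solve", "solve", "assemble"], plan,
    lambda core, uncore: print(f"switch to core={core} GHz, uncore={uncore} GHz"))
```

The slowdown budget matters here: the lowest-energy setting of each region is rejected because it runs more than 5 % slower than the fastest one, so the plan settles on a moderate frequency reduction instead.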
The MERIC runtime system will be adapted to the EUPEX hardware platform to provide parameter tuning and resource-consumption monitoring, in order to evaluate and improve the energy efficiency of parallel applications executed on the EUPEX pilot.
Responsible partner: IT4Innovations