The modular design of the European Pilot for eXascale (EUPEX) platform has a strong impact on the architecture of the software ecosystem. The Modular Supercomputer Architecture (MSA) adds a further level to the hierarchy of traditional supercomputing systems by introducing the concept of tightly coupled modules. Each module, in turn, is a parallel computer whose hardware characteristics satisfy the requirements of a specific type of application or parts thereof. The tight coupling is ensured by a federated high-speed network that joins potentially different, module-specific fabrics.
This architectural approach offers advantages over traditional designs, but it also poses challenges for the whole software stack, from the management layer up to the application level. The notion of modules enables MSA systems to evolve over time, i.e., instead of disassembling a system at the end of its lifetime, modules can be added and/or removed. However, this requires support from the management software, which must be able to insert or remove modules with minimal effort and minimal impact on the system’s operation, i.e., long system downtimes have to be avoided. The execution environment has to provide applications with efficient access to all system resources. On the one hand, this requires the ability to execute parallel workloads transparently across multiple modules. On the other hand, it should also allow applications to leverage topological information on the MSA hierarchy for further optimisation. This requires close interaction with the runtime environment, which provides Application Programming Interfaces (APIs) for accessing this information.
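To illustrate what leveraging topological information on the MSA hierarchy could look like from the application side, the following is a minimal sketch in Python. It is not the interface of the EUPEX runtime environment: the `RankLocation` type, the helper functions, and the module and node names are all hypothetical and only model a three-level hierarchy of system, module, and node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RankLocation:
    """Placement of one process in a hypothetical system/module/node hierarchy."""
    rank: int
    module: str  # hypothetical module name, e.g. "module_a"
    node: str    # hypothetical node name within that module


def group_ranks_by_module(locations):
    """Return a mapping from module name to the sorted ranks placed in it."""
    groups = defaultdict(list)
    for loc in locations:
        groups[loc.module].append(loc.rank)
    return {module: sorted(ranks) for module, ranks in groups.items()}


def crosses_modules(a, b):
    """True if the two ranks are placed in different modules."""
    return a.module != b.module


# Hypothetical placement of four processes on two modules.
locations = [
    RankLocation(rank=0, module="module_a", node="a-node-0"),
    RankLocation(rank=1, module="module_a", node="a-node-1"),
    RankLocation(rank=2, module="module_b", node="b-node-0"),
    RankLocation(rank=3, module="module_b", node="b-node-0"),
]

print(group_ranks_by_module(locations))
print(crosses_modules(locations[0], locations[2]))
```

With such a grouping, an application could, for example, keep communication-intensive phases within one module and restrict traffic over the federated network to the exchanges that genuinely need to span modules.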
Power management, energy efficiency, and performance optimisation are crucial to achieving Exascale performance within a reasonable power budget. Therefore, tools for monitoring and profiling have to be adapted to the design of the EUPEX platform and must cover all levels, from the system level, via the node level, through to the application level. Finally, the I/O system is also affected by the modular system design. Instead of a single, central storage system, MSA systems exhibit a multi-level storage hierarchy whose tiers have different characteristics. The system software faces the challenge of providing fast and scalable access to data for heterogeneous workflows and applications while minimising data transfers across the different storage tiers.
This deliverable introduces the software ecosystem of the EUPEX platform (cf. Fig. 1) and details how the above-mentioned challenges are met. Chapter 2 presents the components of the management software stack. Chapter 3 introduces the fully MSA-aware execution environment, which enables flexible and efficient utilisation of the available resources within the EUPEX platform. Chapter 4 deals with tools for performance and energy efficiency that allow for monitoring and profiling the applications’ power/energy consumption and for identifying performance bottlenecks. Finally, Chapter 5 presents the elements of the I/O system that enable fast and scalable access to data within the hierarchical storage system.