

# Preparing flagship EU codes for the sovereign Rhea CPU

Feedback from CoEs to foster co-design with EUPEX

antoine.morvan@eviden.com

2026.01.28 – HiPEAC'26 - Krakow




**EuroHPC**  
Joint Undertaking

This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101033975. The JU receives support from the European Union's Horizon 2020 research and innovation programme and France, Germany, Italy, Greece, United Kingdom, Czech Republic, Croatia.



# Eviden – Introduction CEPP



- Center for Excellence in Performance Programming
- Accelerate workloads, give value to simulation!






| Institutions                                                              | IT Partners                                   | Customers                                                     | R&D                                                    |
|---------------------------------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------|--------------------------------------------------------|
| Funded scientific research projects where CEPP provides HPC services & co-design | Focus on key IT partners in HPC, AI & Quantum | Tailor-made solutions for customers' key applications and topics | Co-design and co-creation with R&D for applications |

# Overview – CoEs Preparing Codes for Rhea



MaX



ChEESE

+ TREX  
+ PEPSC  
etc.



Technology Providers  
and Integrators



Scientists and  
Academics



Super Computing Centers



## RHEA 1



Arm Neoverse V1 Platform  
A revolution in high performance computing

Arm's highest-performance core

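| Pipeline stage  | Width / units   |
|-----------------|-----------------|
| Fetch           | 8-wide          |
| Decode/Rename   | 5-8-wide        |
| Issue           | 15-wide         |
| Vector Execute  | 2x SVE, 4x NEON |
| Load/Store      | 3x Ld, 2x St\*  |
| Integer Execute | 4x ALU, 2x BR   |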

On Arm's most capable platform:

- High Bandwidth Memory: HBM3 + DDR5
- Custom Accelerators
- Flexible IO & Multi-Chip: CCIX, multi-die/socket, PCIe-Gen5, CXL

Prepare codes for Arm  
Assess SVE & HBM

# SPACE Overview



## ➤ Scalable Parallel Astrophysical Codes for Exascale

Codes:

- Pluto
- OpenGADGET
- iPIC3D
- RAMSES
- BHAC
- FIL / GRACE
- ChaNGa



# ChEESE Overview



## ChEESE

### ChEESE covers 3 approaches to exascale

- Capability Computing: solving complex problems that typically require parameterization due to the limitations of current hardware.
- Capacity Computing: solving many individual problems that each fit on a petascale-range machine but together form an exascale workflow (data inversion, data assimilation, uncertainty quantification).
- Urgent Computing: solving capability/capacity problems under strict time constraints (e.g., emergency situations).



### Consortium Composition



### Domains:

- Seismic Wave Propagation
- Volcanology
- Geodynamics
- Tsunami Modeling
- Coupling Physical Processes
- Fluid Dynamics and Planetary Atmospheres

### Codes:

- SeisSol
- SPECFEM3D
- ExaHyPE
- Tandem
- xSHELLS
- Tsunami-HySEA
- FALL3D
- OpenPDAC
- LaMEM
- pTatin3D
- ELMER/ICE

# MaX3 Overview

MaX: Driving the Exascale Transition

LIGHTHOUSE  
CODES



DOMAIN EXPERTS  
& CODE DEVELOPERS



HPC EXPERTS  
& DATA CENTRES



TECHNOLOGY &  
CO-DESIGN PARTNERS



- European leadership in exascale applications in the **materials domain**
- Key scientific and industrial applications and societal challenges for MaX impact
- Building a stronger European HPC ecosystem
- Improving access to the MaX computing applications and their performance data




# ESiWACE3 Overview

Coordinated by Barcelona Supercomputing Center (Centro Nacional de Supercomputación)



- Weather forecast, climate research
- High-performance computing
- Software engineering
- Training and teaching
- Communicating academic research



Soon: hackathon to prepare code  
exploitation on Jupiter



Codes:

- IFS Dwarfs
- ecRad
- ecTrans
- CloudSC
- NEMO
- ICON

# Preparation for Rhea – 2 objectives

- Arm CPU (Neoverse V1) with SVE vector instructions (256 bits)
- HBM memory on the package




# Methodology – Arm + SVE

## › Application porting and validation

- Possible thanks to in-kind systems
  - Fujitsu A64FX for early developments
  - AWS Graviton 3 for iso-core (Neoverse V1) analysis
  - Wider availability with Nvidia Grace (SDV at Eviden for instance)
- Evaluate several toolchains when possible; port when needed
- Early performance assessments on these Rhea “alternatives”
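
The three stand-in platforms do not share the same SVE vector length, which is why vector-length-agnostic code matters during porting. The sketch below only does the arithmetic for the number of elements per vector register. The 256-bit figure for Rhea comes from this deck; the widths for A64FX, Graviton 3 and Grace are taken from public vendor specifications and are an assumption relative to the slides. The names `SVE_BITS`, `lanes` and `lanes_table` are illustrative only.

```python
# Vector widths in bits. Only the Rhea value (256) is stated in this deck;
# the others are assumptions taken from public vendor specifications.
SVE_BITS = {
    "Fujitsu A64FX": 512,
    "AWS Graviton 3 (Neoverse V1)": 256,
    "Nvidia Grace (Neoverse V2)": 128,
    "SiPearl Rhea (Neoverse V1)": 256,
}


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one SVE register."""
    if vector_bits % 128 != 0:
        raise ValueError("SVE vector lengths are multiples of 128 bits")
    return vector_bits // element_bits


def lanes_table(element_bits: int = 64) -> dict:
    """Elements per vector for each platform (default: double precision)."""
    return {name: lanes(bits, element_bits) for name, bits in SVE_BITS.items()}


if __name__ == "__main__":
    for name, n in lanes_table().items():
        print(f"{name}: {n} doubles per vector")
```

A loop written without assuming a fixed lane count can run unchanged on all of these systems, so early work on A64FX or Grace is not tied to their vector lengths.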

## › EAP (Early Access Program) will open a new system (cf. last slide)




# Results – Arm + SVE

## Effort to port to Arm

- Move codes away from the “x86 dogma”
- Port libraries
- Fix build systems

## Today: better tooling

- Compilers (LLVM, ACFL)
  - Outer loop vectorization (BSC + SiPearl @ EPI)
- Libraries (ArmPL, NVPL, etc.)
- Profiling tools (MAQAO)



| SPACE Code  | GCC     | LLVM  |
|-------------|---------|-------|
| gPLUTO      | OK      | OK    |
| OpenGadget3 | OK      | OK    |
| iPIC3D      | OK      | OK    |
| RAMSES      | OK      | OK    |
| BHAC        | OK      | fails |
| FIL         | partial | fails |
| ChaNGa      | OK      | OK    |

| MaX Code | Arm port |
|----------|----------|
| QE       | ✓        |
| Yambo    | ✓        |
| BigDFT   | ✓        |
| FLEUR    | ✓        |
| Siesta   | ✓        |

| ChEESE Application | Arm Support (toolchain) |
|--------------------|-------------------------|
| xSHELLS            | GNU/LLVM                |
| Tsunami-HySEA      | AdaptiveCpp             |
| Tandem             | GNU                     |
| SPECFEM3D          | GNU                     |
| SeisSol            | GNU                     |
| pTatin3D           | GNU                     |
| FALL3D             | GNU                     |
| ExaHyPE            | GNU/NVHPC               |






# Methodology – HBM

## ➤ Leverage Intel Xeon Max (Sapphire Rapids with HBM)

- The only available CPU combining DDR5 with on-package HBM2e
- Methodology based on memory-binding tools: **hwloc**, prepared in WP5 for Rhea
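
As a rough illustration of the first step in such a binding methodology, the sketch below lists the NUMA nodes that have no CPUs attached by reading the Linux sysfs tree. It assumes a Linux system where the HBM is exposed as separate CPU-less NUMA nodes (as in the flat memory mode of Xeon Max); this is not the WP5 tooling, and `cpuless_numa_nodes` is a hypothetical helper name.

```python
import os
import re


def cpuless_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted IDs of NUMA nodes with no CPUs attached.

    Assumption: on the target system, CPU-less nodes correspond to HBM.
    This must be checked against the actual machine topology.
    """
    nodes = []
    for entry in os.listdir(sysfs_root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip files such as "online" or "possible"
        cpulist_path = os.path.join(sysfs_root, entry, "cpulist")
        try:
            with open(cpulist_path) as f:
                cpulist = f.read().strip()
        except OSError:
            continue  # node without a readable cpulist: ignore it
        if cpulist == "":
            nodes.append(int(match.group(1)))
    return sorted(nodes)


if __name__ == "__main__":
    root = "/sys/devices/system/node"
    if os.path.isdir(root):
        print("CPU-less NUMA nodes:", cpuless_numa_nodes(root))
```

The resulting node IDs can then be handed to a memory-binding tool such as hwloc to run the same binary once with allocations on DDR and once on HBM.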



# Results – HBM

| ChEESE Application | HBM gain vs DDR5 |
|--------------------|------------------|
| ExaHyPE            | 0%               |
| FALL3D             | 17%              |
| pTatin3D           | 22%              |
| SeisSol            | 8%               |
| SPECFEM3D          | 44%              |
| Tandem             | 45%              |
| Tsunami-HySEA      | 13%              |
| xSHELLS            | 17%              |



- Many applications benefit substantially from HBM, with gains of up to 80% over DDR5 (Yambo mini-app)
  - But the gain depends on the application: some show none at all



**SPACE**

| Code        | DDR (s) | HBM (s) | Gain(+) / Loss(-) |
|-------------|---------|---------|-------------------|
| gPLUTO      | 313     | 235     | +25%              |
| OpenGadget3 | 285     | 279     | +2%               |
| iPIC3D      | 219     | 186     | +15%              |
| RAMSES      | 324     | 317     | +2%               |
| BHAC        | 163     | 133     | +18%              |
| FIL         | 244     | 239     | +2%               |
| ChaNGa      | 321     | 312     | +3%               |
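
The gain column in the table above is consistent with a relative reduction in run time, (DDR − HBM) / DDR, rounded to the nearest percent. The short sketch below recomputes it from the timings in the table; the formula is inferred from the numbers rather than stated in the slides, and the function names are illustrative.

```python
# (DDR seconds, HBM seconds), copied from the SPACE table above.
TIMINGS = {
    "gPLUTO": (313, 235),
    "OpenGadget3": (285, 279),
    "iPIC3D": (219, 186),
    "RAMSES": (324, 317),
    "BHAC": (163, 133),
    "FIL": (244, 239),
    "ChaNGa": (321, 312),
}


def hbm_gain_percent(ddr_s: float, hbm_s: float) -> float:
    """Relative run-time reduction when moving from DDR to HBM, in percent."""
    return 100.0 * (ddr_s - hbm_s) / ddr_s


def speedup(ddr_s: float, hbm_s: float) -> float:
    """Equivalent speedup factor (DDR time divided by HBM time)."""
    return ddr_s / hbm_s


if __name__ == "__main__":
    for code, (ddr_s, hbm_s) in TIMINGS.items():
        gain = hbm_gain_percent(ddr_s, hbm_s)
        print(f"{code:12s} gain {gain:+5.1f}%  speedup x{speedup(ddr_s, hbm_s):.2f}")
```

Note that a run-time reduction and a speedup are not the same figure: a 25% shorter run corresponds to a speedup of about 1.33x, so it helps to state which metric is used when comparing results across CoEs.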

| MaX Code<br>(* = Mini-App) | Maximum HBM Gain<br>observed over DDR [%] |
|----------------------------|-------------------------------------------|
| QE*                        | 35                                        |
| Yambo*                     | 80                                        |
| BigDFT                     | 14                                        |
| FLEUR                      | 33                                        |
| Siesta                     | 49                                        |






# Results – Domain Specific Benchmark



HPCW V3.0  
Open source release  
August 2025



- Co-design vehicle for Weather & Climate
- Relevant, realistic, near-operational workloads
- <https://hpcw.gitlab-pages.dkrz.de/hpcw/>

Collaborations outside Europe



- New codes: NICAM-DC & WRF Dwarf

- The three IFS Dwarfs compile with LLVM Flang, with auto-vectorization for RVV (RISC-V Vector extension)
- Small test cases from HPCW run successfully on Banana Pi BPI-F3 (RVV 256 bits)
- OpenMP support is still work in progress

# Feedback on the Effort

- After many years focused on the embedded market, Arm CPUs are now competitive in HPC
  - Several contenders: AWS Graviton 3/4, Nvidia Grace, SiPearl Rhea
- EUPEX led the way for CoEs to port to Arm
  - The “x86 dogma” was more deeply rooted than expected: in source code, but also in libraries and build systems
  - Vectorizing for SVE requires the same effort as vectorizing for AVX
- European flagship codes benefit from HBM; results depend on each application's memory access patterns
  - HBM cost remains high, so there are tradeoffs to consider
  - No CPU with HBM is planned in the roadmaps
- Codes are ready to run on the first European CPU tailored for HPC
  - Also ready for the first exploitation on Jupiter: the first exascale cluster in Europe runs on Arm
- The Domain Specific Benchmark (DSB) represents the HPC requirements of scientific flagship code developers





# What's Next?

- EAP: new system to prepare for Arm CPUs
  - CDV (Community Development Vehicle)
    - 8 nodes with Grace-Grace, 240 GB LPDDR5X
  - Access to the EUPEX software stack
  - Available soon to CoEs & EU Projects. **Want to know more?**
- Better characterize energy: need for homogeneous measures
- Continue efforts toward European sovereignty

Contact: [eap@eupex.eu](mailto:eap@eupex.eu)



*Find this person in the room (or around)*



# Tell Us If Early Access to a Cluster Matters to You

➤ Quick Poll:



# Thank you for your attention

Questions?

antoine.morvan@eviden.com


