EUPEX partner CINI (University of Bologna) had a poster accepted at the 19th ACM International Conference on Computing Frontiers, held from 17 to 19 May 2022 in Turin, Italy.
Abstract: Automated and data-driven methodologies are being introduced to assist system administrators in managing increasingly complex modern HPC systems. Anomaly detection (AD) is an integral part of improving overall availability, as it eases the system administrators’ burden and reduces the time between an anomaly and its resolution. This work improves upon the current state-of-the-art (SoA) AD model by considering temporal dependencies in the data and including long short-term memory (LSTM) cells in the architecture of the AD model. The proposed model is evaluated on a complete ten-month history of a Tier-0 system (Marconi100 from CINECA, consisting of 985 nodes). The proposed model achieves an area under the curve (AUC) of 0.758, improving upon the state-of-the-art approach, which achieves an AUC of 0.747.
Authors: Martin Molan, Andrea Borghesi, Luca Benini, Andrea Bartolini
DOI: 10.1145/3528416.3530867
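For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from anomaly scores. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the pairwise (Mann-Whitney) formulation is just one standard way to obtain the value.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: iterable of 0 (normal) or 1 (anomaly); hypothetical toy input.
    scores: iterable of anomaly scores, higher meaning "more anomalous".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")

    # Count how often an anomalous sample outranks a normal one;
    # ties count as half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration, unrelated to Marconi100).
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of anomalous and normal samples, which gives context for the 0.747 and 0.758 values reported above.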