Impact of Cache Coherence on the Performance of Shared-Memory based MPI Primitives: A Case Study for Broadcast on Intel Xeon Scalable Processors

Research Paper

EUPEX consortium partner FORTH had a research paper accepted at ICPP’2023.

Recent processor advances have made feasible HPC nodes with high core counts, capable of hosting tens or even hundreds of processes. Therefore, designing MPI collective operations at the intra-node level has received significant attention over the past years. Deriving efficient algorithms for modern HPC nodes, with complex internal topologies and memory hierarchies, is challenging. Moreover, the cache coherency protocol, and its impact on performance, further complicates algorithm design for MPI collectives. This latter concern is often only partially addressed.

In this work, we demonstrate a particularly challenging performance degradation scenario in the case of shared-memory-based MPI broadcast, on three generations of the Intel Xeon Scalable processor architecture. Based on analysis of hardware-based performance counters, we conclude that the observed performance degradation is attributable to the cache coherency protocol and the multi-socket configuration of the execution platforms examined. We present a number of novel approaches designed to mitigate this effect, and apply them in a cache-coherency-aware version of the MPI broadcast implementation. We reduce the overall latency of the broadcast operation by up to 1.5× and 1.25× for small and large messages, respectively.

Authors: George Katevenis, Manolis Ploumidis, Manolis Marazakis

DOI: 10.1145/3605573.3605616

Collection of computational artifacts (source code, scripts, datasets, instructions) for reproducing the experiments featured in the associated paper: https://zenodo.org/records/8094307

