EUPEX partner FORTH had a research poster accepted at EuroSys 2024 in Athens, Greece.
Increasing dataset sizes require that applications use larger heaps. Two characteristic examples are graph analytics and machine learning (ML), both emerging workloads with rapidly increasing memory demands. With recent technology limitations in DRAM scaling, an important question is how to grow the heap further. One approach is to use specialized out-of-core frameworks that can process datasets larger than the available DRAM capacity. However, developing and tuning such frameworks for performance is both complex and time-consuming, and frameworks for graph analytics and ML have traditionally been in-memory. Another approach is to extend the application heap, transparently to applications, over fast, block-addressable storage devices, such as NVMe SSDs, via the OS memory-mapped I/O (mmio) or swap path. These paths extend the application heap without application modifications, behind the abstraction of the process virtual address space.
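As a concrete illustration of the mmio path, the sketch below maps a file residing on an NVMe SSD into the process address space with the standard mmap interface; the resulting region behaves like ordinary heap memory while the kernel pages data in and out of the device on demand. This is a minimal sketch, not code from the poster, and the file path and sizes are placeholders chosen for illustration.

/* Minimal sketch: extend a process's usable memory over an NVMe-backed
 * file through the standard mmap/mmio path. Path and sizes are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t heap_size = 64UL << 30;   /* 64 GiB, larger than DRAM */
    int fd = open("/mnt/nvme/heap.img", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, heap_size) != 0) {
        perror("backing file");
        return 1;
    }
    /* The mapping behaves like ordinary memory; the kernel pages data
     * in and out of the SSD on demand via the page fault path. */
    char *heap = mmap(NULL, heap_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (heap == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(heap, 0xAB, 1UL << 20);         /* touch the first 1 MiB */
    munmap(heap, heap_size);
    close(fd);
    return 0;
}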
Although both mmio and swap are part of core kernel functionality, they have significant limitations when used to extend application heaps. First, they fail to scale beyond 8 cores due to contention on shared structures in the OS paths stressed during heap extension, namely the page fault, page reclamation, and page writeback paths. Second, they do not offer read-write support for hugepages over block storage. Hugepages are contiguous memory frames larger than the regular page size (e.g., 2MB or 1GB on x86_64) that can be mapped with a single TLB entry and a single (huge) page table entry. Hugepages have received increased attention in in-memory setups for their potential to reduce CPU cycles spent on TLB misses and to reduce the number of page faults, thereby lowering kernel software overheads (see the sketch after this paragraph). Applying hugepages to heap extension can help harness the increasing throughput of NVMe SSDs. Third, they offer only limited support for asynchronous operations, which are important for masking the higher latency of the backing storage medium: their main asynchronous mechanism, aggressive readahead, is ill-suited to heap extension, where memory must be treated as a scarce resource.
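For reference, the sketch below shows how an application requests transparent hugepages for an anonymous, in-memory region today via madvise(MADV_HUGEPAGE). It is a minimal illustration, not taken from the poster; on a stock kernel this advice applies to anonymous memory, whereas the read-write hugepage mappings over block-backed storage discussed above are not available.

/* Minimal sketch: request 2 MiB transparent hugepages for an anonymous,
 * in-memory region. Region size and access pattern are illustrative. */
#include <stdio.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)   /* 2 MiB hugepage on x86_64 */

int main(void)
{
    size_t len = 1UL << 30;      /* 1 GiB anonymous region */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Ask the kernel to back this range with hugepages where possible,
     * so one TLB and page table entry covers each 2 MiB of data. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");
    for (size_t off = 0; off < len; off += HPAGE_SIZE)
        buf[off] = 1;            /* touch one byte per hugepage-sized step */
    munmap(buf, len);
    return 0;
}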
Driven by these limitations, we design xmap as an alternative mmio path for the Linux kernel, tailored to heap extension over fast block storage devices. xmap provides improved scalability, support for transparent hugepages over block-based storage, and asynchronous hugepage promotions. To our knowledge, xmap is the first system that provides this support for the Linux kernel. xmap is implemented as a kernel module and requires no kernel modifications. It uses its own preallocated and statically divided regular-page and hugepage pools to operate under a strict DRAM budget and to prevent the external memory fragmentation that hugepages can cause. It organizes pages in its own sharded page cache and metadata structures to mitigate shared-structure contention and improve scalability (a simplified sketch of this idea follows this paragraph). xmap uses hugepages either on explicit application demand (i.e., via madvise) or transparently, in a policy-driven manner. xmap supports both the synchronous preparation and mapping of hugepages into the application address space at page fault time, and the asynchronous promotion of hugepage-sized virtual address ranges from regular-page mappings to a single hugepage mapping.
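The following user-space sketch illustrates the sharded page cache idea: each shard carries its own lock and hash table, so concurrent page faults on different shards avoid contending on a single global structure. It is not xmap's actual code; the shard count, hashing, and structure names are assumptions made for illustration.

/* Illustrative sketch of a sharded page cache (not xmap's implementation). */
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define NUM_SHARDS 64                 /* e.g., at least one per core (assumed) */
#define BUCKETS_PER_SHARD 4096
#define PAGE_SHIFT 12                 /* 4 KiB base pages */

struct cached_page {
    uint64_t file_offset;             /* page-aligned offset in the backing file */
    void *frame;                      /* frame taken from a preallocated pool */
    struct cached_page *next;         /* hash-chain link */
};

struct shard {
    pthread_spinlock_t lock;          /* per-shard lock, no single global lock */
    struct cached_page *buckets[BUCKETS_PER_SHARD];
};

static struct shard shards[NUM_SHARDS];

static void cache_init(void)
{
    for (int i = 0; i < NUM_SHARDS; i++)
        pthread_spin_init(&shards[i].lock, PTHREAD_PROCESS_PRIVATE);
}

/* Route a faulting offset to a shard, spreading lookups across locks. */
static struct shard *shard_for(uint64_t offset)
{
    return &shards[(offset >> PAGE_SHIFT) % NUM_SHARDS];
}

static struct cached_page *cache_lookup(uint64_t offset)
{
    struct shard *s = shard_for(offset);
    size_t bucket = ((offset >> PAGE_SHIFT) / NUM_SHARDS) % BUCKETS_PER_SHARD;
    struct cached_page *p;

    pthread_spin_lock(&s->lock);      /* contention limited to one shard */
    for (p = s->buckets[bucket]; p != NULL; p = p->next)
        if (p->file_offset == offset)
            break;
    pthread_spin_unlock(&s->lock);
    return p;
}

A full design would also shard insertion, eviction metadata, and the free lists of the preallocated regular-page and hugepage pools along the same lines.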
We first examine the performance implications of extending the application heap over NVMe SSDs by characterizing application-perceived memory access latency and throughput with LMbench workloads. We then evaluate xmap with a random page fault microbenchmark, a machine learning workload (LIBLINEAR), and graph processing frameworks (Ligra+, GridGraph), transparently extending their heaps over storage without any code modifications. We find that xmap scales beyond 8 cores, increasing random page fault throughput by up to 3.9× compared to Linux mmap. With a heap/DRAM ratio of 2×, xmap improves linear regression training time by up to 7.9× compared to Linux mmap. xmap also allows an in-memory graph processing framework (Ligra+) to perform comparably to, or even better than, a hand-written out-of-core graph processing framework (GridGraph) when processing graphs 4×-8× larger than the available DRAM, without any code modifications.
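For context, a random page fault microbenchmark of the kind mentioned above generally looks like the sketch below: several threads touch random 4 KiB pages of a storage-backed mapping and the program reports touches per second. This is an illustrative reconstruction, not the authors' benchmark; the file path, region size, and thread count are assumptions.

/* Illustrative random page-touch microbenchmark over a storage-backed mapping. */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define PAGE_SIZE 4096UL
#define REGION_SIZE (32UL << 30)       /* 32 GiB, larger than DRAM (assumed) */
#define TOUCHES_PER_THREAD 1000000UL
#define NUM_THREADS 16

static char *region;

static void *worker(void *arg)
{
    unsigned int seed = (unsigned int)(uintptr_t)arg;
    uint64_t npages = REGION_SIZE / PAGE_SIZE;

    for (uint64_t i = 0; i < TOUCHES_PER_THREAD; i++) {
        uint64_t page = (((uint64_t)rand_r(&seed) << 16) | rand_r(&seed)) % npages;
        region[page * PAGE_SIZE] = (char)i;   /* first touch of a page faults */
    }
    return NULL;
}

int main(void)
{
    int fd = open("/mnt/nvme/backing.img", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, REGION_SIZE) != 0) { perror("file"); return 1; }
    region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    pthread_t tid[NUM_THREADS];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)(t + 1));
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("throughput: %.0f page touches/s\n",
           (double)NUM_THREADS * TOUCHES_PER_THREAD / secs);
    return 0;
}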
Authors: Ioannis Malliotakis (ICS-FORTH & University of Crete), Anastasios Papagiannis (Isovalent), Manolis Marazakis (ICS-FORTH), Angelos Bilas (ICS-FORTH)