Date
2024-01-18
Article
On the Performance of Malleable APGAS Programs and Batch Job Schedulers
Abstract
Malleability, the ability of applications to dynamically adjust their resource allocations at runtime, offers great potential to enhance the efficiency and resource utilization of modern supercomputers. However, applications are rarely capable of growing and shrinking their number of nodes at runtime, and batch job schedulers provide only rudimentary support for such features. While numerous approaches have been proposed to enable application malleability, they typically focus on iterative computations and require complex code modifications. This compounds the challenges for programmers, who already wrestle with the complexity of traditional MPI inter-node programming. Asynchronous Many-Task (AMT) programming presents a promising alternative. In AMT, computations are split into many fine-grained tasks, which are processed by workers. Because the AMT runtime system can relocate these tasks transparently, AMT offers great potential for efficient malleability. In this work, we propose an extension to an existing AMT system, namely APGAS for Java. We provide easy-to-use malleability programming abstractions that require only minor code additions from application programmers. Runtime adjustments, such as process initialization and termination, are managed automatically by our malleability extension. We validate the extension by adapting a load-balancing library that handles multiple benchmarks, and we show that both shrinking and growing operations incur low execution-time overhead. In addition, we demonstrate compatibility with potential batch job schedulers by developing a prototype batch job scheduler that supports malleable jobs. Through extensive executions of real-world job batches on up to 32 nodes, involving rigid, moldable, and malleable programs, we evaluate the impact of deploying malleable APGAS applications on supercomputers.
In experiments with several scheduling algorithms, including FCFS, Backfilling, EASY-Backfilling, and an algorithm that exploits malleable jobs, the results show significant improvements in several metrics for malleable jobs. With 100% malleable jobs and our prototype batch job scheduler, we show a 13.09% reduction in makespan (the time needed to schedule and execute all jobs), a 19.86% increase in node utilization, and a 3.61% decrease in job turnaround time (the time from a job's submission to its completion), compared to the best-performing scheduling algorithm with 100% rigid jobs.
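The three metrics quoted above can be made concrete with a small calculation. The following is a hypothetical Java sketch, not code from the paper's prototype scheduler. Makespan and turnaround follow the parenthetical definitions in the abstract; measuring makespan from the earliest submission, and defining node utilization as busy node-seconds divided by (cluster nodes × makespan), are assumptions. The names `JobRecord`, `ScheduleMetrics`, and their methods are illustrative only, and each job is assumed to hold a fixed number of nodes.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical record of one finished job; times in seconds, fixed node count. */
record JobRecord(String id, double submit, double start, double end, int nodes) {}

public class ScheduleMetrics {

    /** Makespan: earliest submission to latest completion (assumed reference point). */
    static double makespan(List<JobRecord> jobs) {
        double firstSubmit = jobs.stream().mapToDouble(JobRecord::submit).min().orElse(0.0);
        double lastEnd = jobs.stream().mapToDouble(JobRecord::end).max().orElse(0.0);
        return lastEnd - firstSubmit;
    }

    /** Mean turnaround: completion time minus submission time, averaged over all jobs. */
    static double meanTurnaround(List<JobRecord> jobs) {
        return jobs.stream().mapToDouble(j -> j.end() - j.submit()).average().orElse(0.0);
    }

    /** Node utilization (assumed definition): busy node-seconds / (clusterNodes * makespan). */
    static double nodeUtilization(List<JobRecord> jobs, int clusterNodes) {
        double busyNodeSeconds = jobs.stream()
                .mapToDouble(j -> (j.end() - j.start()) * j.nodes())
                .sum();
        return busyNodeSeconds / (clusterNodes * makespan(jobs));
    }

    /** Relative change from baseline to value, in percent; negative means a reduction. */
    static double percentChange(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical batch on a 4-node cluster: both jobs submitted at t = 0 s.
        List<JobRecord> jobs = List.of(
                new JobRecord("a", 0, 0, 100, 2),
                new JobRecord("b", 0, 0, 50, 2));
        System.out.printf(Locale.ROOT, "makespan = %.1f s%n", makespan(jobs));             // 100.0 s
        System.out.printf(Locale.ROOT, "mean turnaround = %.1f s%n", meanTurnaround(jobs)); // 75.0 s
        System.out.printf(Locale.ROOT, "node utilization = %.1f%%%n",
                100.0 * nodeUtilization(jobs, 4));                                       // 75.0%
    }
}
```

In this toy batch, two of the four nodes sit idle for the last 50 s, which is why utilization is 75% rather than 100%. For malleable jobs, whose node count changes while they run, the single `nodes` field would have to be replaced by a sum over the intervals between resize events. With the `percentChange` helper, a 13.09% makespan reduction corresponds to a value of -13.09, meaning the new makespan is 86.91% of the baseline.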
Citation
In: SN Computer Science, Volume 5, Issue 4 (2024-01-18), eISSN: 2661-8907
Funding note
Funded within the DEAL project
Cite
@article{doi:10.17170/kobra-202404109953,
author={Finnerty, Patrick and Posner, Jonas and Bürger, Janek and Takaoka, Leo and Kanzaki, Takuma},
title={On the Performance of Malleable APGAS Programs and Batch Job Schedulers},
journal={SN Computer Science},
volume={5},
number={4},
year={2024},
doi={10.1007/s42979-024-02641-7}
}
Metadata
Authors: Finnerty, Patrick; Posner, Jonas; Bürger, Janek; Takaoka, Leo; Kanzaki, Takuma
Document type: Article (publishedVersion), open access
Journal: SN Computer Science, Volume 5, Issue 4, Article 349, eISSN: 2661-8907
Date published: 2024-01-18
Date added to repository: 2024-04-12T11:16:54Z
DOI (repository): doi:10.17170/kobra-202404109953
DOI (publisher version): doi:10.1007/s42979-024-02641-7
Handle: http://hdl.handle.net/123456789/15657
Language: English
License: Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/
Funding: Funded within the DEAL project
Keywords: Malleable runtime system; Malleable job scheduling; APGAS
Subject headings: Arbeitsplanung (work scheduling); Formänderungsvermögen (malleability); Flexibilität (flexibility); Laufzeitsystem (runtime system)
DDC: 004
The following license terms are associated with this resource: Attribution 4.0 International (CC BY 4.0).