Feb 25 – 27, 2026
Technical University of Braunschweig
Europe/Berlin timezone

Improving the Efficiency of Kokkos Multi-Dimensional Range Policy for GPUs

Feb 25, 2026, 4:30 PM
30m
SN 20.2 (Technical University of Braunschweig)

Developer Talk, Kokkos, Kokkos II

Speaker

Adrien Taberner (CEA)

Description

Kokkos MDRangePolicy provides a high-level abstraction for iterating over multi-dimensional index spaces. Used with parallel_for and parallel_reduce constructs, it enables computations over N-dimensional spaces (up to 6 dimensions). MDRangePolicy is the most intuitive and commonly used approach for iterating over multi-dimensional arrays and implementing stencil computations in scientific applications. As Kokkos adoption continues to grow, optimizing this core functionality directly benefits a large portion of the user community.

This presentation covers the current performance limitations of MDRangePolicy and examines the default tiling strategies Kokkos uses for device backends (CUDA, HIP, etc.). Our investigation identified several areas for improvement: suboptimal default block sizes, complicated code paths, and excessive register pressure, which together limit occupancy on GPUs. We present ongoing work to enhance the MDRangePolicy implementation by reducing register pressure, optimizing default block sizes, and improving overall GPU performance.

We will present profiling reports and benchmark results comparing the current and improved implementations across various GPU architectures, with speedups ranging from 1.1x to 2x. Importantly, all improvements preserve full backward compatibility, so existing user code benefits without any modification. Beyond performance, this work has also been an opportunity to refactor, simplify, and modernize the Kokkos codebase.

Finally, we discuss the tradeoffs between maintaining high-level portable abstractions and addressing low-level performance concerns.