Speaker
Description
Kokkos Comm is a lightweight C++ library providing performance-portable explicit communication primitives for distributed Kokkos applications. It aims to eliminate code duplication across the Kokkos ecosystem by centralizing solutions to common pain points. Kokkos Comm addresses critical integration challenges between the Kokkos execution model and distributed memory programming by automatically handling GPU awareness, non-contiguous data marshalling, and view lifetime management. It features a minimal asynchronous API that exposes a streamlined subset of the usual point-to-point and collective operations, while preserving flexibility and abstracting backend-specific complexity. The design is centered around simple, composable, and extensible interfaces, ensuring near-zero overhead compared to hand-rolled solutions. Currently, MPI and NCCL backends are supported, with other GPU-oriented (RCCL, oneCCL) and NIC-oriented (OFI, UCX, Portals) backends being explored. Built with C++20, Kokkos Comm maintains performance portability across complex heterogeneous systems (multi-GPU, multi-NIC) while serving as a research platform for advanced communication patterns, parallel programming models, and standardization efforts. Kokkos Comm’s philosophy is simple: easy to use, hard to misuse.
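To make "non-contiguous data marshalling" concrete, here is a minimal sketch of the kind of hand-rolled packing code that applications often write before a send and after a receive. It uses only the C++ standard library and is not Kokkos Comm's API: the helper names `pack_column` and `unpack_column` and the row-major layout are assumptions chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Kokkos Comm): copy one column of a
// row-major rows x cols matrix into a contiguous send buffer.
std::vector<double> pack_column(const std::vector<double>& matrix,
                                std::size_t rows, std::size_t cols,
                                std::size_t col) {
  assert(matrix.size() == rows * cols && col < cols);
  std::vector<double> buffer(rows);
  for (std::size_t i = 0; i < rows; ++i) {
    buffer[i] = matrix[i * cols + col];  // consecutive elements are `cols` apart
  }
  return buffer;
}

// Hypothetical helper: scatter a contiguous receive buffer back into a column.
void unpack_column(const std::vector<double>& buffer,
                   std::vector<double>& matrix,
                   std::size_t rows, std::size_t cols, std::size_t col) {
  assert(matrix.size() == rows * cols && col < cols && buffer.size() == rows);
  for (std::size_t i = 0; i < rows; ++i) {
    matrix[i * cols + col] = buffer[i];
  }
}
```

In a distributed code, the packed buffer would be handed to the communication layer and unpacked on the receiving rank. Repeating this pattern for every layout and memory space is the duplication the abstract describes.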