Enabling RDMA and GPUs in Rootless Kubernetes for Accelerated HPC and AI Applications
Lise Jolicoeur, Vanessa Sochat, François Diakhaté, Daniel Milroy. "Enabling RDMA and GPUs in Rootless Kubernetes for Accelerated HPC and AI applications"
Lise Jolicoeur, François Diakhaté, and Raymond Namyst. “Leveraging Private Container Networks for Increased User Isolation and Flexibility on HPC Clusters”. In: High Performance Computing. ISC High Performance 2024 International Workshops. Ed. by Michèle Weiland et al. Cham: Springer Nature Switzerland, 2025, pp. 415–426. ISBN: 978-3-031-73716-9. DOI: 10.1007/978-3-031-73716-9_29.
Published in 20th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'25), 2025
HPC workflows increasingly integrate diverse technologies such as artificial intelligence (AI), machine learning (ML), data analytics, databases, and web services. Orchestrators like Kubernetes have been designed to facilitate deploying these heterogeneous workloads in cloud environments. Allowing Kubernetes to be launched and managed as a resource on HPC clusters would facilitate the deployment of modern workflows in HPC environments. To enable the deployment of Kubernetes by unprivileged HPC users, we evaluate the usability of a rootless version of Kubernetes, Usernetes. We analyze synthetic benchmarks as well as HPC and ML proxy apps to evaluate the overhead of Usernetes for HPC/ML workloads deployed on high-performance networks and GPUs. While the results show that applications running in Usernetes can take advantage of InfiniBand networks and NVIDIA GPUs, some benchmarks incur measurable overheads at scale, which warrant further investigation.
Recommended citation: Lise Jolicoeur, Vanessa Sochat, François Diakhaté, Daniel Milroy. "Enabling RDMA and GPUs in Rootless Kubernetes for Accelerated HPC and AI applications"
Published in High Performance Computing. ISC High Performance 2024 International Workshops, 2024
To address the increasing complexity of modern scientific computing workflows, HPC clusters must be able to accommodate a wider range of workloads without compromising their efficiency in processing batches of highly parallel jobs. Cloud computing providers have a long history of leveraging all forms of virtualization to let their clients easily and securely deploy complex distributed applications, and similar capabilities are now expected from HPC facilities.
Recommended citation: Lise Jolicoeur, François Diakhaté, and Raymond Namyst. “Leveraging Private Container Networks for Increased User Isolation and Flexibility on HPC Clusters”. In: High Performance Computing. ISC High Performance 2024 International Workshops. Ed. by Michèle Weiland et al. Cham: Springer Nature Switzerland, 2025, pp. 415–426. ISBN: 978-3-031-73716-9. DOI: 10.1007/978-3-031-73716-9_29.