Set Up Slurm on Kubernetes
This guide walks through setting up Managed Slurm as a pure Kubernetes add-on on an existing Crusoe Managed Kubernetes (CMK) cluster. Use this approach if you want full control over the underlying CMK cluster, want to manage Slurm components alongside other Kubernetes workloads, or already have GitOps/Helm-based infrastructure tooling that you want to extend to Slurm.
If you'd rather have Crusoe provision and manage everything end-to-end through a single command, see the Quickstart instead — it uses the crusoe slurm CLI to create the cluster, login nodes, and worker pools as a managed bundle. The two paths converge to the same end state; this guide just gives you direct kubectl/Helm access at every step.
In this mode you create the CMK cluster and your GPU compute node pools yourself, and all Slurm configuration lives in Kubernetes Custom Resources. You do not use the crusoe slurm clusters create or crusoe slurm nodesets create commands. CSO provisions the controller and login node pools automatically as part of SlurmCluster reconciliation.
How It Works
The Crusoe Slurm Operator (CSO) runs inside your CMK cluster and reconciles a SlurmCluster Custom Resource. When you apply the CR, CSO automatically provisions and configures:
- The controller node pool (3 × c1a.4x for HA — managed by CSO, not user-configurable)
- The login node pool (size and instance type configurable in the CR)
- The Slinky operator (Slurm-on-Kubernetes from SchedMD)
- The Slurm controller (slurmctld) and login pods
- Topograph for topology-aware scheduling
- The shared /home PersistentVolumeClaim
- A LoadBalancer service for SSH access to login nodes
For compute workers, you create the GPU node pools yourself — CSO can't infer the hardware type or scale you need. The operator then watches for Kubernetes nodes labeled slurm.crusoe.ai/compute-node-type=true and automatically creates a Slinky NodeSet for each underlying node pool.
Users and groups are managed via SlurmUser and SlurmUserGroup CRs, the same as in the CLI/API path. See User Management for the full reference.
Prerequisites
- The Crusoe CLI (latest version) installed and authenticated, used here only for CMK cluster and node pool creation
- kubectl installed
- helm v3.11+ installed
- An SSH public key for accessing login nodes
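A quick way to confirm the tool versions before proceeding (both are standard commands):
kubectl version --client   # any recent client works
helm version --short       # must report v3.11 or newer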
Step 1: Create the CMK Cluster
Create a Managed Kubernetes cluster with the Slurm-supporting add-ons enabled. The required add-ons are:
- crusoe_managed_slurm — installs the Crusoe Slurm Operator (CSO) on the cluster
- nvidia_gpu_operator — exposes GPUs as schedulable resources
- nvidia_network_operator — enables InfiniBand networking
- crusoe_csi — provides the shared /home filesystem
- autoclusters — automatic hardware remediation (recommended for production)
crusoe kubernetes clusters create \
--name my-slurm-cluster \
--location us-east1-a \
--add-ons crusoe_managed_slurm,nvidia_gpu_operator,nvidia_network_operator,crusoe_csi,autoclusters
Including crusoe_managed_slurm in --add-ons installs CSO automatically — you do not need to install it separately via Helm. CSO in turn installs cert-manager, the Slinky operator, Topograph, and the Crusoe Load Balancer Controller as it reconciles your SlurmCluster CR in Step 3.
Once the cluster is RUNNING, configure kubectl:
crusoe kubernetes clusters get-credentials my-slurm-cluster
kubectl cluster-info
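To confirm the add-on installed CSO before continuing, check that the slurm.crusoe.ai CRDs used in Step 3 are registered and that an operator pod is running. The pod check searches all namespaces, since the operator's namespace may vary by add-on version:
kubectl get crds | grep slurm.crusoe.ai   # SlurmCluster, SlurmUser, etc. should be listed
kubectl get pods -A | grep -i slurm       # operator pod should be Running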
Step 2: Create the slurm Namespace
All Slurm Custom Resources live in a single namespace. Only one SlurmCluster CR is supported per namespace.
kubectl create namespace slurm
Step 3: Apply the SlurmCluster Custom Resource
Applying a SlurmCluster CR triggers CSO to install the full Slurm stack — cert-manager, the Slinky operator, Topograph, the Crusoe Load Balancer Controller, the Slurm controller, login pods, and the shared /home volume. The controller node pool (3 × c1a.4x nodes), the login node pool, and the shared /home volume (the latter two configurable in the CR below) are provisioned automatically as part of this step.
Create a file named slurm-cluster.yaml:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmCluster
metadata:
name: my-slurm-cluster
namespace: slurm
spec:
clusterVersion: "25.11.2-cmk.12"
loginSet:
replicas: 2
instanceType: c1a.8x
# Optional. CIDR ranges allowed to reach the login LoadBalancer.
# Defaults to 0.0.0.0/0 if omitted.
firewallRuleSourceRanges:
- 0.0.0.0/0
rootSSHPubKeys:
- "ssh-ed25519 AAAA... user@host"
# Shared /home volume. Default 10Ti. Can be increased later but never decreased.
homeVolumeSize: "10Ti"
Apply it:
kubectl apply -f slurm-cluster.yaml
Spec Reference
| Field | Required | Default | Notes |
|---|---|---|---|
| clusterVersion | Yes | — | Slurm version to install (e.g. 25.11.2-cmk.12, as in the example above) |
| loginSet.replicas | Yes | — | Number of login pods |
| loginSet.instanceType | No | c1a.8x | Login node instance type. Immutable after creation. |
| loginSet.firewallRuleSourceRanges | No | 0.0.0.0/0 | CIDR ranges allowed to reach the login LoadBalancer service |
| rootSSHPubKeys | Yes | — | SSH public keys authorized for the root user on login and worker nodes |
| homeVolumeSize | No | 10Ti | Shared /home volume size. Format <n>Ti (e.g. 10Ti, 50Ti). Can be increased after creation but never decreased. |
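Putting the table together, a minimal sketch with only the three required fields; every optional field falls back to its default:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmCluster
metadata:
  name: my-slurm-cluster
  namespace: slurm
spec:
  clusterVersion: "25.11.2-cmk.12"
  loginSet:
    replicas: 2          # instanceType defaults to c1a.8x
  rootSSHPubKeys:
    - "ssh-ed25519 AAAA... user@host"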
Watching the Reconciliation
Cluster bring-up takes ~10 minutes once the node pools are ready. Watch the phases:
kubectl get slurmclusters -n slurm -w
The STATUS column moves through Provisioning → Installing → Ready. For detail on which component is currently being configured:
kubectl describe slurmcluster my-slurm-cluster -n slurm
The conditions list (CertManagerReady, SlinkyReady, TopographReady, ControllerReady, LoginReady, etc.) indicates progress.
Once the cluster shows Ready, the LOGIN column on kubectl get slurmclusters displays 2/2 and ENDPOINT shows the external IP of the login LoadBalancer.
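If you're scripting the bring-up, kubectl wait can block until the cluster is Ready. The jsonpath below assumes the phase is surfaced at .status.phase; adjust it if kubectl describe shows a different status field:
kubectl wait slurmcluster/my-slurm-cluster -n slurm \
  --for=jsonpath='{.status.phase}'=Ready --timeout=20m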
At this point you have a healthy Slurm control plane with login nodes, but no compute capacity yet. Add compute node pools next.
Step 4: Add Compute Node Pools
Compute pools provide the GPU capacity that runs your jobs. Unlike controller and login pools, compute pools are not auto-created by CSO — you create them yourself so you can choose the GPU type, count, and InfiniBand configuration. You can create multiple pools (for example, one per GPU type) and the operator will create a Slinky NodeSet for each one automatically. CPU node pools can also be created; they skip the --ib-partition-id and --ephemeral-storage-for-containerd flags, as shown after the GPU example below.
Each GPU compute pool needs the slurm.crusoe.ai/compute-node-type=true label and an InfiniBand partition ID:
crusoe kubernetes nodepools create \
--name slurm-h100-workers \
--cluster-name my-slurm-cluster \
--type h100-80gb-sxm-ib.8x \
--count 4 \
--ib-partition-id <your-ib-partition-id> \
--ephemeral-storage-for-containerd true \
--node-labels "slurm.crusoe.ai/compute-node-type=true"
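For a CPU-only pool, the same compute label is required but the InfiniBand and ephemeral-storage flags are omitted. The instance type below is illustrative; substitute whatever CPU type your project uses:
crusoe kubernetes nodepools create \
  --name slurm-cpu-workers \
  --cluster-name my-slurm-cluster \
  --type c1a.16x \
  --count 2 \
  --node-labels "slurm.crusoe.ai/compute-node-type=true"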
CMK automatically applies crusoe.ai/nodepool.id and crusoe.ai/nodepool.name labels to every node based on the parent node pool. The Slurm operator uses the nodepool.id label to group compute nodes into NodeSets — one NodeSet per pool.
You can add or remove compute pools at any time. The operator detects new pools via the node labels and creates additional NodeSets without any CR changes on your part.
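To preview how nodes will be grouped into NodeSets, you can project the pool label into its own column (dots inside label keys must be backslash-escaped in custom-columns):
kubectl get nodes -l slurm.crusoe.ai/compute-node-type=true \
  -o custom-columns='NODE:.metadata.name,POOL:.metadata.labels.crusoe\.ai/nodepool\.id'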
Verify Compute Nodes Were Discovered
The operator watches for compute nodes and creates a Slinky NodeSet for each unique crusoe.ai/nodepool.id:
kubectl get nodesets -n slurm
You should see one entry per compute node pool, with READY showing <n>/<n> once Slurm has registered all the nodes.
You can also verify the labels on your compute nodes directly:
kubectl get nodes -L slurm.crusoe.ai/compute-node-type,crusoe.ai/nodepool.id
Each compute node should show true for compute-node-type and a non-empty nodepool.id.
Step 5: Add Users (Optional)
To allow non-root users to SSH in and submit jobs, apply SlurmUser and SlurmUserGroup CRs. The full reference, including UID/GID assignment and POSIX naming rules, is in User Management. Brief example:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmUser
metadata:
name: alice
namespace: slurm
spec:
clusterReference: my-slurm-cluster
fullName: "Alice Johnson"
sshPublicKeys:
- "ssh-ed25519 AAAA... alice@laptop"
kubectl apply -f user-alice.yaml
The user can SSH in within ~1 minute:
ssh -i <path-to-private-key> alice@<login-node-endpoint>
You can find the login node endpoint with:
kubectl get slurmcluster my-slurm-cluster -n slurm -o jsonpath='{.status.loginEndpoint}'
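A SlurmUserGroup CR follows the same pattern. The sketch below assumes the group schema mirrors SlurmUser's clusterReference field; the members field name is an assumption, so check User Management for the exact spec:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmUserGroup
metadata:
  name: ml-team
  namespace: slurm
spec:
  clusterReference: my-slurm-cluster
  members:            # assumed field name; see User Management for the real schema
    - alice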
Step 6: Run a Job
Once SSH'd into a login node, standard Slurm commands work as usual:
sinfo # Show available nodes
srun --gpus=8 nvidia-smi # Quick interactive GPU test
sbatch my-job.batch # Submit a batch job
squeue # Check the queue
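A minimal my-job.batch for the sbatch line above might look like this (illustrative; all directives are standard Slurm):
#!/bin/bash
#SBATCH --job-name=gpu-smoke-test
#SBATCH --nodes=1
#SBATCH --gpus=8
#SBATCH --output=%x-%j.out

srun nvidia-smi   # runs on the allocated node; output lands in gpu-smoke-test-<jobid>.out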
For a multi-node NCCL test and other examples, see Advanced: Kubernetes Operations — Running NCCL Tests.
Adding More Compute Pools
To add capacity, simply create another node pool with the compute label:
crusoe kubernetes nodepools create \
--name slurm-a100-workers \
--cluster-name my-slurm-cluster \
--type a100-80gb-sxm-ib.8x \
--count 4 \
--ib-partition-id <your-ib-partition-id> \
--ephemeral-storage-for-containerd true \
--node-labels "slurm.crusoe.ai/compute-node-type=true"
Within a few seconds, a new Slinky NodeSet appears in the slurm namespace and the new nodes register with slurmctld. Run sinfo from a login node to confirm.
Updating the Cluster
Most spec fields on SlurmCluster are immutable after creation. The supported updates are:
- homeVolumeSize — increase only (PVC limitation)
- loginSet.replicas — scale up or down within the available login node count
- rootSSHPubKeys — propagates to running login and worker pods within ~1 minute
- loginSet.firewallRuleSourceRanges — updates the LoadBalancer firewall
Apply changes with kubectl apply -f slurm-cluster.yaml or kubectl edit slurmcluster my-slurm-cluster -n slurm.
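For one-off changes, a targeted patch avoids re-applying the whole manifest. For example, scaling login replicas (standard kubectl; the field path matches the spec above):
kubectl patch slurmcluster my-slurm-cluster -n slurm --type merge \
  -p '{"spec":{"loginSet":{"replicas":3}}}'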
Deleting the Cluster
Delete the SlurmCluster CR. The operator will tear down all components and clean up the LoadBalancer:
kubectl delete slurmcluster my-slurm-cluster -n slurm
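Teardown takes a few minutes; to block until the CR is fully removed (standard kubectl):
kubectl wait --for=delete slurmcluster/my-slurm-cluster -n slurm --timeout=15m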
Deleting the SlurmCluster does not delete the shared /home volume. It must be manually deleted via the Crusoe Console or CLI.
When the SlurmCluster is deleted, CSO automatically removes the controller and login node pools it created. You then need to delete the compute node pools (or repurpose them for other workloads) and finally the CMK cluster itself:
crusoe kubernetes nodepools delete slurm-h100-workers --cluster-name my-slurm-cluster
crusoe kubernetes clusters delete my-slurm-cluster
Troubleshooting
Cluster stuck in Provisioning or Installing
Look at the conditions:
kubectl describe slurmcluster my-slurm-cluster -n slurm
Common causes:
- LoginReady / ControllerReady stuck: CSO provisions these node pools automatically when you apply the SlurmCluster CR. If they're stuck, check that the underlying VM provisioning succeeded — crusoe kubernetes nodepools list --cluster-name my-slurm-cluster should show CSO-managed controller and login pools in RUNNING state. Quota or capacity issues at the project level are the most common reason.
- SlinkyReady / TopographReady failing: the operator's Helm install is failing. Check the operator pod logs: kubectl logs -n slurm deploy/crusoe-slurm-operator.
- CertManagerReady failing: cert-manager pods aren't healthy. Check kubectl get pods -n cert-manager.
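Recent events in the namespace often pinpoint the failing component faster than the conditions alone:
kubectl get events -n slurm --sort-by=.lastTimestamp | tail -n 20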
Compute nodes don't appear in sinfo
Verify the labels on your compute nodes:
kubectl get nodes -l slurm.crusoe.ai/compute-node-type=true \
-L crusoe.ai/nodepool.id,crusoe.ai/nodepool.name
Each compute node must have:
- slurm.crusoe.ai/compute-node-type=true (set by you on the node pool)
- crusoe.ai/nodepool.id=<id> (auto-set by CMK)
If labels are missing, the operator's NodeReconciler will not create a NodeSet for them. Re-create the node pool with the correct --node-labels argument.
For a deeper diagnostic dump (recommended when escalating to support), run the slurm-debug tool — see the Advanced: Kubernetes Operations guide.
Next Steps
- User Management — Add users and groups, manage sudo access
- Managing Partitions — Create custom Slurm partitions
- Slurm Metrics — Monitor cluster and job performance
- Advanced: Kubernetes Operations — Direct CRD reference, prolog/epilog, AutoClusters / SIGTERM handling