Set Up Slurm on Kubernetes
This guide walks through setting up Managed Slurm as a pure Kubernetes add-on on an existing Crusoe Managed Kubernetes (CMK) cluster. Use this approach if you want full control over the underlying CMK cluster, want to manage Slurm components alongside other Kubernetes workloads, or already have GitOps/Helm-based infrastructure tooling that you want to extend to Slurm.
If you'd rather have Crusoe provision and manage everything end-to-end through a single command, see the Quickstart instead — it uses the crusoe slurm CLI to create the cluster, login nodes, and worker pools as a managed bundle. The two paths converge to the same end state; this guide just gives you direct kubectl/Helm access at every step.
In this mode you create the CMK cluster and your GPU compute node pools yourself, and all Slurm configuration lives in Kubernetes Custom Resources. You do not use the crusoe slurm clusters create or crusoe slurm nodesets create commands. CSO provisions the controller and login node pools automatically as part of SlurmCluster reconciliation.
How It Works
The Crusoe Slurm Operator (CSO) runs inside your CMK cluster and reconciles a SlurmCluster Custom Resource. When you apply the CR, CSO automatically provisions and configures:
- The controller node pool (3 × c1a.4x for HA — managed by CSO, not user-configurable)
- The login node pool (size and instance type configurable in the CR)
- The Slinky operator (Slurm-on-Kubernetes from SchedMD)
- The Slurm controller (slurmctld) and login pods
- Topograph for topology-aware scheduling
- The shared /home PersistentVolumeClaim
- A LoadBalancer service for SSH access to login nodes
For compute workers, you create the GPU node pools yourself — CSO can't infer the hardware type or scale you need. The operator then watches for Kubernetes nodes labeled slurm.crusoe.ai/compute-node-type=true and automatically creates a Slinky NodeSet for each underlying node pool.
Users and groups are managed via SlurmUser and SlurmUserGroup CRs, the same as in the CLI/API path. See User Management for the full reference.
Prerequisites
- The Crusoe CLI (latest version) installed and authenticated, used here only for CMK cluster and node pool creation
- kubectl installed
- helm v3.11+ installed
- An SSH public key for accessing login nodes
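A quick way to confirm the tool versions before proceeding (both are standard commands):
kubectl version --client   # any recent client works
helm version --short       # must report v3.11 or newer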
Step 1: Create the CMK Cluster
Create a Managed Kubernetes cluster with the Slurm-supporting add-ons enabled. The required add-ons are:
- crusoe_managed_slurm — installs the Crusoe Slurm Operator (CSO) on the cluster
- nvidia_gpu_operator — exposes GPUs as schedulable resources
- nvidia_network_operator — enables InfiniBand networking
- crusoe_csi — provides the shared /home filesystem
- autoclusters — automatic hardware remediation (recommended for production)
crusoe kubernetes clusters create \
--name my-slurm-cluster \
--location us-east1-a \
--add-ons crusoe_managed_slurm,nvidia_gpu_operator,nvidia_network_operator,crusoe_csi,autoclusters
Including crusoe_managed_slurm in --add-ons installs CSO automatically — you do not need to install it separately via Helm. CSO in turn installs cert-manager, the Slinky operator, Topograph, and the Crusoe Load Balancer Controller as it reconciles your SlurmCluster CR in Step 3.
Once the cluster is RUNNING, configure kubectl:
crusoe kubernetes clusters get-credentials my-slurm-cluster
kubectl cluster-info
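To confirm the add-on installed CSO before continuing, check that the slurm.crusoe.ai CRDs used in Step 3 are registered and that an operator pod is running. The pod check searches all namespaces, since the operator's namespace may vary by add-on version:
kubectl get crds | grep slurm.crusoe.ai   # SlurmCluster, SlurmUser, etc. should be listed
kubectl get pods -A | grep -i slurm       # operator pod should be Running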
Step 2: Create the slurm Namespace
All Slurm Custom Resources live in a single namespace. Only one SlurmCluster CR is supported per namespace.
kubectl create namespace slurm
Step 3: Apply the SlurmCluster Custom Resource
Applying a SlurmCluster CR triggers CSO to install the full Slurm stack — cert-manager, the Slinky operator, Topograph, the Crusoe Load Balancer Controller, the Slurm controller, login pods, and the shared /home volume. The controller node pool (3 × c1a.4x nodes), the login node pool, and the shared /home volume (the latter two configurable in the CR below) are provisioned automatically as part of this step.
Create a file named slurm-cluster.yaml:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmCluster
metadata:
name: my-slurm-cluster
namespace: slurm
spec:
clusterVersion: "25.11.2-cmk.12"
loginSet:
replicas: 2
instanceType: c1a.8x
# Optional. CIDR ranges allowed to reach the login LoadBalancer.
# Defaults to 0.0.0.0/0 if omitted.
firewallRuleSourceRanges:
- 0.0.0.0/0
rootSSHPubKeys:
- "ssh-ed25519 AAAA... user@host"
# Shared /home volume. Default 10Ti. Can be increased later but never decreased.
homeVolumeSize: "10Ti"
Apply it:
kubectl apply -f slurm-cluster.yaml
Spec Reference
| Field | Required | Default | Notes |
|---|---|---|---|
| clusterVersion | Yes | — | Slurm version to install (e.g. 25.11.2-cmk.12, as in the example above) |
| loginSet.replicas | Yes | — | Number of login pods |
| loginSet.instanceType | No | c1a.8x | Login node instance type. Immutable after creation. |
| loginSet.firewallRuleSourceRanges | No | 0.0.0.0/0 | CIDR ranges allowed to reach the login LoadBalancer service |
| rootSSHPubKeys | Yes | — | SSH public keys authorized for the root user on login and worker nodes |
| homeVolumeSize | No | 10Ti | Shared /home volume size. Format <n>Ti (e.g. 10Ti, 50Ti). Can be increased after creation but never decreased. |
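Putting the table together, a minimal sketch with only the three required fields; every optional field falls back to its default:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmCluster
metadata:
  name: my-slurm-cluster
  namespace: slurm
spec:
  clusterVersion: "25.11.2-cmk.12"
  loginSet:
    replicas: 2          # instanceType defaults to c1a.8x
  rootSSHPubKeys:
    - "ssh-ed25519 AAAA... user@host"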
Watching the Reconciliation
Cluster bring-up takes ~10 minutes once the node pools are ready. Watch the phases:
kubectl get slurmclusters -n slurm -w
The STATUS column moves through Provisioning → Installing → Ready. For detail on which component is currently being configured:
kubectl describe slurmcluster my-slurm-cluster -n slurm
The conditions list (CertManagerReady, SlinkyReady, TopographReady, ControllerReady, LoginReady, etc.) indicates progress.
Once the cluster shows Ready, the LOGIN column on kubectl get slurmclusters displays 2/2 and ENDPOINT shows the external IP of the login LoadBalancer.
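If you're scripting the bring-up, kubectl wait can block until the cluster is Ready. The jsonpath below assumes the phase is surfaced at .status.phase; adjust it if kubectl describe shows a different status field:
kubectl wait slurmcluster/my-slurm-cluster -n slurm \
  --for=jsonpath='{.status.phase}'=Ready --timeout=20m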
At this point you have a healthy Slurm control plane with login nodes, but no compute capacity yet. Add compute node pools next.
Step 4: Add Compute Node Pools
Compute pools provide the GPU capacity that runs your jobs. Unlike controller and login pools, compute pools are not auto-created by CSO — you create them yourself so you can choose the GPU type, count, and InfiniBand configuration. You can create multiple pools (for example, one per GPU type) and the operator will create a Slinky NodeSet for each one automatically. CPU node pools can also be created; they skip the --ib-partition-id and --ephemeral-storage-for-containerd flags, as shown after the GPU example below.
Each GPU compute pool needs the slurm.crusoe.ai/compute-node-type=true label and an InfiniBand partition ID:
crusoe kubernetes nodepools create \
--name slurm-h100-workers \
--cluster-name my-slurm-cluster \
--type h100-80gb-sxm-ib.8x \
--count 4 \
--ib-partition-id <your-ib-partition-id> \
--ephemeral-storage-for-containerd true \
--node-labels "slurm.crusoe.ai/compute-node-type=true"
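For a CPU-only pool, the same compute label is required but the InfiniBand and ephemeral-storage flags are omitted. The instance type below is illustrative; substitute whatever CPU type your project uses:
crusoe kubernetes nodepools create \
  --name slurm-cpu-workers \
  --cluster-name my-slurm-cluster \
  --type c1a.16x \
  --count 2 \
  --node-labels "slurm.crusoe.ai/compute-node-type=true"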
CMK automatically applies crusoe.ai/nodepool.id and crusoe.ai/nodepool.name labels to every node based on the parent node pool. The Slurm operator uses the nodepool.id label to group compute nodes into NodeSets — one NodeSet per pool.
You can add or remove compute pools at any time. The operator detects new pools via the node labels and creates additional NodeSets without any CR changes on your part.
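To preview how nodes will be grouped into NodeSets, you can project the pool label into its own column (dots inside label keys must be backslash-escaped in custom-columns):
kubectl get nodes -l slurm.crusoe.ai/compute-node-type=true \
  -o custom-columns='NODE:.metadata.name,POOL:.metadata.labels.crusoe\.ai/nodepool\.id'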
Verify Compute Nodes Were Discovered
The operator watches for compute nodes and creates a Slinky NodeSet for each unique crusoe.ai/nodepool.id:
kubectl get nodesets -n slurm
You should see one entry per compute node pool, with READY showing <n>/<n> once Slurm has registered all the nodes.
You can also verify the labels on your compute nodes directly:
kubectl get nodes -L slurm.crusoe.ai/compute-node-type,crusoe.ai/nodepool.id
Each compute node should show true for compute-node-type and a non-empty nodepool.id.
Step 5: Add Users (Optional)
To allow non-root users to SSH in and submit jobs, apply SlurmUser and SlurmUserGroup CRs. The full reference, including UID/GID assignment and POSIX naming rules, is in User Management. Brief example:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmUser
metadata:
name: alice
namespace: slurm
spec:
clusterReference: my-slurm-cluster
fullName: "Alice Johnson"
sshPublicKeys:
- "ssh-ed25519 AAAA... alice@laptop"
kubectl apply -f user-alice.yaml
The user can SSH in within ~1 minute:
ssh -i <path-to-private-key> alice@<login-node-endpoint>
You can find the login node endpoint with:
kubectl get slurmcluster my-slurm-cluster -n slurm -o jsonpath='{.status.loginEndpoint}'
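A SlurmUserGroup CR follows the same pattern. The sketch below assumes the group schema mirrors SlurmUser's clusterReference field; the members field name is an assumption, so check User Management for the exact spec:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmUserGroup
metadata:
  name: ml-team
  namespace: slurm
spec:
  clusterReference: my-slurm-cluster
  members:            # assumed field name; see User Management for the real schema
    - alice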
Step 6: Run a Job
Once SSH'd into a login node, standard Slurm commands work as usual:
sinfo # Show available nodes
srun --gpus=8 nvidia-smi # Quick interactive GPU test
sbatch my-job.batch # Submit a batch job
squeue # Check the queue
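A minimal my-job.batch for the sbatch line above might look like this (illustrative; all directives are standard Slurm):
#!/bin/bash
#SBATCH --job-name=gpu-smoke-test
#SBATCH --nodes=1
#SBATCH --gpus=8
#SBATCH --output=%x-%j.out

srun nvidia-smi   # runs on the allocated node; output lands in gpu-smoke-test-<jobid>.out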
For a multi-node NCCL test and other examples, see Advanced: Kubernetes Operations — Running NCCL Tests.
Adding More Compute Pools
To add capacity, simply create another node pool with the compute label:
crusoe kubernetes nodepools create \
--name slurm-a100-workers \
--cluster-name my-slurm-cluster \
--type a100-80gb-sxm-ib.8x \
--count 4 \
--ib-partition-id <your-ib-partition-id> \
--ephemeral-storage-for-containerd true \
--node-labels "slurm.crusoe.ai/compute-node-type=true"
Within a few seconds, a new Slinky NodeSet appears in the slurm namespace and the new nodes register with slurmctld. Run sinfo from a login node to confirm.
Updating the Cluster
Most spec fields on SlurmCluster are immutable after creation. The supported updates are:
- homeVolumeSize — increase only (PVC limitation)
- loginSet.replicas — scale up or down within the available login node count
- rootSSHPubKeys — propagates to running login and worker pods within ~1 minute
- loginSet.firewallRuleSourceRanges — updates the LoadBalancer firewall
Apply changes with kubectl apply -f slurm-cluster.yaml or kubectl edit slurmcluster my-slurm-cluster -n slurm.
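For one-off changes, a targeted patch avoids re-applying the whole manifest. For example, scaling login replicas (standard kubectl; the field path matches the spec above):
kubectl patch slurmcluster my-slurm-cluster -n slurm --type merge \
  -p '{"spec":{"loginSet":{"replicas":3}}}'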
Deleting the Cluster
Delete the SlurmCluster CR. The operator will tear down all components and clean up the LoadBalancer:
kubectl delete slurmcluster my-slurm-cluster -n slurm
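Teardown takes a few minutes; to block until the CR is fully removed (standard kubectl):
kubectl wait --for=delete slurmcluster/my-slurm-cluster -n slurm --timeout=15m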
Deleting the SlurmCluster does not delete the shared /home volume. It must be manually deleted via the Crusoe Console or CLI.
When the SlurmCluster is deleted, CSO automatically removes the controller and login node pools it created. You then need to delete the compute node pools (or repurpose them for other workloads) and finally the CMK cluster itself:
crusoe kubernetes nodepools delete slurm-h100-workers --cluster-name my-slurm-cluster
crusoe kubernetes clusters delete my-slurm-cluster
Troubleshooting
Cluster stuck in Provisioning or Installing
Look at the conditions:
kubectl describe slurmcluster my-slurm-cluster -n slurm
Common causes:
- LoginReady / ControllerReady stuck: CSO provisions these node pools automatically when you apply the SlurmCluster CR. If they're stuck, check that the underlying VM provisioning succeeded — crusoe kubernetes nodepools list --cluster-name my-slurm-cluster should show CSO-managed controller and login pools in RUNNING state. Quota or capacity issues at the project level are the most common reason.
- SlinkyReady / TopographReady failing: the operator's Helm install is failing. Check the operator pod logs: kubectl logs -n slurm deploy/crusoe-slurm-operator.
- CertManagerReady failing: cert-manager pods aren't healthy. Check kubectl get pods -n cert-manager.
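Recent events in the namespace often pinpoint the failing component faster than the conditions alone:
kubectl get events -n slurm --sort-by=.lastTimestamp | tail -n 20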
Compute nodes don't appear in sinfo
Verify the labels on your compute nodes:
kubectl get nodes -l slurm.crusoe.ai/compute-node-type=true \
-L crusoe.ai/nodepool.id,crusoe.ai/nodepool.name
Each compute node must have:
- slurm.crusoe.ai/compute-node-type=true (set by you on the node pool)
- crusoe.ai/nodepool.id=<id> (auto-set by CMK)
If labels are missing, the operator's NodeReconciler will not create a NodeSet for them. Re-create the node pool with the correct --node-labels argument.
For a deeper diagnostic dump (recommended when escalating to support), run the slurm-debug tool — see the Advanced: Kubernetes Operations guide.
Next Steps
- User Management — Add users and groups, manage sudo access
- Managing Partitions — Create custom Slurm partitions
- Slurm Metrics — Monitor cluster and job performance
- Advanced: Kubernetes Operations — Direct CRD reference, prolog/epilog, AutoClusters / SIGTERM handling