Crusoe Managed Slurm on CMK

Crusoe Managed Slurm enables high-performance computing workload orchestration on Crusoe Cloud infrastructure. By deploying Slurm on top of Crusoe Managed Kubernetes (CMK), you can leverage Slurm's powerful job scheduling capabilities while benefiting from Kubernetes' container orchestration and Crusoe's GPU-optimized infrastructure.

This guide walks you through the process of setting up a Crusoe Managed Slurm cluster.

note

Crusoe Managed Slurm is currently available by request. Please reach out to Crusoe Cloud Support to learn more.

Prerequisites

Before you begin, ensure you have:

  • Access to the Crusoe CLI
  • Appropriate permissions to create CMK clusters

Supported GPU Types

  • 8x NVIDIA B200 180GB (b200-180gb-sxm-ib.8x)
  • 8x NVIDIA H200 141GB (h200-141gb-sxm-ib.8x)
  • 8x NVIDIA H100 80GB (h100-80gb-sxm-ib.8x)
  • 8x NVIDIA A100 80GB (a100-80gb-sxm-ib.8x)

Support for additional GPU types is coming soon.

Creating a Slurm-Enabled Cluster

Step 1: Create the CMK Cluster with Required Add-ons

Create a new CMK cluster with the Crusoe Slurm Operator and its dependencies using the following command:

crusoe kubernetes clusters create \
--name <name> \
--cluster-version <cluster-version> \
--location <location> \
--add-ons crusoe_csi,nvidia_gpu_operator,nvidia_network_operator,crusoe_managed_slurm

Required Add-ons

The following add-ons must be included for Managed Slurm to function properly:

Add-on                    Description
crusoe_managed_slurm      The Slurm operator for Kubernetes
crusoe_csi                Crusoe Container Storage Interface
nvidia_gpu_operator       NVIDIA GPU support
nvidia_network_operator   NVIDIA networking capabilities
note

The crusoe_managed_slurm add-on is only available on CMK versions above 1.33.4-cmk.26.

Step 2: Create Node Pools

Once your CMK cluster is running, add the required node pools for your Slurm deployment.

Create a Control Plane Node Pool

Create a node pool for the Slurm control plane with the appropriate node labels. It is recommended to use a slice type of c1a.4x or larger:

crusoe kubernetes nodepools create \
--name slurm-control \
--count 2 \
--cluster-name <cluster-name> \
--type c1a.4x \
--node-labels 'slurm.crusoe.ai/controller-node-type=true,slurm.crusoe.ai/login-node-type=true'

Create Worker Node Pools

Create node pools for Slurm workers with your desired instance type and count:

crusoe kubernetes nodepools create \
--name slurm-workers \
--count 2 \
--cluster-name <cluster-name> \
--type <desired-instance-type> \
--node-labels 'slurm.crusoe.ai/compute-node-type=true'
tip

Note the node pool ID from the command output, as you'll need it in Step 4.

Step 3: Configure Cluster Access

Once your CMK cluster is provisioned, configure your local kubectl to interact with the cluster:

crusoe kubernetes clusters get-credentials <cluster-name>

This command retrieves your cluster's kubeconfig and configures your local kubectl context. Verify the connection:

kubectl cluster-info
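
As a quick follow-up check, you can confirm that the node labels applied in Step 2 are present on your nodes:

kubectl get nodes -L slurm.crusoe.ai/controller-node-type,slurm.crusoe.ai/login-node-type,slurm.crusoe.ai/compute-node-type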

Step 4: Deploy the Slurm Cluster

Deploy your Slurm cluster by creating a namespace and then applying two Kubernetes custom resources: the SlurmCluster configuration and the SlurmNodeSet configuration.

Create the Slurm Namespace

Create a file named slurm-namespace.yaml with the following content:

apiVersion: v1
kind: Namespace
metadata:
  name: slurm

Apply the configuration to create the slurm namespace:

kubectl apply -f slurm-namespace.yaml

Configure the Slurm Cluster

Create a file named slurm-cluster.yaml with the following content. Replace the placeholder SSH key in spec.loginSet.rootSSHPublicKey with your public SSH key to enable access to the login nodes:

apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmCluster
metadata:
  generateName: slurm-cluster-
  namespace: slurm
spec:
  containerRegistry: "ghcr.io/crusoecloud/cmk/slurm-containers"
  clusterVersion: "25.11.2-cmk0.0.2"

  # Controller configuration
  controller:
    nodeSelector:
      slurm.crusoe.ai/controller-node-type: "true"

  # Login node configuration
  loginSet:
    replicas: 2 # Recommended to have at least 2 login replicas
    nodeSelector:
      slurm.crusoe.ai/login-node-type: "true"
    rootSSHPublicKey: |
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC... # Replace with your public SSH key

  # Shared storage for user home directories
  userHomeVolumeClaimTemplate:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 1Ti

Create the Slurm cluster:

kubectl create -f slurm-cluster.yaml --save-config=true

This deploys the Slurm controller and login pods. The Crusoe Managed Slurm operator automatically installs and manages the following dependencies:

  • cert-manager - Certificate management for Kubernetes
  • Crusoe Load Balancer Controller - Load balancing for cluster services
  • Slinky - Slurm integration components
  • Topograph - Topology management
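
Before moving on, you can watch the controller and login pods come online (pod names will vary):

kubectl get pods -n slurm -w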

Configure the Compute Nodes

Create a file named slurm-node-set.yaml with the following content. Replace <slurm-cluster-generated-name> with the name generated when you created the SlurmCluster, and <worker-nodepool-id> with the node pool ID from Step 2:

apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmNodeSet
metadata:
  name: slurm-worker-node-set
  namespace: slurm
spec:
  clusterReference: <slurm-cluster-generated-name>
  count: 2
  nodePoolID: "<worker-nodepool-id>"
  nodeSelector:
    slurm.crusoe.ai/compute-node-type: "true"

Apply the configuration:

kubectl apply -f slurm-node-set.yaml

This registers your worker nodes with the Slurm cluster and configures them. The nodes will be ready for job scheduling once the status.readyReplicas field of your SlurmNodeSet equals spec.count.
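
One quick way to check this is to read the status directly from the applied manifest:

kubectl get -f slurm-node-set.yaml -o jsonpath='{.status.readyReplicas}'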

Step 5: Access the Cluster

Retrieve the SSH command to access your login node:

kubectl get services -n slurm -o json | jq -r '.items[] | select(.metadata.name | contains("login")) | "ssh root@\(.status.loadBalancer.ingress[0].ip)"'

This command outputs an SSH connection string. Use it to access your login pods:

ssh root@<external-ip>
tip

If you are prompted for a password, you may need to add -i path/to/private-key to specify your SSH key location.

You should now have access to your Slurm cluster and can begin submitting jobs.

Storage Configuration

Slurm requires a shared filesystem to share data across login and compute nodes. The storage class is automatically configured when you create the cluster with the crusoe_csi add-on enabled. This is what backs the /home directory on your login and worker pods.

The managed service creates a crusoe-csi-driver-fs-sc StorageClass that supports:

  • Volume binding mode: WaitForFirstConsumer
  • Volume expansion: Enabled
  • Provisioner: fs.csi.crusoe.ai

You can adjust storage capacity by modifying the userHomeVolumeClaimTemplate.resources.requests.storage value in the SlurmCluster custom resource and reapplying the configuration. If you are editing the slurm-cluster.yaml file directly, you may need to add name: <generatedName> to the metadata. To see these changes, run kubectl get pvc -A.
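
A minimal sketch of that workflow (the 2Ti value below is just an example):

# In slurm-cluster.yaml: set metadata.name to the generated cluster name and
# raise userHomeVolumeClaimTemplate.resources.requests.storage (for example, to 2Ti)
kubectl apply -f slurm-cluster.yaml
kubectl get pvc -A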

Using Your Slurm Cluster

Once connected to the login node, you can submit and manage jobs using standard Slurm commands:

Command   Description
sinfo     View cluster status and node information
squeue    View the job queue
sbatch    Submit a batch job
srun      Run a job interactively
scancel   Cancel a job

For GPU jobs, specify GPU requirements using the --gpus flag:

srun --gpus=1 nvidia-smi
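
The same request works from a batch script; here is a minimal illustrative example (the file name and contents are just an example), saved as gpu-check.batch:

#!/bin/bash
#SBATCH --job-name=gpu-check
#SBATCH --gpus=1
#SBATCH --output="%x_%j.out"

nvidia-smi

Submit it with sbatch gpu-check.batch and check gpu-check_<job_id>.out for the output.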

Monitoring and Troubleshooting

Checking Cluster Status

Use the following commands to verify your cluster is healthy:

sinfo                 # Check node states
scontrol show node    # View detailed node information
scontrol show config  # View Slurm configuration

Common Issues

Issue                     Resolution
Nodes in drain state      Check node reasons with sinfo -R to identify configuration issues
GPU not detected          Verify the GPU operator is running and nodes have the correct labels
Job allocation failures   Check available resources with sinfo and verify job requirements are within cluster capacity
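
From kubectl, the following checks can help narrow down the GPU and label issues above (the GPU operator namespace shown here is an assumption and may differ in your deployment):

kubectl get pods -n gpu-operator                        # GPU operator pods should be Running
kubectl get nodes -L slurm.crusoe.ai/compute-node-type  # worker nodes should carry the compute label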

Running NCCL Tests

To run NCCL tests, SSH into the login pod and switch to the /home directory. Create the following script named nccl_test.batch:

#!/bin/bash

#SBATCH --job-name=nccl_tests
#SBATCH --nodes=<number of nodes>
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --time=20:00
#SBATCH --output="%x_%j.out"
#SBATCH --exclusive

export NCCL_TOPO_FILE=/etc/crusoe/nccl_topo/h200-141gb-sxm-ib-cloud-hypervisor.xml
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_HCA="mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1"
export NCCL_IB_MERGE_VFS=0
export NCCL_DEBUG=WARN

export OMPI_MCA_coll_hcoll_enable=0
export PMIX_MCA_gds='^ds12'

export UCX_NET_DEVICES="mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1"

srun --mpi=pmix /opt/nccl-tests/build/all_reduce_perf -b 2G -e 32G -f 2

Submit the test:

sbatch nccl_test.batch

From the login pod, use squeue to see the tasks being run. Once there are no more tasks in the queue, check the output file nccl_tests_<job_id>.out in the /home directory to view the results.
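
For example:

squeue                              # wait until the job leaves the queue
cat /home/nccl_tests_<job_id>.out   # per-size bus bandwidth results from all_reduce_perf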

To interact directly with a worker pod:

srun --pty bash                              # Connect to any worker pod
srun --nodelist=<worker-pod-name> --pty bash # Connect to a specific worker pod

Deleting a Slurm Cluster

To delete the Slurm cluster, run the following commands:

kubectl delete namespace slurm
kubectl delete namespace slinky

This deletes the slurm and slinky namespaces and all the resources associated with them.

Next Steps

For detailed information on Slurm usage and advanced configuration options, refer to the official Slurm documentation.

Support

If you encounter issues during setup or need assistance with your Managed Slurm deployment, please contact Crusoe Support.