Crusoe Managed Slurm on CMK
Crusoe Managed Slurm enables high-performance computing workload orchestration on Crusoe Cloud infrastructure. By deploying Slurm on top of Crusoe Managed Kubernetes (CMK), you can leverage Slurm's powerful job scheduling capabilities while benefiting from Kubernetes' container orchestration and Crusoe's GPU-optimized infrastructure.
This guide walks you through the process of setting up a Crusoe Managed Slurm cluster.
Crusoe Managed Slurm is currently available by request. Please reach out to Crusoe Cloud Support to learn more.
Prerequisites
Before you begin, ensure you have:
- Access to the Crusoe CLI
- Appropriate permissions to create CMK clusters
Supported GPU Types
- 8x NVIDIA B200 180GB (b200-180gb-sxm-ib.8x)
- 8x NVIDIA H200 141GB (h200-141gb-sxm-ib.8x)
- 8x NVIDIA H100 80GB (h100-80gb-sxm-ib.8x)
- 8x NVIDIA A100 80GB (a100-80gb-sxm-ib.8x)
Support for additional GPU types is coming soon.
Creating a Slurm-Enabled Cluster
Step 1: Create the CMK Cluster with Required Add-ons
Create a new CMK cluster with the Crusoe Slurm Operator and its dependencies using the following command:
crusoe kubernetes clusters create \
--name <name> \
--cluster-version <cluster-version> \
--location <location> \
--add-ons crusoe_csi,nvidia_gpu_operator,nvidia_network_operator,crusoe_managed_slurm
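For example (the name and location are illustrative, and the cluster version must be newer than 1.33.4-cmk.26, as noted below):
# Illustrative values - substitute a name, location, and CMK version valid for your project
crusoe kubernetes clusters create \
  --name slurm-demo \
  --cluster-version <cluster-version> \
  --location us-east1-a \
  --add-ons crusoe_csi,nvidia_gpu_operator,nvidia_network_operator,crusoe_managed_slurm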
Required Add-ons
The following add-ons must be included for Managed Slurm to function properly:
| Add-on | Description |
|---|---|
| crusoe_managed_slurm | The Slurm operator for Kubernetes |
| crusoe_csi | Crusoe Container Storage Interface |
| nvidia_gpu_operator | NVIDIA GPU support |
| nvidia_network_operator | NVIDIA networking capabilities |
The crusoe_managed_slurm add-on is only available on CMK versions above 1.33.4-cmk.26.
Step 2: Create Node Pools
Once your CMK cluster is running, add the required node pools for your Slurm deployment.
Create a Control Plane Node Pool
Create a node pool for the Slurm control plane with the appropriate node labels. It is recommended to use a slice type of c1a.4x or larger:
crusoe kubernetes nodepools create \
--name slurm-control \
--count 2 \
--cluster-name <cluster-name> \
--type c1a.4x \
--node-labels 'slurm.crusoe.ai/controller-node-type=true,slurm.crusoe.ai/login-node-type=true'
Create Worker Node Pools
Create node pools for Slurm workers with your desired instance type and count:
crusoe kubernetes nodepools create \
--name slurm-workers \
--count 2 \
--cluster-name <cluster-name> \
--type <desired-instance-type> \
--node-labels 'slurm.crusoe.ai/compute-node-type=true'
Note the node pool ID from the command output, as you'll need it in Step 4.
Step 3: Configure Cluster Access
Once your CMK cluster is provisioned, configure your local kubectl to interact with the cluster:
crusoe kubernetes clusters get-credentials <cluster-name>
This command retrieves your cluster's kubeconfig and configures your local kubectl context. Verify the connection:
kubectl cluster-info
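Once kubectl is configured, you can also confirm that the node labels applied in Step 2 are present:
# List the nodes carrying the Slurm controller/login and compute labels from Step 2
kubectl get nodes -l slurm.crusoe.ai/controller-node-type=true
kubectl get nodes -l slurm.crusoe.ai/compute-node-type=true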
Step 4: Deploy the Slurm Cluster
Deploy your Slurm cluster by applying two Kubernetes custom resources: the SlurmCluster configuration and the SlurmNodeSet configuration.
Create the Slurm Namespace
Create a file named slurm-namespace.yaml with the following content:
apiVersion: v1
kind: Namespace
metadata:
  name: slurm
Apply the configuration to create the slurm namespace:
kubectl apply -f slurm-namespace.yaml
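Equivalently, if you prefer not to keep a manifest for the namespace, you can create it directly:
kubectl create namespace slurm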
Configure the Slurm Cluster
Create a file named slurm-cluster.yaml with the following content. Replace the placeholder SSH key in spec.loginSet.rootSSHPublicKey with your public SSH key to enable access to the login nodes:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmCluster
metadata:
  generateName: slurm-cluster-
  namespace: slurm
spec:
  containerRegistry: "ghcr.io/crusoecloud/cmk/slurm-containers"
  clusterVersion: "25.11.2-cmk0.0.2"
  # Controller configuration
  controller:
    nodeSelector:
      slurm.crusoe.ai/controller-node-type: "true"
  # Login node configuration
  loginSet:
    replicas: 2 # Recommended to have at least 2 login replicas
    nodeSelector:
      slurm.crusoe.ai/login-node-type: "true"
    rootSSHPublicKey: |
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC... # Replace with your public SSH key
  # Shared storage for user home directories
  userHomeVolumeClaimTemplate:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 1Ti
Create the slurm cluster:
kubectl create -f slurm-cluster.yaml --save-config=true
This deploys the Slurm controller and login pods. The Crusoe Managed Slurm operator automatically installs and manages the following dependencies:
- cert-manager - Certificate management for Kubernetes
- Crusoe Load Balancer Controller - Load balancing for cluster services
- Slinky - Slurm integration components
- Topograph - Topology management
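You can watch the controller and login pods come up and look up the generated cluster name, which you will need when configuring the compute nodes below. The generated name is printed by the kubectl create command above; the slurmclusters resource name used here assumes the CRD registers that plural:
# Watch the controller and login pods start in the slurm namespace
kubectl get pods -n slurm -w

# Look up the generated SlurmCluster name (assumes the CRD plural is slurmclusters)
kubectl get slurmclusters.slurm.crusoe.ai -n slurm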
Configure the Compute Nodes
Create a file named slurm-node-set.yaml with the following content. Replace <slurm-cluster-generated-name> with the name generated when you created the SlurmCluster, and <worker-nodepool-id> with the node pool ID from Step 2:
apiVersion: slurm.crusoe.ai/v1alpha1
kind: SlurmNodeSet
metadata:
  name: slurm-worker-node-set
  namespace: slurm
spec:
  clusterReference: <slurm-cluster-generated-name>
  count: 2
  nodePoolID: "<worker-nodepool-id>"
  nodeSelector:
    slurm.crusoe.ai/compute-node-type: "true"
Apply the configuration:
kubectl apply -f slurm-node-set.yaml
This registers your worker nodes with the Slurm cluster and configures them. The nodes will be ready for job scheduling once the status.readyReplicas field of your SlurmNodeSet equals spec.count.
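Because the name and namespace are already in the manifest, one way to check this is to query the resource from the file you applied:
# Show the node set, or print status.readyReplicas directly to compare with spec.count
kubectl get -f slurm-node-set.yaml
kubectl get -f slurm-node-set.yaml -o jsonpath='{.status.readyReplicas}{"\n"}'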
Step 5: Access the Cluster
Retrieve the SSH command to access your login node:
kubectl get services -n slurm -o json | jq -r '.items[] | select(.metadata.name | contains("login")) | "ssh root@\(.status.loadBalancer.ingress[0].ip)"'
This command outputs an SSH connection string. Use it to access your login pods:
ssh root@<external-ip>
If you are prompted for a password, you may need to add -i path/to/private-key to specify your SSH key location.
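For example, with a key stored at an illustrative path:
ssh -i ~/.ssh/id_ed25519 root@<external-ip>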
You should now have access to your Slurm cluster and can begin submitting jobs.
Storage Configuration
Slurm requires a filesystem that is shared across the login and compute nodes. The storage class is automatically configured when you create the cluster with the crusoe_csi add-on enabled, and this storage backs the /home directory on your login and worker pods.
The managed service creates a crusoe-csi-driver-fs-sc StorageClass that supports:
- Volume binding mode: WaitForFirstConsumer
- Volume expansion: Enabled
- Provisioner: fs.csi.crusoe.ai
You can adjust storage capacity by modifying the userHomeVolumeClaimTemplate.resources.requests.storage value in the SlurmCluster custom resource and reapplying the configuration. Because the cluster was created with generateName, if you edit the slurm-cluster.yaml file directly you will need to set name: <generatedName> in the metadata before reapplying it. To confirm the change, run kubectl get pvc -A.
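A minimal sketch of that flow, assuming the generated cluster name is slurm-cluster-abc12 and the home volume is being grown from 1Ti to 2Ti (both values illustrative):
# 1. In slurm-cluster.yaml, set metadata.name to the generated name (here, slurm-cluster-abc12)
#    and raise userHomeVolumeClaimTemplate.resources.requests.storage (here, to 2Ti).
# 2. Reapply the configuration:
kubectl apply -f slurm-cluster.yaml
# 3. Confirm the PersistentVolumeClaims reflect the new request:
kubectl get pvc -A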
Using Your Slurm Cluster
Once connected to the login node, you can submit and manage jobs using standard Slurm commands:
| Command | Description |
|---|---|
| sinfo | View cluster status and node information |
| squeue | View the job queue |
| sbatch | Submit a batch job |
| srun | Run a job interactively |
| scancel | Cancel a job |
For GPU jobs, specify GPU requirements using the --gpus flag:
srun --gpus=1 nvidia-smi
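Batch jobs follow the same pattern. As a minimal illustration (the job name, file name, and resource counts are arbitrary), save the following as hello_gpu.batch and submit it with sbatch hello_gpu.batch:
#!/bin/bash
#SBATCH --job-name=hello_gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --output="%x_%j.out"

# Print the GPU visible to the allocation
srun nvidia-smi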
Monitoring and Troubleshooting
Checking Cluster Status
Use the following commands to verify your cluster is healthy:
sinfo # Check node states
scontrol show node # View detailed node information
scontrol show config # View Slurm configuration
Common Issues
| Issue | Resolution |
|---|---|
| Nodes in drain state | Check node reasons with sinfo -R to identify configuration issues |
| GPU not detected | Verify the GPU operator is running and nodes have the correct labels |
| Job allocation failures | Check available resources with sinfo and verify job requirements are within cluster capacity |
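For the Kubernetes-side checks in the table above, commands like the following can help (the GPU operator's namespace depends on how the add-on was installed, so the grep is intentionally broad):
# Verify the NVIDIA GPU operator pods are running
kubectl get pods -A | grep -i gpu-operator

# Verify the Slurm compute label is still present on the worker nodes
kubectl get nodes -l slurm.crusoe.ai/compute-node-type=true --show-labels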
Running NCCL Tests
To run NCCL tests, SSH into the login pod and switch to the /home directory. Create the following script named nccl_test.batch:
#!/bin/bash
#SBATCH --job-name=nccl_tests
#SBATCH --nodes=<number of nodes>
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
#SBATCH --time=20:00
#SBATCH --output="%x_%j.out"
#SBATCH --exclusive
export NCCL_TOPO_FILE=/etc/crusoe/nccl_topo/h200-141gb-sxm-ib-cloud-hypervisor.xml
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_HCA="mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1"
export NCCL_IB_MERGE_VFS=0
export NCCL_DEBUG=WARN
export OMPI_MCA_coll_hcoll_enable=0
export PMIX_MCA_gds='^ds12'
export UCX_NET_DEVICES="mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1,mlx5_8:1"
srun --mpi=pmix /opt/nccl-tests/build/all_reduce_perf -b 2G -e 32G -f 2
Submit the test:
sbatch nccl_test.batch
From the login pod, use squeue to see the tasks being run. Once there are no more tasks in the queue, check the output file nccl_tests_<job_id>.out in the /home directory to view the results.
To interact directly with a worker pod:
srun --pty bash # Connect to any worker pod
srun --nodelist=<worker-pod-name> --pty bash # Connect to a specific worker pod
Deleting a Slurm Cluster
To delete the Slurm cluster, run the following commands:
kubectl delete namespace slurm
kubectl delete namespace slinky
This deletes the slurm and slinky namespaces and all the resources associated with them.
Next Steps
For detailed information on Slurm usage and advanced configuration options, refer to the official Slurm documentation.
Support
If you encounter issues during setup or need assistance with your Managed Slurm deployment, please contact Crusoe Support.