Overview

dstack is an open-source control plane for GPU provisioning and orchestration. It lets you define infrastructure and workloads (GPU clusters, IDEs, training jobs, and inference services) as YAML configurations and apply them to Crusoe Cloud with a single CLI command.

dstack natively integrates with Crusoe, including support for multi-node clusters with InfiniBand interconnect. It's container-native: every workload runs in a container on instances that dstack provisions or attaches to, so you don't need scheduler or Kubernetes expertise.

Choose a deployment mode

You can use dstack with Crusoe in one of two modes. Both give end users the same experience—the same YAML configurations and the same dstack apply workflow. Choose the path that fits how you want to manage infrastructure:

| Mode | Best for | How it works | Where to start |
| --- | --- | --- | --- |
| Crusoe VMs (native backend) | Teams who want dstack to provision and manage compute end to end. | You provide a Crusoe API key; dstack provisions VMs through the Crusoe API and automatically creates InfiniBand partitions for cluster fleets. | Quickstart: VMs |
| Crusoe Managed Kubernetes (CMK) | Teams who already run CMK and want dstack as a workload layer on existing node pools. | You provide a kubeconfig; dstack schedules workloads onto provisioned CMK nodes. | Quickstart: CMK |

How dstack works on Crusoe

Architecture layers

A dstack deployment has three layers:

  1. dstack server—the control plane. You run it yourself (using pip, uv, or Docker—on a laptop, a CPU VM, or anywhere else). The server stores state, schedules runs, and communicates with Crusoe.
  2. Backend—the connection between the server and Crusoe. The native crusoe backend authenticates with your Crusoe API key and provisions VMs directly; the kubernetes backend connects to a CMK cluster through a kubeconfig.
  3. Fleets and runs—users define fleets (pools of instances) and runs (development environments, tasks, services) as YAML files and submit them with dstack apply. dstack provisions capacity, queues and schedules workloads, and streams logs back to the CLI.
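
As a rough sketch of the backend layer, the server's ~/.dstack/server/config.yml might look like the following. The project name main, the placeholder values, and the exact field names under creds and kubeconfig are assumptions rather than details from this page, so check dstack's server/config.yml reference before relying on them.

```yaml
# ~/.dstack/server/config.yml (illustrative sketch; field names are assumptions)
projects:
  - name: main                          # hypothetical project name
    backends:
      # Native backend: dstack provisions Crusoe VMs with your API key.
      - type: crusoe
        project_id: your-project-id     # placeholder
        creds:
          type: access_key              # assumed credential type
          access_key: your-access-key   # placeholder
          secret_key: your-secret-key   # placeholder
      # CMK mode: dstack schedules onto an existing cluster through a kubeconfig.
      - type: kubernetes
        kubeconfig:
          filename: ~/.kube/config      # assumed path to the CMK kubeconfig
```

Both backends are shown side by side only for comparison; in practice you would likely keep just the one that matches your deployment mode.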

When a fleet sets placement: cluster on the crusoe backend, dstack automatically creates an InfiniBand partition and provisions the instances with InfiniBand networking, provided the selected instance type supports it—you don't need to perform any manual network setup.
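
A cluster fleet of this kind could be sketched as follows. The fleet name and the node count are hypothetical, and the GPU spec is taken from the supported types listed later on this page.

```yaml
# fleet.dstack.yml (illustrative sketch)
type: fleet
name: crusoe-cluster      # hypothetical fleet name
nodes: 2                  # fixed-size fleet with two instances
placement: cluster        # asks dstack to interconnect the instances
backends: [crusoe]        # restrict provisioning to the native Crusoe backend
resources:
  gpu: H100:80GB:8        # maps to an InfiniBand-capable 8x H100 instance type
```

Submitting the file with dstack apply -f fleet.dstack.yml should then provision the instances and the InfiniBand partition in one step.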

Division of responsibilities

The following lists summarize what dstack handles automatically and what you remain responsible for:

dstack manages:

  • Instance provisioning and teardown
  • InfiniBand partition creation
  • Job queueing and scheduling
  • On-demand autoscaling (nodes: 0..N)
  • Idle-instance termination
  • Secure shell (SSH) access, port forwarding, and ingress for services

You manage:

  • The dstack server itself
  • Your Crusoe credentials and quotas
  • In CMK mode, the cluster and its node pools
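
The on-demand autoscaling and idle-instance termination that dstack manages can be expressed in a fleet configuration along these lines. This is a minimal sketch: the fleet name, the upper bound of two nodes, and the one-hour idle window are illustrative values, not recommendations from this page.

```yaml
# on-demand-fleet.dstack.yml (illustrative sketch)
type: fleet
name: crusoe-on-demand    # hypothetical fleet name
nodes: 0..2               # start empty; provision up to two instances when runs need them
idle_duration: 1h         # terminate instances that stay idle for an hour (example value)
backends: [crusoe]
resources:
  gpu: A100:80GB:8
```

Starting the range at zero means no instances are billed until a run is submitted, at the cost of provisioning time on the first run.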

Key concepts

The following terms appear throughout the dstack documentation and the rest of this page:

| Concept | Description |
| --- | --- |
| dstack server | The control plane that stores state and orchestrates everything. Configured through ~/.dstack/server/config.yml. |
| Backend | A connection to a compute provider. Use crusoe for native VM provisioning or kubernetes for CMK. |
| Fleet | A pool of instances that runs are scheduled onto. Supports fixed size (nodes: 2), on-demand ranges (nodes: 0..2), and interconnected clusters (placement: cluster). |
| Development environment | An interactive run with SSH and desktop integrated development environment (IDE) access (VS Code, Cursor) for development on GPU instances. |
| Task | A job that runs commands to completion, either single-node or distributed across the fleet for multi-node training. |
| Service | A long-running workload exposed as an endpoint (for example, a vLLM or SGLang model server) with autoscaling and optional OpenAI-compatible routing. |
| Volume | Persistent storage for runs. On Crusoe, use instance volumes (bind-mounts of host directories); dstack network volumes aren't supported on the crusoe backend. |
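
To make the Task and Volume concepts concrete, here is a hedged sketch of a two-node training task that bind-mounts a host directory as an instance volume. The task name, the host and container paths, the train.py script, and the master port are all hypothetical; the DSTACK_* environment variables are the ones dstack sets for distributed tasks.

```yaml
# train.dstack.yml (illustrative sketch)
type: task
name: train-distrib             # hypothetical task name
nodes: 2                        # run the commands on two nodes of a cluster fleet
python: "3.12"
volumes:
  - instance_path: /mnt/data    # hypothetical directory on the host instance
    path: /data                 # where it appears inside the container
commands:
  - pip install torch
  - |
    torchrun \
      --nnodes=$DSTACK_NODES_NUM \
      --node-rank=$DSTACK_NODE_RANK \
      --nproc-per-node=$DSTACK_GPUS_PER_NODE \
      --master-addr=$DSTACK_MASTER_NODE_IP \
      --master-port=12345 \
      train.py
resources:
  gpu: H100:80GB:8
```

Because instance volumes are bind-mounts of host directories, data written to /data stays on that particular instance and is not shared between the two nodes.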

Supported GPU types

dstack fleets request hardware through a resources spec rather than instance type names. Crusoe InfiniBand instance types map as follows:

| Crusoe instance type | dstack resources.gpu |
| --- | --- |
| a100-80gb-sxm-ib.8x | A100:80GB:8 |
| h100-80gb-sxm-ib.8x | H100:80GB:8 |
| h200-141gb-sxm-ib.8x | H200:141GB:8 |
| b200-180gb-sxm-ib.8x | B200:180GB:8 |
Note: Run dstack offer -b crusoe to list the instance types and regions currently available to your project.
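
Runs use the same resources spec as fleets. As a minimal sketch, a development environment that requests a full H200 node might look like this; the environment name and Python version are illustrative assumptions.

```yaml
# dev.dstack.yml (illustrative sketch)
type: dev-environment
name: crusoe-dev          # hypothetical name
python: "3.12"
ide: vscode               # desktop IDE to attach to the run
resources:
  gpu: H200:141GB:8       # GPU name : memory per GPU : GPU count
```

The gpu value follows the name:memory:count pattern from the table above, so switching to another Crusoe instance type only means changing that one line.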

Compare dstack with other orchestration options

dstack sits alongside Crusoe's managed orchestration products. Use the following table as a guide:

| Orchestration option | Use case |
| --- | --- |
| dstack (native backend) | YAML-defined, container-native development environments, training jobs, and inference services, with dstack provisioning Crusoe compute for you |
| Crusoe Managed Slurm | Traditional high-performance computing (HPC) batch scheduling with sbatch/srun, shared /home, and multi-user Linux accounts, fully managed by Crusoe |
| CMK | Direct Kubernetes-native control over workloads, operators, and Helm charts |

These aren't mutually exclusive: dstack's kubernetes backend runs on CMK, and dstack also provides a Slurm migration guide for teams moving from scheduler-based workflows.

dstack reference resources

These pages cover the Crusoe-specific setup. For everything else, dstack's own documentation is the canonical reference:

| Resource | Links |
| --- | --- |
| Concepts | Backends, Fleets, Dev environments, Tasks, Services, Volumes, Gateways |
| Inference examples | SGLang, vLLM, TensorRT-LLM, NIM, and Dynamo for disaggregated prefill/decode serving |
| Training examples | TRL, Axolotl, Ray+RAGEN |
| Reference | .dstack.yml, CLI, server/config.yml |
| Project | GitHub, Discord |

Next steps

  • Quickstart — Set up the dstack server, connect it to Crusoe, and run your first GPU workload
  • Clusters — Provision multi-node InfiniBand clusters and validate them with NCCL tests