Overview
dstack is an open-source control plane for GPU provisioning and orchestration. It lets you define infrastructure and workloads as YAML configurations—such as GPU clusters, IDEs, training jobs, and inference services—and applies them to Crusoe Cloud with a single CLI command.
dstack natively integrates with Crusoe, including support for multi-node clusters with InfiniBand interconnect. It's container-native: every workload runs in a container on instances that dstack provisions or attaches to, and using it doesn't require scheduler or Kubernetes expertise.
Choose a deployment mode
You can use dstack with Crusoe in one of two modes. Both give end users the same experience—the same YAML configurations and the same dstack apply workflow. Choose the path that fits how you want to manage infrastructure:
| Mode | Best for | How it works | Where to start |
|---|---|---|---|
| Crusoe VMs (native backend) | Teams who want dstack to provision and manage compute end to end. | You provide a Crusoe API key; dstack provisions VMs through the Crusoe API and automatically creates InfiniBand partitions for cluster fleets. | Quickstart: VMs |
| Crusoe Managed Kubernetes (CMK) | Teams who already run CMK and want dstack as a workload layer on existing node pools. | You provide a kubeconfig; dstack schedules workloads onto provisioned CMK nodes. | Quickstart: CMK |
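In CMK mode, the backend is declared in the server's ~/.dstack/server/config.yml. The following is a minimal sketch. It assumes a project named main and a kubeconfig at ~/.kube/config, and both are placeholders. It shows only the kubeconfig reference. dstack's Backends reference documents any additional networking (jump host) settings the kubernetes backend may need, and it also covers the fields for the native crusoe backend, which are not shown here.

```yaml
# ~/.dstack/server/config.yml (sketch; project name and kubeconfig path are placeholders)
projects:
  - name: main
    backends:
      # Connect dstack to an existing CMK cluster through its kubeconfig
      - type: kubernetes
        kubeconfig:
          filename: ~/.kube/config
```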
How dstack works on Crusoe
Architecture layers
A dstack deployment has three layers:
- dstack server: the control plane. You run it yourself (using pip, uv, or Docker) on a laptop, a CPU VM, or anywhere else. The server stores state, schedules runs, and communicates with Crusoe.
- Backend: the connection between the server and Crusoe. The native crusoe backend authenticates with your Crusoe API key and provisions VMs directly. The kubernetes backend connects to a CMK cluster through a kubeconfig.
- Fleets and runs: users define fleets (pools of instances) and runs (development environments, tasks, services) as YAML files and submit them with dstack apply. dstack provisions capacity, queues and schedules workloads, and streams logs back to the CLI.
When a fleet sets placement: cluster on the crusoe backend, dstack automatically creates an InfiniBand partition and provisions the instances with InfiniBand networking, provided the selected instance type supports it—you don't need to perform any manual network setup.
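A cluster fleet on the crusoe backend might look like the following minimal sketch. The fleet name and GPU spec are illustrative. The GPU spec should correspond to an InfiniBand-capable instance type from the Supported GPU types table so that dstack can set up the partition.

```yaml
# fleet.dstack.yml (illustrative file and fleet name)
type: fleet
name: h100-cluster
# Two interconnected nodes
nodes: 2
# Provision the nodes as an InfiniBand-connected cluster
placement: cluster
backends: [crusoe]
resources:
  # Corresponds to h100-80gb-sxm-ib.8x
  gpu: H100:80GB:8
```

You would then create the fleet with dstack apply -f fleet.dstack.yml.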
Division of responsibilities
The following lists summarize what dstack handles automatically and what you remain responsible for:
dstack manages:
- Instance provisioning and teardown
- InfiniBand partition creation
- Job queueing and scheduling
- On-demand autoscaling (nodes: 0..N)
- Idle-instance termination
- Secure shell (SSH) access, port forwarding, and ingress for services
You manage:
- The dstack server itself
- Your Crusoe credentials and quotas
- In CMK mode, the cluster and its node pools
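On-demand autoscaling and idle-instance termination are both configured on the fleet. The following sketch uses an illustrative fleet name and timeout. It lets dstack provision up to two instances only when runs need them and terminate instances that remain idle for an hour.

```yaml
type: fleet
name: h100-on-demand
# Start with zero instances and scale up to two as runs are submitted
nodes: 0..2
# Terminate instances that stay idle for one hour
idle_duration: 1h
backends: [crusoe]
resources:
  gpu: H100:80GB:8
```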
Key concepts
The following terms appear throughout the dstack documentation and the rest of this page:
| Concept | Description |
|---|---|
| dstack server | The control plane that stores state and orchestrates everything. Configured through ~/.dstack/server/config.yml. |
| Backend | A connection to a compute provider. Use crusoe for native VM provisioning or kubernetes for CMK. |
| Fleet | A pool of instances that runs are scheduled onto. Supports fixed size (nodes: 2), on-demand ranges (nodes: 0..2), and interconnected clusters (placement: cluster). |
| Development environment | An interactive run on a GPU instance, accessible over SSH or from a desktop integrated development environment (IDE) such as VS Code or Cursor. |
| Task | A job that runs commands to completion—single-node or distributed across the fleet for multi-node training. |
| Service | A long-running workload exposed as an endpoint—for example, a vLLM or SGLang model server—with autoscaling and optional OpenAI-compatible routing. |
| Volume | Persistent storage for runs. On Crusoe, use instance volumes (bind-mounts of host directories); dstack network volumes aren't supported on the crusoe backend. |
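To show how these concepts fit together, the following minimal sketch defines a service that serves a model with vLLM. It mounts a host directory as an instance volume so downloaded weights can be reused by later runs on the same instance. The service name, model, host path, and port are placeholders. The model was chosen because its attention-head count divides evenly across eight GPUs.

```yaml
type: service
name: qwen-vllm
image: vllm/vllm-openai:latest
commands:
  # Shard the model across the instance's eight GPUs
  - vllm serve Qwen/Qwen2.5-32B-Instruct --tensor-parallel-size 8 --port 8000
port: 8000
# Name under which the model is exposed for OpenAI-compatible routing
model: Qwen/Qwen2.5-32B-Instruct
resources:
  gpu: H100:80GB:8
volumes:
  # Instance volume: bind-mount a host directory as the Hugging Face cache
  - instance_path: /mnt/hf-cache
    path: /root/.cache/huggingface
```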
Supported GPU types
dstack fleets request hardware through a resources spec rather than instance type names. Crusoe InfiniBand instance types map as follows:
| Crusoe instance type | dstack resources.gpu |
|---|---|
| a100-80gb-sxm-ib.8x | A100:80GB:8 |
| h100-80gb-sxm-ib.8x | H100:80GB:8 |
| h200-141gb-sxm-ib.8x | H200:141GB:8 |
| b200-180gb-sxm-ib.8x | B200:180GB:8 |
Run dstack offer -b crusoe to list the instance types and regions currently available to your project.
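Runs use the same resources.gpu syntax as fleets. For example, a development environment that requests a full B200 instance could be written as follows. The name is illustrative, and dstack can fulfill the request only if a matching offer is available to your project.

```yaml
type: dev-environment
name: b200-dev
# Open the environment in VS Code
ide: vscode
backends: [crusoe]
resources:
  # Format is name:memory:count; this corresponds to b200-180gb-sxm-ib.8x
  gpu: B200:180GB:8
```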
Compare dstack with other orchestration options
dstack sits alongside Crusoe's managed orchestration products. Use the following table as a guide:
| Orchestration option | Use case |
|---|---|
| dstack (native backend) | YAML-defined, container-native development environments, training jobs, and inference services, with dstack provisioning Crusoe compute for you |
| Crusoe Managed Slurm | Traditional high-performance computing (HPC) batch scheduling with sbatch/srun, shared /home, and multi-user Linux accounts, fully managed by Crusoe |
| CMK | Direct Kubernetes-native control over workloads, operators, and Helm charts |
These aren't mutually exclusive: dstack's kubernetes backend runs on CMK, and dstack also provides a Slurm migration guide for teams moving from scheduler-based workflows.
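For teams coming from Slurm, a multi-node task plays roughly the role of an sbatch script. The following sketch assumes that a cluster fleet with two H100 nodes already exists. It also assumes that train.py is your own training script in the submitted repository (hypothetical here). dstack sets the DSTACK_* environment variables on each node, and the task uses them to configure torchrun.

```yaml
type: task
name: train-distrib
# Run the commands on two nodes of a cluster fleet
nodes: 2
python: "3.12"
commands:
  - pip install torch
  # Launch one process per GPU on every node (train.py is hypothetical)
  - torchrun
    --nproc-per-node=$DSTACK_GPUS_PER_NODE
    --node-rank=$DSTACK_NODE_RANK
    --nnodes=$DSTACK_NODES_NUM
    --master-addr=$DSTACK_MASTER_NODE_IP
    --master-port=12345
    train.py
resources:
  gpu: H100:80GB:8
  # Shared memory for data loaders and NCCL
  shm_size: 24GB
```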
dstack reference resources
These pages cover the Crusoe-specific setup. For everything else, dstack's own documentation is the canonical reference:
| Resource | Links |
|---|---|
| Concepts | Backends, Fleets, Dev environments, Tasks, Services, Volumes, Gateways |
| Inference examples | SGLang, vLLM, TensorRT-LLM, NIM, and Dynamo for disaggregated prefill/decode serving |
| Training examples | TRL, Axolotl, Ray+RAGEN |
| Reference | .dstack.yml, CLI, server/config.yml |
| Project | GitHub, Discord |
Next steps
- Quickstart — Set up the dstack server, connect it to Crusoe, and run your first GPU workload
- Clusters — Provision multi-node InfiniBand clusters and validate them with NCCL tests