
CMK Add-ons

Overview

We support a growing set of add-ons and plugins that extend the functionality of CMK (and Kubernetes in general). These add-ons either integrate CMK with native aspects of Crusoe Cloud, such as storage and node lifecycle management, or add features relevant to AI workloads.

We support two methods of add-on installation:

  1. When provisioning a cluster, you can opt in to installing a number of these add-ons.
  2. At any point after provisioning a cluster, each add-on may be installed via Helm.
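As a sketch, a Helm-based installation typically looks like the following. The repository URL, chart name, release name, and namespace below are placeholders, not the actual Crusoe chart coordinates; consult each add-on's linked documentation for the real values:

```shell
# Hypothetical example: repo URL, chart name, and namespace are placeholders.
helm repo add crusoe https://example.com/helm-charts
helm repo update

# Install the add-on into its own namespace, pinning a chart version.
helm install crusoe-csi crusoe/crusoe-csi \
  --namespace crusoe-csi --create-namespace \
  --version 0.10.3
```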

Most add-ons that interface with Crusoe Cloud APIs (like the Cloud Controller Manager and Container Storage Interface) require an API access/secret key pair to be present on the cluster, stored under the keys CRUSOE_ACCESS_KEY and CRUSOE_SECRET_KEY. By default, we create a managed secret for you during cluster provisioning, named cmk-{clusterName}, which is compatible with all Crusoe-vended add-ons.
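If you manage the credentials yourself rather than relying on the managed secret, a secret of the expected shape can be created along these lines. This is a sketch: the namespace and exact secret name your add-on consumes are assumptions here, so match them against the add-on's Helm chart values:

```shell
# Sketch: create a secret holding Crusoe API credentials.
# The namespace and secret name are assumptions; align them with
# the values expected by the add-on's Helm chart.
kubectl create secret generic cmk-my-cluster \
  --namespace kube-system \
  --from-literal=CRUSOE_ACCESS_KEY='<your-access-key>' \
  --from-literal=CRUSOE_SECRET_KEY='<your-secret-key>'
```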

Supported Add-ons

  • Container Storage Interface (CSI): Allows workloads within the cluster to create and manage supported Crusoe disk types as PersistentVolumes. This currently includes our Persistent Disks and Shared Disks products. (Link: Github)
  • Nvidia GPU Operator: Discovers and exposes Nvidia GPUs as allocatable resources on nodes. Supported as an opt-in add-on when provisioning a cluster, and may also be installed by following the default configuration instructions in the documentation. (Link: Nvidia Docs)
  • Nvidia Network Operator: Discovers and exposes Nvidia Host Channel Adapters (HCAs) as allocatable resources on nodes. Supported as an opt-in add-on when provisioning a cluster, and applicable when attaching InfiniBand-enabled instances as nodes to your cluster. You may also install the Network Operator by following the vanilla Kubernetes installation instructions in the documentation. (Link: Nvidia Docs)
  • Cluster Autoscaler: Deploys a CMK-compatible cluster autoscaler into your cluster that automatically scales your node pools in and out based on the number of pending pods. Please note that the autoscaler does not set default min/max limits for node pools; instructions to configure these limits, both during provisioning and when new node pools are added to the cluster, are available at the linked Github repository. (Link: Github)
  • AutoClusters: Enhances workload resilience by automatically detecting and remediating node-level hardware failures. AutoClusters gracefully terminates affected pods, restarts or replaces the unhealthy node, and reschedules pods on the healthy replacement, minimizing downtime. See the documentation to learn how to enable AutoClusters for your workloads. (Link: AutoClusters Docs)
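For instance, once the CSI add-on is installed, a shared volume can be requested with an ordinary PersistentVolumeClaim. The storageClassName below is a placeholder; substitute the class the CSI driver actually registers in your cluster (kubectl get storageclass will list it):

```shell
# Sketch: request a shared volume through the Crusoe CSI driver.
# "crusoe-csi-shared" is a placeholder StorageClass name.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany        # shared disks can attach to multiple nodes
  storageClassName: crusoe-csi-shared
  resources:
    requests:
      storage: 100Gi
EOF
```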

CSI NFS Support

Starting with CSI driver version v0.10.0, the Crusoe CSI driver supports NFS (Network File System) mounting for shared volumes, powered by VAST NFS. When NFS support is enabled for your Crusoe project, shared volumes will automatically be mounted using NFS instead of the default virtiofs protocol. This provides improved performance and reliability for shared storage workloads.

Prerequisites

  • Your project must be NFS-enabled (contact Crusoe support to enable)
  • Cluster must be on CSI Helm chart v0.10.3 or newer
  • Worker nodes should use NFS-baked worker images (recommended for faster initialization)
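You can verify your current CSI chart version and node image versions before migrating. These are standard Helm and kubectl queries, though the release name and namespace of the CSI chart in your cluster may differ:

```shell
# Check the installed CSI Helm chart version
# (the release name and namespace may differ in your cluster).
helm list --all-namespaces | grep -i csi

# Check the kubelet version and OS image reported by each node.
kubectl get nodes -o wide
```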

For optimal NFS support, use the following image versions or later:

Control Plane Images (includes CSI v0.10.3+)

Kubernetes Version    Image
1.33                  1.33.4-cmk.15
1.32                  1.32.7-cmk.18
1.31                  1.31.7-cmk.21

Worker Images (NFS pre-installed)

Kubernetes Version    Image
1.33                  1.33.4-cmk.4
1.32                  1.32.7-cmk.16
1.31                  1.31.7-cmk.9
1.30                  1.30.8-cmk.16

GB200 Worker Images (NFS pre-installed)

Kubernetes Version    Image
1.33                  1.33.4-cmk.5-gb200

Instance Type Compatibility

Shared filesystem volumes with NFS support are available on all instance types:

Instance Family    Supported Types
c1a                All types
s1a                All types
GPU instances      All types

Migrating from VirtioFS to NFS

Installing CSI Helm chart v0.10.0+ and enabling NFS for your project does not automatically convert existing VirtioFS-mounted shared disks to NFS. Existing mounts continue using VirtioFS until they are detached and re-attached.

Once the upgraded CSI Helm chart is installed and NFS is enabled:

  • New shared disks will mount via NFS
  • New attachments of existing shared disks will mount via NFS
  • Existing mounts remain on VirtioFS until re-attached
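To confirm whether a given shared disk on a node is currently mounted via NFS or VirtioFS, you can inspect the filesystem type of the mount from the node itself. This is a sketch using standard Linux tooling; the specific mount path is a placeholder:

```shell
# On the worker node: list mounts by filesystem type.
# NFS mounts report fstype nfs/nfs4; VirtioFS mounts report virtiofs.
findmnt -t nfs,nfs4,virtiofs

# Or inspect a specific (placeholder) mount path directly:
findmnt --output FSTYPE,SOURCE,TARGET <mount-path>
```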
Note:

On worker nodes without pre-installed NFS packages, the initial NFS package download takes approximately 6 minutes. Using the recommended NFS-baked worker images reduces this to negligible time.

Migration Approaches

Active Migration

In this approach, you intentionally trigger the CSI driver to detach and re-attach shared disks so they re-mount via NFS.

Important: The CSI driver uses reference counting to determine when it's safe to unmount/detach a disk. If any pods on a worker node are using a shared disk, the CSI driver will not unmount or detach that disk.

Option 1: Remove pods to trigger re-attachment

  1. Remove all pods using a specific shared disk from a node using one of these methods:

    • Pod affinity/anti-affinity rules
    • topologySpreadConstraints
    • Node draining:
      kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
  2. Once all pods are removed, the CSI driver will unmount and detach the disk

  3. When pods are rescheduled, the disk will re-attach using NFS
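The steps above can be observed end to end: after draining, you can watch the CSI driver release the disk before pods are rescheduled. VolumeAttachment objects are the standard Kubernetes resource the CSI machinery uses to track attachments:

```shell
# Drain the node so the CSI driver's reference count drops to zero.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Watch for the corresponding VolumeAttachment to be deleted,
# which indicates the disk has detached from the node.
kubectl get volumeattachments --watch

# Make the node schedulable again; new attachments will use NFS.
kubectl uncordon <node-name>
```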

Option 2: Replace node pools

Create new node pools with NFS-baked worker images and delete the old node pools. This quickly triggers new NFS attachments for all shared disks:

  1. Create a new node pool using recommended worker images
  2. Cordon old nodes:
    kubectl cordon <old-node-name>
  3. Drain workloads to new nodes:
    kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
  4. Delete the old node pool once migration is complete
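The cordon-and-drain steps can be scripted across all nodes in the old pool. The label selector below is an assumption; substitute whatever label your node pool's nodes actually carry (kubectl get nodes --show-labels will tell you):

```shell
# Sketch: cordon and drain every node in the old pool.
# "crusoe.ai/nodepool=old-pool" is a placeholder label selector.
for node in $(kubectl get nodes -l crusoe.ai/nodepool=old-pool -o name); do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```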

Passive Migration

This approach relies on natural churn within your cluster:

  • If using the Cluster Autoscaler add-on with node pools that frequently scale up and down, nodes will naturally cycle out over time
  • As nodes are replaced, new attachments will use NFS

Considerations:

  • Some long-lived nodes may never cycle out naturally
  • You can check node ages, oldest first, with:
    kubectl get nodes --sort-by=.metadata.creationTimestamp
  • For nodes with long lifetimes, consider manually terminating the underlying Crusoe VM to force a replacement

Important Notes

  • Existing volumes managed by the CSI driver persist through upgrades
  • Already-mounted volumes remain unaffected during the driver update
  • If upgrading from a version prior to v0.7.0, you must follow the v0.7.0 upgrade instructions first
Tip:

For assistance with NFS migration or to enable NFS support for your project, contact Crusoe support.