Managing InfiniBand Networking

Overview

Crusoe Cloud supports high-performance interconnects using NVIDIA Mellanox InfiniBand (IB) networking. The fabric is currently supported for the instance types in the table below:

| Instance Type | Number of InfiniBand HCAs per Instance | Total InfiniBand Bandwidth (Gbps) |
| --- | --- | --- |
| a100-80gb-sxm-ib.8x | 8 | 1600 |
| h100-80gb-sxm-ib.8x | 8 | 3200 |
| h200-141gb-sxm-ib.8x | 8 | 3200 |

The general workflow, discussed in more detail below, consists of selecting an IB network from the list of networks available within a location, creating an IB partition within that network, and then launching instances into that partition. This keeps each cluster within its own partition, maximizing performance while providing cluster-wide isolation.

InfiniBand VM image

Crusoe Cloud provides a default VM image that comes with all the software libraries and tools necessary to take advantage of InfiniBand through supported ML and HPC frameworks. While you are not required to use this image to use InfiniBand networking, we strongly recommend the ubuntu22.04-nvidia-sxm-docker:latest image for the easiest possible setup. Learn more about images.

Info: To support full-performance distributed training on Crusoe’s hypervisor/virtualisation stack, some NCCL configuration changes are required.

NCCL Configuration

Crusoe’s updated hypervisor stack changes the PCI address of devices within the virtual machines. This requires an update to the NCCL XML topology file. The XML file is included by default in Crusoe’s curated image; to use it, set the following environment variable in /etc/nccl.conf.

NCCL_TOPO_FILE=<PATH_TO_TOPO_FILE>

The *-nvidia-sxm-docker images also come with a service unit file called crusoe_nccl_topo.service, which sets the NCCL_TOPO_FILE environment variable. If a custom topology file location is used (for example, to customize the topology file), the service should be disabled by running systemctl disable crusoe_nccl_topo.service. The current state of the service may be queried by running systemctl status crusoe_nccl_topo.service.
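When pointing NCCL at a custom topology file, a quick sanity check can catch a mistyped path before a training job starts. The following is a minimal sketch, assuming the variable is set as a plain NCCL_TOPO_FILE=<path> line in the config file; the function name check_nccl_topo_file is hypothetical and not part of any Crusoe tooling.

```shell
#!/usr/bin/env bash
# Print the NCCL_TOPO_FILE value found in an NCCL config file and verify that
# the file it names is readable. $1: config path (defaults to /etc/nccl.conf).
# Hypothetical helper for illustration only.
check_nccl_topo_file() {
    local conf="${1:-/etc/nccl.conf}" topo
    # Use the last matching line; lines starting with '#' do not match.
    topo="$(sed -n 's/^[[:space:]]*NCCL_TOPO_FILE=//p' "$conf" 2>/dev/null | tail -n 1)"
    if [ -z "$topo" ]; then
        echo "NCCL_TOPO_FILE is not set in $conf" >&2
        return 1
    fi
    if [ ! -r "$topo" ]; then
        echo "NCCL_TOPO_FILE points to a missing or unreadable file: $topo" >&2
        return 2
    fi
    echo "$topo"
}
```

Calling check_nccl_topo_file with no arguments inspects /etc/nccl.conf. It returns 1 if the variable is not set and 2 if the referenced file cannot be read.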

For NCCL to correctly detect the PCIe topology, the following environment variable must be set in /etc/nccl.conf.

NCCL_IB_MERGE_VFS=0

The NCCL_IB_HCA configuration must be modified to exclude the Ethernet device mlx5_0.

NCCL_IB_HCA=^mlx5_0:1
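In NCCL_IB_HCA, the leading ^ marks the listed device as excluded and :1 selects port 1. To see which device on a VM is the Ethernet one, you can read each port's link layer from sysfs. This is a minimal sketch, assuming the standard Linux RDMA layout under /sys/class/infiniband; the function name list_hca_link_layers is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<device>:<port> <link_layer>" for each RDMA port found under sysfs.
# $1: optional sysfs root (defaults to /sys/class/infiniband).
# Hypothetical helper for illustration only.
list_hca_link_layers() {
    local root="${1:-/sys/class/infiniband}"
    local f dev port
    for f in "$root"/*/ports/*/link_layer; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # Path layout: <root>/<device>/ports/<port>/link_layer
        port="$(basename "$(dirname "$f")")"
        dev="$(basename "$(dirname "$(dirname "$(dirname "$f")")")")"
        printf '%s:%s %s\n' "$dev" "$port" "$(cat "$f")"
    done
}
```

On the instance types above, the expectation from this section is that mlx5_0 reports Ethernet while the remaining devices report InfiniBand.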

When using versions of HPC-X older than 2.18, the following variable must also be set.

NCCL_IBEXT_DISABLE=1

If you are not using this image, or you want to run containers, you must set all of these environment variables yourself, for example in a Dockerfile:

FROM ...
ENV NCCL_TOPO_FILE=/path/to/nccl_topo.xml
ENV NCCL_IB_MERGE_VFS=0
ENV NCCL_IB_HCA=^mlx5_0:1
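The settings described in this section can be collected into one generated file. The sketch below prints them in /etc/nccl.conf format and adds NCCL_IBEXT_DISABLE=1 only when the supplied HPC-X version is older than 2.18. The function names version_lt and render_nccl_conf are hypothetical, and the version comparison assumes GNU sort with the -V option.

```shell
#!/usr/bin/env bash
# Return success if version $1 is strictly older than version $2.
# Relies on GNU sort -V for version ordering.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the NCCL settings from this section in nccl.conf format.
# $1: path to the topology file; $2: optional HPC-X version (e.g. 2.17).
# Hypothetical helper for illustration only.
render_nccl_conf() {
    local topo_file="$1" hpcx_version="${2:-}"
    printf 'NCCL_TOPO_FILE=%s\n' "$topo_file"
    printf 'NCCL_IB_MERGE_VFS=0\n'
    printf 'NCCL_IB_HCA=^mlx5_0:1\n'
    # Only needed for HPC-X releases older than 2.18.
    if [ -n "$hpcx_version" ] && version_lt "$hpcx_version" "2.18"; then
        printf 'NCCL_IBEXT_DISABLE=1\n'
    fi
}
```

For example, render_nccl_conf /path/to/nccl_topo.xml 2.17 | sudo tee /etc/nccl.conf would write all four settings. Note that this overwrites any existing /etc/nccl.conf.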

NCCL_TOPO_FILE

Provided below are the NCCL Topology files:

| Instance Type | NCCL Topology File |
| --- | --- |
| a100-80gb-sxm-ib.8x | a100-80gb-sxm-ib.8x |
| h100-80gb-sxm-ib.8x | h100-80gb-sxm-ib.8x |
| h200-141gb-sxm-ib.8x | h200-80gb-sxm-ib.8x |

InfiniBand Networks

InfiniBand Networks are a logical representation of the physical InfiniBand fabric.

InfiniBand Network limitations

You are limited to a maximum of five InfiniBand partitions on the same InfiniBand network.

Listing InfiniBand Networks and Partitions

Use the networking ib-networks list and networking ib-partitions list commands to list networks and partitions.

crusoe networking ib-networks list
crusoe networking ib-partitions list

The IDs returned by these commands are used later: the network ID when creating a partition, and the partition ID when attaching a VM to a partition.

Creating InfiniBand Partitions

Use the networking ib-partitions create command to create a new partition.

crusoe networking ib-partitions create \
--name my-new-partition \
--ib-network-id uuid-of-network

Update an existing InfiniBand Partition

Updating InfiniBand partitions is currently unsupported in the Crusoe Cloud CLI.

Deleting an InfiniBand Partition

Warning: Deleting an InfiniBand partition is a permanent action; recovering requires re-creating the partition.

InfiniBand partitions can be deleted in the CLI using the networking ib-partitions delete <id> command.

Launching Instances in an InfiniBand Partition

Use the compute vms create command to create a new VM, passing the partition ID with --ib-partition-id:

crusoe compute vms create \
--name infiniband-test \
--location us-east1-a \
--type a100-80gb-sxm-ib.8x \
--image ubuntu22.04-nvidia-sxm-docker:latest \
--ib-partition-id uuid-of-partition \
...
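A cluster usually needs several VMs in the same partition. The sketch below wraps the command above in a loop, reusing only the flags shown in this section. The function name launch_ib_cluster and the DRY_RUN switch are hypothetical, and any flags elided above must be supplied as extra arguments.

```shell
#!/usr/bin/env bash
# Launch <count> VMs named <prefix>-1..<count> into one InfiniBand partition.
# Usage: launch_ib_cluster <prefix> <count> <partition-id> [extra CLI flags...]
# Set DRY_RUN=1 to print the commands instead of running them.
# Hypothetical helper for illustration only.
launch_ib_cluster() {
    local prefix="$1" count="$2" partition_id="$3"
    shift 3   # remaining arguments are passed through to the CLI unchanged
    local i cmd
    for i in $(seq 1 "$count"); do
        cmd=(crusoe compute vms create
            --name "${prefix}-${i}"
            --location us-east1-a
            --type a100-80gb-sxm-ib.8x
            --image ubuntu22.04-nvidia-sxm-docker:latest
            --ib-partition-id "$partition_id"
            "$@")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf '%s\n' "${cmd[*]}"
        else
            "${cmd[@]}" || return 1   # stop at the first failed launch
        fi
    done
}
```

Running DRY_RUN=1 launch_ib_cluster infiniband-test 2 uuid-of-partition prints the two commands that would be executed, which is a convenient way to review them before creating any VMs.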

Updating the IB partition on a VM

Use the compute vms update command on a stopped VM to change its --ib-partition-id.

crusoe compute vms update \
--ib-partition-id uuid-of-new-partition