InfiniBand Metrics

info

InfiniBand metrics are currently available in preview in the following regions:

  • us-southcentral1-a
  • us-east1-a

To request access, please contact our sales team.

Crusoe Cloud provides InfiniBand (IB) metrics to help you monitor IB performance, identify failures, and optimize IB utilization. IB metrics are collected and published every 3 minutes. You can view the following IB metrics in the Crusoe Console, under the VM metrics view, or query them through Crusoe’s Prometheus-compatible query API. See the VM metrics page for instructions on accessing them via the API or Grafana.
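As a rough illustration of API access, the sketch below builds an instant-query URL and parses a response in the standard Prometheus HTTP API format. It assumes the Prometheus-compatible API exposes the usual `/api/v1/query` endpoint; `BASE_URL` is a hypothetical placeholder, and the real endpoint and authentication details are the ones documented on the VM metrics page.

```python
import json
from urllib.parse import urlencode

# Hypothetical placeholder, not a real Crusoe endpoint. Use the base URL
# and credentials described on the VM metrics page.
BASE_URL = "https://metrics.example.invalid/prometheus"


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the standard Prometheus instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> list:
    """Return (labels, value) pairs from a Prometheus instant-query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels...}, "value": [timestamp, "number"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


if __name__ == "__main__":
    # Print the URL for one of the suggested queries; no request is sent.
    print(build_instant_query_url(BASE_URL, "rate(crusoe_ib_port_throughput_tx[5m])"))
```

The response body would come from an authenticated HTTP GET on that URL; that step is left out here because the authentication scheme is not described in this section.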

| Metric | Definition | Suggested Query |
|---|---|---|
| Line Rate | The maximum theoretical data transfer speed of the InfiniBand link, measured in gigabits per second (Gb/s). | crusoe_ib_port_line_rate |
| InfiniBand Throughput Tx (GiB/s) | The rate of data transmitted from the port, measured in gibibytes (GiB) per second. | rate(crusoe_ib_port_throughput_tx[5m]) |
| InfiniBand Throughput Rx (GiB/s) | The rate of data received by the port, measured in gibibytes (GiB) per second. | rate(crusoe_ib_port_throughput_rx[5m]) |
| InfiniBand Throughput Tx (Packets per second) | The rate of packets transmitted from the port, measured in packets per second. | rate(crusoe_ib_port_packets_tx[5m]) |
| InfiniBand Throughput Rx (Packets per second) | The rate of packets received by the port, measured in packets per second. | rate(crusoe_ib_port_packets_rx[5m]) |
| InfiniBand Transmit Wait | The rate at which packets had to wait before being transmitted from the port, indicating congestion or a scheduling issue. | rate(crusoe_ib_port_tx_wait[5m]) |
| InfiniBand Link Downed | The rate at which the InfiniBand link transitioned from an active state to a link-down state. | rate(crusoe_ib_port_link_downed[5m]) |
| InfiniBand Link Error Recovery | The rate at which the link underwent an error recovery process to attempt to restore a healthy link state. | rate(crusoe_ib_port_link_error_recovery[5m]) |
| InfiniBand Port Constraint Errors Tx | The rate of errors transmitted due to link protocol or connectivity constraints, measured in packets per second. | rate(crusoe_ib_port_constraint_error_tx[5m]) |
| InfiniBand Port Constraint Errors Rx | The rate of errors received due to link protocol or connectivity constraints, measured in packets per second. | rate(crusoe_ib_port_constraint_error_rx[5m]) |
| InfiniBand Port Errors Rx | The rate of received packets with errors at the port level, typically indicating issues such as Cyclic Redundancy Check (CRC) errors. | rate(crusoe_ib_port_error_rx[5m]) |
| InfiniBand Symbol Errors | The rate of low-level physical error bits received where the receiver detected an invalid data symbol, measured in bits per second. | rate(crusoe_ib_port_bits_error[5m]) |
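
Because the line rate is reported in gigabits per second while throughput is reported in gibibytes per second, comparing the two requires a unit conversion. The following is a minimal sketch of that arithmetic, assuming the throughput queries return GiB/s and the line rate is in decimal Gb/s, as the definitions above state. The function names are illustrative and not part of any Crusoe API.

```python
BYTES_PER_GIB = 2**30       # one gibibyte in bytes
BITS_PER_GIGABIT = 10**9    # one (decimal) gigabit in bits


def gib_per_s_to_gbit_per_s(gib_per_s: float) -> float:
    """Convert a throughput in GiB/s to Gb/s."""
    return gib_per_s * BYTES_PER_GIB * 8 / BITS_PER_GIGABIT


def link_utilization(throughput_gib_per_s: float, line_rate_gbit_per_s: float) -> float:
    """Return throughput as a fraction of the link's line rate."""
    if line_rate_gbit_per_s <= 0:
        raise ValueError("line rate must be positive")
    return gib_per_s_to_gbit_per_s(throughput_gib_per_s) / line_rate_gbit_per_s


if __name__ == "__main__":
    # Illustrative numbers only: 25 GiB/s measured on a 400 Gb/s link.
    print(f"{link_utilization(25.0, 400.0):.1%}")
```

One GiB/s is about 8.59 Gb/s, so a 400 Gb/s link corresponds to roughly 46.6 GiB/s at its theoretical maximum.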