InfiniBand Metrics

Crusoe Cloud provides out-of-the-box InfiniBand (IB) metrics to help you monitor network performance, identify failures, and optimize utilization. These metrics are collected and published at 5-minute intervals and are retained for 30 days. You can view IB metrics in the Crusoe Console under the VM Metrics view, or access them through our Prometheus-compatible query API. For detailed instructions on connecting via the API or Grafana, refer to the VM metrics page.
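
As a rough illustration of what a request to a Prometheus-compatible API can look like, the sketch below builds an instant-query URL for one of the IB metrics. It assumes the API exposes the standard Prometheus `/api/v1/query` path. The base URL is a placeholder, and authentication is omitted because neither is specified here; take the real endpoint and credentials from the VM metrics page.

```python
from urllib.parse import urlencode

# Placeholder base URL (an assumption, not the real Crusoe endpoint).
BASE_URL = "https://metrics.example.invalid"


def instant_query_url(promql: str, base_url: str = BASE_URL) -> str:
    """Build a URL for the standard Prometheus instant-query path.

    Assumes the API follows the usual Prometheus HTTP API layout
    (/api/v1/query with a URL-encoded `query` parameter).
    """
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


# Example: transmit throughput in bytes per second over a 5-minute window.
url = instant_query_url("rate(crusoe_ib_port_throughput_tx[5m])")
print(url)
```

The resulting URL can be passed to any HTTP client, along with whatever authentication the API requires.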

| Metric | Definition | Suggested Query |
| --- | --- | --- |
| Line Rate | The maximum theoretical data transfer speed of the InfiniBand link, measured in gigabits per second (Gb/s). | crusoe_ib_port_line_rate |
| InfiniBand Throughput Tx (bytes per second) | The rate of data transmitted from the port, measured in bytes per second. | rate(crusoe_ib_port_throughput_tx[5m]) |
| InfiniBand Throughput Rx (bytes per second) | The rate of data received by the port, measured in bytes per second. | rate(crusoe_ib_port_throughput_rx[5m]) |
| InfiniBand Throughput Tx (Packets per second) | The rate of packets transmitted from the port, measured in packets per second. | rate(crusoe_ib_port_packets_tx[5m]) |
| InfiniBand Throughput Rx (Packets per second) | The rate of packets received by the port, measured in packets per second. | rate(crusoe_ib_port_packets_rx[5m]) |
| InfiniBand Transmit Wait | The rate at which packets had to wait before being transmitted from the port, which indicates congestion or a scheduling issue. | rate(crusoe_ib_port_tx_wait[5m]) |
| InfiniBand Link Downed | The rate at which the InfiniBand link transitioned from an active state to a link-down state. | rate(crusoe_ib_port_link_downed[5m]) |
| InfiniBand Link Error Recovery | The rate at which the link underwent an error recovery process to restore a healthy link state. | rate(crusoe_ib_port_link_error_recovery[5m]) |
| InfiniBand Port Constraint Errors Tx | The rate of transmit errors caused by link protocol or connectivity constraints, measured in packets per second. | rate(crusoe_ib_port_constraint_error_tx[5m]) |
| InfiniBand Port Constraint Errors Rx | The rate of receive errors caused by link protocol or connectivity constraints, measured in packets per second. | rate(crusoe_ib_port_constraint_error_rx[5m]) |
| InfiniBand Port Errors Rx | The rate of received packets with errors at the port level, typically indicating issues such as Cyclic Redundancy Check (CRC) failures. | rate(crusoe_ib_port_error_rx[5m]) |
| InfiniBand Symbol Errors | The rate of low-level physical errors in which the receiver detected an invalid data symbol, measured in bits per second. | rate(crusoe_ib_port_bits_error[5m]) |
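
Because throughput is reported in bytes per second and line rate in Gb/s, comparing the two requires a unit conversion. The sketch below shows the arithmetic, assuming decimal units (1 Gb = 10^9 bits). The function name and the 400 Gb/s example link are illustrative assumptions. The PromQL string is also hypothetical: it only works if both series carry identical label sets, which is not confirmed here.

```python
def ib_utilization_percent(bytes_per_sec: float, line_rate_gbps: float) -> float:
    """Return throughput as a percentage of the link's line rate.

    bytes_per_sec:  e.g. the value of rate(crusoe_ib_port_throughput_tx[5m])
    line_rate_gbps: e.g. the value of crusoe_ib_port_line_rate (Gb/s)
    """
    if line_rate_gbps <= 0:
        raise ValueError("line rate must be positive")
    gbps = bytes_per_sec * 8 / 1e9  # bytes/s -> bits/s -> Gb/s
    return 100.0 * gbps / line_rate_gbps


# Hypothetical PromQL equivalent; assumes matching labels on both series.
UTILIZATION_QUERY = (
    "100 * rate(crusoe_ib_port_throughput_tx[5m]) * 8 / 1e9"
    " / crusoe_ib_port_line_rate"
)

# Illustrative values: 25 GB/s transmitted on an assumed 400 Gb/s link.
print(ib_utilization_percent(25e9, 400))
```

Sustained utilization near 100%, together with a rising Transmit Wait rate, would be consistent with the congestion that metric is described as indicating.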