Telemetry Relay

Note: Telemetry Relay is currently in limited availability. Contact your Crusoe account team if you have any questions or would like to enable this feature for your project.

Telemetry Relay enables you to export infrastructure and custom metrics from your CMK cluster or VMs to external observability platforms. You can integrate Crusoe infrastructure data into your existing Grafana, Datadog, or Splunk dashboards without managing separate data pipelines.

Telemetry Relay exposes a Prometheus-compatible scraping endpoint for external tools and is available for Crusoe Managed Kubernetes (CMK) and Crusoe Virtual Machines (VMs).

How it Works

Telemetry Relay uses the same metric collection infrastructure as Command Center Metrics:

  1. The Crusoe Watch Agent collects metrics from CMK nodes or VMs at 60-second intervals.
  2. Metrics are published to the Crusoe metrics backend.
  3. Telemetry Relay exposes a Prometheus-compatible scraping endpoint.
  4. Your external platform scrapes the endpoint to retrieve metrics.
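From the consumer side, step 4 amounts to reading a scrape response and splitting it into samples. The sketch below assumes the endpoint returns the standard Prometheus text exposition format, which "Prometheus-compatible" suggests but this page does not spell out. The function name `parse_samples` and the metric names in the usage note are made up for illustration.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
# One label pair inside the braces: key="value" (value may contain escapes).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text exposition output into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE), blank lines, and lines that do not look
    like a sample are skipped. Timestamps, if present, are ignored.
    """
    samples = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, label_block, value = match.group(1), match.group(2), match.group(3)
        labels = dict(_LABEL.findall(label_block or ""))
        samples.append((name, labels, float(value)))
    return samples
```

For example, passing the hypothetical line `crusoe_vm_memory_example_bytes{device="vda"} 42.5` yields one tuple with the name, a `{"device": "vda"}` label dictionary, and the value `42.5`. Production scrapers (Prometheus, the Datadog Agent, the OpenTelemetry Collector) do this parsing for you; the sketch is only meant to show what the scraped payload looks like.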

Prerequisites

To use Telemetry Relay, you need:

  • CMK cluster or VM with the Crusoe Watch Agent installed (see Metrics)
  • External observability platform that supports Prometheus remote read or scraping

Available Metrics

You can export any infrastructure and custom metrics collected by the Crusoe Watch Agent, including GPU (DCGM), CPU, memory, network, InfiniBand, and NVLink metrics. See Infrastructure Metrics for the complete list.

Configuring Telemetry Relay

Endpoint

Use the following endpoint to access your metrics:

https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/scrape

Authentication

You need a monitoring token to authenticate requests. Generate one using the Crusoe CLI. See Querying Metrics via API for instructions.
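As a quick way to check a token before configuring a platform, the following minimal sketch builds an authenticated request using only the Python standard library. It assumes the token is sent as a Bearer token in the Authorization header, as in the platform sections of this page. The function names and the 10-second timeout are illustrative choices, not part of the Crusoe API.

```python
import urllib.request


def build_scrape_request(scrape_url, monitoring_token):
    """Return a GET request that carries the monitoring token as a Bearer token."""
    return urllib.request.Request(
        scrape_url,
        headers={"Authorization": f"Bearer {monitoring_token}"},
        method="GET",
    )


def fetch_metrics(scrape_url, monitoring_token, timeout=10.0):
    """Send the request and return the response body as text.

    Raises urllib.error.HTTPError on a non-2xx response, for example when
    the token is missing or invalid.
    """
    request = build_scrape_request(scrape_url, monitoring_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Call `fetch_metrics` with the endpoint URL for your project and your monitoring token. If you poll in a loop, keep at least 60 seconds between requests to match the minimum scrape interval.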

Connecting to Grafana

To connect Grafana to Telemetry Relay:

  1. In Grafana, navigate to Configuration > Data Sources > Add data source.

  2. Select Prometheus.

  3. Set the URL to:

    https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/scrape

  4. Under Custom HTTP Headers, add:

    • Header: Authorization
    • Value: Bearer <monitoring-token>

  5. Set the Scrape interval to at least 60 seconds.

  6. Click Save & Test.

You can now build dashboards using the available infrastructure metrics.

Connecting to Datadog

To connect Datadog to Telemetry Relay:

  1. Add a Prometheus check to your Datadog Agent configuration:

    instances:
      - prometheus_url: "https://api.crusoecloud.com/api/v1alpha5/projects/<project-id>/metrics/scrape"
        namespace: "crusoe"
        metrics:
          - "*"
        headers:
          Authorization: "Bearer <monitoring-token>"

    Replace the following placeholders:

    • <project-id>: Your Crusoe project ID (find via crusoe projects list)
    • <monitoring-token>: Generate with crusoe monitoring tokens create
  2. Restart the Datadog Agent. Metrics will appear in Datadog under the configured namespace.

Connecting to Splunk

To connect Splunk to Telemetry Relay, configure your OpenTelemetry Collector to scrape Crusoe's endpoint and export the metrics to Splunk Observability Cloud. The following is an example configuration:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "crusoe-metrics"
          scrape_interval: 60s
          scrape_timeout: 10s
          scheme: https

          authorization:
            type: Bearer
            credentials: <crusoe-monitoring-token>

          static_configs:
            - targets: ["api.crusoecloud.com"]

          metrics_path: "/api/v1alpha5/projects/<project-id>/metrics/scrape"

processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(attributes, "crusoe_resource")

  batch:
    timeout: 10s
    send_batch_size: 1000

exporters:
  otlphttp:
    metrics_endpoint: "https://ingest.<splunk-realm>.signalfx.com/v2/datapoint/otlp"
    headers:
      X-SF-Token: "<splunk-access-token>"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [transform, batch]
      exporters: [otlphttp]

Replace the following placeholders:

  • <crusoe-monitoring-token>: Generate with crusoe monitoring tokens create
  • <project-id>: Your Crusoe project ID (find via crusoe projects list)
  • <splunk-access-token>: Your Splunk Observability Cloud access token
  • <splunk-realm>: Your Splunk realm (e.g., us1, us2, eu0)

Restart the OpenTelemetry Collector to apply the configuration. Metrics will appear in Splunk Observability Cloud under Metrics → Metric Finder. Search for crusoe_ to find your Crusoe metrics.

Connecting to Other Prometheus-Compatible Platforms

You can connect any Prometheus-compatible platform to Telemetry Relay using the scrape endpoint.

Configure your platform with:

  • Endpoint URL: https://api.crusoecloud.com/api/v1alpha5/projects/<project-id>/metrics/scrape
  • Authentication: Bearer token via Authorization header
  • Scrape interval: Minimum 60 seconds

Replace the following placeholders:

  • <project-id>: Your Crusoe project ID (find via crusoe projects list)
  • <monitoring-token>: The Bearer token sent in the Authorization header (generate with crusoe monitoring tokens create)

Filtering Metrics

All platforms support filtering metrics by adding query parameters to the scrape endpoint URL. This allows you to reduce the volume of metrics exported and focus on specific data.

Available Filters

| Parameter | Description | Example |
|---|---|---|
| metric_name | Filter by metric name (comma-separated list) | metric_name=crusoe_vm_memory_.* |
| labels | Filter by label key:value pairs (comma-separated) | labels=collector:disk,device:vda |
| metric_category | Filter by category (system or custom) | metric_category=system |

Filter Examples

Filter by memory-related metrics:

https://api.crusoecloud.com/api/v1alpha5/projects/<project-id>/metrics/scrape?metric_name=crusoe_vm_memory_.*

Filter by labels:

https://api.crusoecloud.com/api/v1alpha5/projects/<project-id>/metrics/scrape?labels=collector:disk

Filter by metric category (system metrics only):

https://api.crusoecloud.com/api/v1alpha5/projects/<project-id>/metrics/scrape?metric_category=system

Combined filters for disk metrics on device vda1:

https://api.crusoecloud.com/api/v1alpha5/projects/<project-id>/metrics/scrape?metric_name=crusoe_vm_disk_.*&labels=device:vda1
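If you generate scrape URLs programmatically, the filter parameters above can be assembled with a small helper. The sketch below is one possible approach, not an official client: the function name `build_filtered_url` and its arguments are hypothetical. It leaves `*`, `:`, and `,` unescaped so the output matches the examples on this page; whether the endpoint also accepts percent-encoded values is not stated here.

```python
from urllib.parse import urlencode


def build_filtered_url(scrape_url, metric_names=(), labels=None, metric_category=None):
    """Append Telemetry Relay filter parameters to a scrape endpoint URL.

    metric_names:    iterable of metric names or patterns, joined with commas
    labels:          dict of label key/value pairs, rendered as key:value
    metric_category: "system" or "custom"
    """
    params = {}
    if metric_names:
        params["metric_name"] = ",".join(metric_names)
    if labels:
        params["labels"] = ",".join(f"{key}:{value}" for key, value in labels.items())
    if metric_category:
        params["metric_category"] = metric_category
    if not params:
        return scrape_url
    # Keep '*', ':' and ',' literal so the result matches the documented examples.
    return f"{scrape_url}?{urlencode(params, safe='*:,')}"
```

Calling `build_filtered_url(url, ["crusoe_vm_disk_.*"], {"device": "vda1"})` on the scrape endpoint URL reproduces the combined-filter example above.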

Platform-Specific Examples

Datadog:

instances:
  - prometheus_url: "https://api.crusoecloud.com/api/v1alpha5/projects/<project-id>/metrics/scrape?metric_name=crusoe_vm_memory_.*"
    namespace: "crusoe"
    metrics:
      - "*"
    headers:
      Authorization: "Bearer <monitoring-token>"

Splunk OpenTelemetry Collector:

metrics_path: "/api/v1alpha5/projects/<project-id>/metrics/scrape?labels=collector:disk"

Grafana or other Prometheus-compatible platforms:

Add query parameters directly to the configured endpoint URL.

Limitations

  • Metrics only — Log streaming is planned for a future release.
  • Minimum scrape interval — 60 seconds.
  • Metrics retention — Metrics are retained for 30 days on the Crusoe backend. External platform retention is governed by your platform's policies.

What's Next

  • Metrics — View metrics directly in the Crusoe Console
  • Logs — Access centralized log data
  • Notifications — Get notified about resource health via email and in-console, and set up alert routing to Slack or webhooks