Managed Inference Metrics

Metrics are provided out of the box for models served through our Managed Inference service. They are updated every minute and are available on the metrics page. Metrics can also be integrated with Grafana dashboards via a Prometheus-compatible query API; instructions for using the query API are provided below.

Available Metrics

| Metric | Definition | Metric Query |
| --- | --- | --- |
| Request Rate | The rate of API requests served by the model | sum by (model_name) ( rate( crusoe_inference_request_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Input Token Rate | Rate of input tokens processed by the model within a given timestep | sum by (model_name) ( rate( crusoe_inference_input_token_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Output Token Count | Total number of output tokens processed by the model over rolling 24-hour periods based on the query window | sum by (model_name) ( rate( crusoe_inference_output_token_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Time to First Token (TTFT) | The time it takes for the model to generate the first token in response to a request, over rolling 24-hour periods based on the query window | histogram_quantile( 0.5, sum by (model_name, le) ( irate( crusoe_inference_histogram_first_token_latency_bucket{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) ) |
| Time per Output Token (TPOT) | The time between output tokens for requests over the given time period | histogram_quantile( 0.5, sum by (model_name, le) ( irate( crusoe_inference_histogram_output_token_latency_bucket{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) ) |
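
The TTFT and TPOT queries above use histogram_quantile with a quantile of 0.5, i.e. the median. Only the first argument needs to change if you want a different percentile; for example, a p95 TTFT query would look roughly like the following, using the same metric, labels, and window as in the table:

histogram_quantile(
  0.95,
  sum by (model_name, le) (
    irate(
      crusoe_inference_histogram_first_token_latency_bucket{project_id="{project_id}", model_name="{model_name}"}[300s]
    )
  )
)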

Each metric also carries a service_tier label, which can be used to segment the results. Set service_tier to pt to select provisioned throughput metrics.
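
For example, to restrict the Request Rate query from the table above to provisioned throughput traffic, you can add the service_tier label to the selector; the project and model placeholders are the same ones used throughout this page:

sum by (model_name) (
  rate(
    crusoe_inference_request_count{project_id="{project_id}", model_name="{model_name}", service_tier="pt"}[300s]
  )
)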

Retrieving metrics using the PromQL API

You can directly query the metrics API endpoint to retrieve data for a single instant or a specific time range. The API endpoint is:

https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries

To find your project ID, select the project name in the top-left corner of the Crusoe Console and copy the ID of the appropriate project.

Generating a monitoring token

To generate a monitoring token for querying metrics, run the following Crusoe CLI command:

crusoe monitoring tokens create

This command generates an API key that you'll use for authentication when querying the metrics API. Store the token in a secrets or key management tool; you will not be able to retrieve it later.
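
For local experimentation, one option is to export the token as an environment variable and reference it in the curl examples below. CRUSOE_MONITORING_TOKEN is an illustrative name, not something the CLI sets for you:

# Illustrative only; for anything long-lived, prefer a secrets manager
export CRUSOE_MONITORING_TOKEN="<API-Key>"

You can then pass the header as -H "Authorization: Bearer ${CRUSOE_MONITORING_TOKEN}" rather than pasting the key inline.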

Querying metrics

Here is an example curl command to retrieve the most recent data point for Time to First Token (TTFT) in your project:

curl -G 'https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries?query=crusoe_inference_first_token_latency' \
  -H 'Authorization: Bearer <API-Key>'
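
Queries that contain spaces, braces, and quotes are easiest to pass with curl's --data-urlencode flag, which URL-encodes the expression and appends it to the URL when combined with -G. As a sketch, here is the Request Rate query from the table above sent to the same endpoint; substitute your own project ID, model name, and API key:

curl -G 'https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries' \
  --data-urlencode 'query=sum by (model_name) (rate(crusoe_inference_request_count{project_id="<project-id>", model_name="<model-name>"}[300s]))' \
  -H 'Authorization: Bearer <API-Key>'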

Importing data into Grafana

To import data into your own Grafana instance, add a Prometheus data source with the following options:

Prometheus Server URL: https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries

Authentication → HTTP Headers:

Header: Authorization
Value: Bearer <API-Key>

Use the API key generated in the 'Generating a monitoring token' section.