Managed Inference Metrics
Metrics are provided out of the box for models served through our Managed Inference service. Metrics are updated every minute and are available on the metrics page. They can also be integrated into Grafana dashboards via a Prometheus-compatible query API; instructions for using the query API are provided below.
Available Metrics
| Metrics | Definition | Metric Query |
|---|---|---|
| Request Rate | The rate of API requests served by the model | sum by (model_name) ( rate( crusoe_inference_request_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Input Token Rate | The rate of input tokens processed by the model over the query window | sum by (model_name) ( rate( crusoe_inference_input_token_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Output Token Rate | The rate of output tokens generated by the model over the query window | sum by (model_name) ( rate( crusoe_inference_output_token_count{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) |
| Time to First Token (TTFT) | The median (p50) time the model takes to generate the first token of a response, over the query window | histogram_quantile( 0.5, sum by (model_name, le) ( irate( crusoe_inference_histogram_first_token_latency_bucket{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) ) |
| Time per Output Token (TPOT) | The median (p50) time between successive output tokens, over the query window | histogram_quantile( 0.5, sum by (model_name, le) ( irate( crusoe_inference_histogram_output_token_latency_bucket{project_id="{project_id}", model_name="{model_name}"}[300s] ) ) ) |
Each metric also has a service_tier label that can be used to segment the data. Set service_tier="pt" to select provisioned throughput metrics.
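As a sketch, the label filters in the queries above can be assembled programmatically before being sent to the query API. The helper function, project ID, and model name below are illustrative, not part of the Crusoe API; only the metric name, label names, and the "pt" tier value come from this page.

```python
# Build a PromQL query string like those in the table above,
# optionally adding the service_tier label filter.
def build_query(metric, project_id, model_name, service_tier=None, window="300s"):
    labels = [f'project_id="{project_id}"', f'model_name="{model_name}"']
    if service_tier:
        labels.append(f'service_tier="{service_tier}"')
    selector = f'{metric}{{{",".join(labels)}}}'
    return f"sum by (model_name) ( rate( {selector}[{window}] ) )"

# Request Rate for provisioned throughput, with placeholder IDs.
query = build_query(
    "crusoe_inference_request_count", "my-project", "my-model", service_tier="pt"
)
print(query)
```

The same pattern applies to the token-rate metrics; the histogram-based TTFT and TPOT queries additionally wrap the selector in histogram_quantile as shown in the table.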
Retrieving metrics using the PromQL API
You can directly query the metrics API endpoint to retrieve data for a single instant or a specific time range. The API endpoint is:
https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries
To find your project ID, select the project name in the top left corner of the Crusoe Console and copy the ID for the appropriate project.
Generating a monitoring token
To generate a monitoring token for querying metrics, run the following Crusoe CLI command:
crusoe monitoring tokens create
This command generates an API-Key that you'll use for authentication when querying the metrics API. Store the token in a secrets or key-management tool; you will not be able to retrieve it later.
Querying metrics
Here is an example curl command to retrieve the most recent data point for Time to First Token (TTFT) in your project:
curl -G 'https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries' \
  --data-urlencode 'query=crusoe_inference_first_token_latency' \
  -H 'Authorization: Bearer <API-Key>'
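The same instant query can be issued from a script. This is a minimal sketch using Python's standard library, assuming only the endpoint and bearer-token authentication shown above; `<project-id>` and `<API-Key>` remain placeholders to fill in.

```python
from urllib.parse import urlencode
from urllib.request import Request

project_id = "<project-id>"
base = f"https://api.crusoecloud.com/v1alpha5/projects/{project_id}/metrics/timeseries"

# URL-encode the PromQL query string, as curl's --data-urlencode does.
params = {"query": "crusoe_inference_first_token_latency"}
url = f"{base}?{urlencode(params)}"

# Attach the monitoring token as a bearer token.
req = Request(url, headers={"Authorization": "Bearer <API-Key>"})
# urllib.request.urlopen(req) would return the JSON response; not executed here.
print(req.full_url)
```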
Importing data into Grafana
To import data into your own Grafana instance, add a Prometheus data source with the following options:
Prometheus Server URL: https://api.crusoecloud.com/v1alpha5/projects/<project-id>/metrics/timeseries
Authentication → HTTP Headers:
Header: Authorization
Value: Bearer <API-Key>
Use the API-Key generated in the 'Generating a monitoring token' section.