Getting Started with Managed Inference

Crusoe's Managed Inference Service provides OpenAI-compatible endpoints for a number of popular open-source models. The models are hosted on Crusoe's inference engine with MemoryAlloy, a proprietary cluster-wide memory fabric with cache-aware routing that improves time to first token (TTFT) and throughput.

The steps below show how to start querying models with the OpenAI SDK. All models are served at the api.crusoe.ai endpoint.
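Because the endpoints follow OpenAI API conventions, you can also list available model identifiers programmatically once you have an API key (see the next section). The sketch below assumes the service exposes the standard /v1/models route; if it does not, use the model names shown in the Intelligence Foundry instead.

```python
import os
from openai import OpenAI

# A minimal sketch, assuming the endpoint implements the standard /v1/models route.
client = OpenAI(
    api_key=os.getenv("CRUSOE_API_KEY"),
    base_url="https://api.crusoe.ai/v1",
)

# Print the identifier of each available model.
for model in client.models.list():
    print(model.id)
```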

Retrieving your Inference API token

You can retrieve your Inference API token via the Crusoe Cloud console by following the steps below.

- Visit the Intelligence Foundry on the [Crusoe Cloud console](https://console.crusoecloud.com/foundry/models)
- From either the models or chat tab, use the "Get API Key" button to generate an API key
- Provide an optional alias and expiration date
- Click "Create" to view and save your API key

Querying Text Models

After retrieving an API key from the Intelligence Foundry, you can use the OpenAI SDK to make requests. The example below uses the meta-llama/Llama-3.3-70B-Instruct model.

```python
import os
from openai import OpenAI

# Read the Inference API key from the environment rather than hard-coding it.
CRUSOE_API_KEY = os.getenv("CRUSOE_API_KEY")

# Point the OpenAI client at Crusoe's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=CRUSOE_API_KEY,
    base_url="https://api.crusoe.ai/v1",
)

# Request a chat completion from the hosted Llama 3.3 70B Instruct model.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful, concise assistant."},
        {"role": "user", "content": "Who is Robinson Crusoe?"},
    ],
)

# Print the assistant's reply.
print(completion.choices[0].message.content)
```
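For interactive applications you may prefer to receive tokens as they are generated. The OpenAI SDK's standard streaming mode should work unchanged against an OpenAI-compatible endpoint; the sketch below reuses the client and model from the example above.

```python
# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Summarize Robinson Crusoe in two sentences."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta; content may be empty on some chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```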