Managed Inference Overview

Crusoe Cloud's Managed Inference service lets you interact with supported models through our Intelligence Foundry APIs. Models are served on our proprietary inference engine with MemoryAlloy, a cluster-wide memory fabric with cache-aware routing that maximizes cache hits, reducing time to first token (TTFT) and increasing throughput.

Available models

You can use the OpenAI API-compatible endpoint at managed-inference-api-proxy.crusoecloud.com to access the models below. You can also interact with all of the models through the Intelligence Foundry's chat interface. All Meta models provided by Crusoe are "Built with Llama".
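Because the endpoint is OpenAI API-compatible, a standard chat completion request works against it. The sketch below builds such a request with the Python standard library; the `/v1` path suffix and the `CRUSOE_API_KEY` environment variable are assumptions for illustration, so check your account settings for the actual base URL and credential name.

```python
# Minimal sketch of an OpenAI-compatible chat completion request against
# the managed inference endpoint. The /v1 path and CRUSOE_API_KEY env var
# are assumptions -- verify both against your account settings.
import json
import os
import urllib.request

BASE_URL = "https://managed-inference-api-proxy.crusoecloud.com/v1"  # assumed path


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Credential sourcing is an assumption; use your real API key.
            "Authorization": f"Bearer {os.environ.get('CRUSOE_API_KEY', '')}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("meta-llama/Llama-3.3-70B-Instruct", "Hello!")
    # Sending the request requires a valid API key:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
    print(req.full_url)
```

Any model identifier from the table below can be passed as the `model` field; the official OpenAI client libraries also work if you point their base URL at this endpoint.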

For each model's pricing information, see pricing.

| MODEL | PROVIDER | TYPE | CONTEXT LENGTH | LICENSE | ACCEPTABLE USE POLICY |
| --- | --- | --- | --- | --- | --- |
| deepseek-ai/DeepSeek-V3-0324 | DeepSeek | instruct | 160k | MIT License | |
| deepseek-ai/DeepSeek-V4-Flash | DeepSeek | instruct | 1M | MIT License | |
| deepseek-ai/DeepSeek-V4-Pro | DeepSeek | instruct | 1M | MIT License | |
| google/gemma-4-31b-it | Google | instruct | 262k | Apache License 2.0 | |
| meta-llama/Llama-3.3-70B-Instruct | Meta | instruct | 128k | Llama 3.3 Community License Agreement | Llama 3.3 Acceptable Use Policy |
| nvidia/Nemotron-3-Nano-30B-A3B | NVIDIA | instruct | 262k | NVIDIA Nemotron Open Model License | NVIDIA Acceptable Use Terms |
| nvidia/Nemotron-3-Nano-Omni-Reasoning-30B-A3B | NVIDIA | instruct | 262k | NVIDIA Open Model Agreement | |
| nvidia/Nemotron-3-Super-120B-A12B | NVIDIA | instruct | 262k | NVIDIA Nemotron Open Model License | NVIDIA Acceptable Use Terms |
| nvidia/Nemotron-3-VoiceChat | NVIDIA | speech-to-speech | 131k | NVIDIA Software and Model Evaluation License | NVIDIA Acceptable Use Terms |
| openai/gpt-oss-120b | OpenAI | instruct | 128k | Apache License 2.0 | Acceptable Use Policy |
| qwen/Qwen3-235B-A22B | Qwen | instruct | 131k | Apache License 2.0 | Apache License 2.0 |
| zai/GLM-5.1 | Z.ai | instruct | 202k | MIT License | |