Hosted AI Models

Call open-weight AI models over a simple API and pay only for the tokens you use—30% of your spend builds credit toward owning a GPU

Available Models

Pricing per token is coming soon. Contact us for early access.

Model | Type | Parameters | Pricing
GLM-5 (new) | Coding / Enterprise Agents | 744B MoE (44B active) | TBD
Kimi K2.5 (new) | Agentic / Visual Coding | 1T MoE (32B active) | TBD
Qwen3.5-122B-A10B (new) | Reasoning / Multilingual | 122B MoE (10B active) | TBD
Qwen3.5-35B-A3B (new) | Text / Reasoning | 35B MoE (3B active) | TBD
DeepSeek V3.2 (new) | Coding / Reasoning Agent | 671B MoE (37B active) | TBD
Qwen3-VL 32B (new) | Vision (VLM) | 32B | TBD
Qwen3-VL 8B (new) | Vision (VLM) | 8B | TBD
Mistral Small 3.1 | Vision + Text | 24B | TBD
Qwen 2.5 Coder | Coding Agent | 32B | TBD
Whisper Large v3 | Speech Recognition (ASR) | 1.5B | TBD
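The API itself is not documented on this page, so the snippet below is only an illustrative sketch: it assumes an OpenAI-compatible chat completions endpoint, and the base URL, API key environment variable, and model identifier are placeholder assumptions rather than published values.

```python
import os
import requests

API_BASE = "https://api.example-host.invalid/v1"  # placeholder base URL, not a real endpoint
API_KEY = os.environ["HOSTED_MODELS_API_KEY"]     # placeholder environment variable name

# Assumed OpenAI-compatible request shape; pay-per-token billing applies to the
# input and output tokens of each call.
response = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "qwen3.5-35b-a3b",  # placeholder model ID based on the table above
        "messages": [
            {"role": "user", "content": "Summarize Mixture of Experts inference in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
reply = response.json()
print(reply["choices"][0]["message"]["content"])
```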

Frequently Asked Questions

How does the 30% hardware credit work?

As you use our model API, 30% of your spend accrues as credit. Those credits can be applied directly to the down payment or monthly installments of any GPU in our Lease-to-Own program. You are essentially pre-paying for hardware ownership while you run your AI workloads. If your credits cover the full cost, the GPU is yours and we ship it to your door.
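As a rough worked example of the accrual math (the 30% rate comes from the answer above; the API spend, GPU price, and down-payment figures are invented placeholders):

```python
CREDIT_RATE = 0.30  # 30% of API spend accrues as credit (from the answer above)

api_spend = 2_000.00            # illustrative token spend in dollars
credits = api_spend * CREDIT_RATE

gpu_down_payment = 1_500.00     # illustrative down payment, not a real price
remaining = max(gpu_down_payment - credits, 0.0)

print(f"Credits earned:         ${credits:,.2f}")    # $600.00
print(f"Down payment still due: ${remaining:,.2f}")  # $900.00
# Per the answer above, credits can also be applied to monthly installments.
```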

How do I redeem my credits?

Your credits are tracked automatically. When you are ready, pick any GPU from our Lease-to-Own catalog. Credits apply to your down payment or monthly installments, shortening your term and lowering your monthly payment. No paperwork, no expiration.

When will pricing be published?

We are finalising per-token pricing now. The models listed, covering text, vision, coding, reasoning, and speech, are available to run on our infrastructure. Pricing will be published as soon as it is confirmed. Contact us if you want early access or volume pricing.

What does MoE (Mixture of Experts) mean?

MoE (Mixture of Experts) models have a large total parameter count but activate only a small fraction of it per token during inference. For example, GLM-5 has 744B total parameters, but only 44B are active for any given token. In practice, this means you get the quality of a massive model at the compute cost of a much smaller one: faster responses, lower latency, and better value per token.
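To make the active-parameter point concrete, this short sketch computes the active-to-total ratio for the MoE models listed above; the figures are taken directly from the model table.

```python
# Total vs. active parameters in billions, taken from the model table above.
moe_models = {
    "GLM-5": (744, 44),
    "Kimi K2.5": (1000, 32),
    "Qwen3.5-122B-A10B": (122, 10),
    "Qwen3.5-35B-A3B": (35, 3),
    "DeepSeek V3.2": (671, 37),
}

for name, (total_b, active_b) in moe_models.items():
    # Only the routed experts run for each token, so per-token compute tracks
    # the active count, not the total.
    share = active_b / total_b
    print(f"{name}: {active_b}B of {total_b}B parameters active (~{share:.1%} per token)")
```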