Sending every query to a frontier model is how AI budgets evaporate. RouteIQ measures how closely each query aligns with the project's documentation, then routes it to the cheapest tier that will do the job — premium only where it earns its cost, with a hard budget ceiling the system enforces on its own.

Three model tiers

Each project defines its own model pool, organised by cost tier. For every query, the router picks a model only from the tier that the relevance score and remaining budget allow.

Premium

High-relevance, complex, project-critical work.

Typical models: Opus 4.8, GPT-4o.
Approx. cost: $0.015–$0.060 per query.

Mid

Medium-relevance or ambiguous queries.

Typical models: Sonnet 4.6, GPT-4o-mini.
Approx. cost: $0.003–$0.010 per query.

Cheap

Off-topic queries and simple in-scope work.

Typical models: Qwen-7B, Haiku 4.5, Nova Lite.
Approx. cost: $0.0005–$0.002 per query.
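Here is a minimal Python sketch of how a relevance score and the remaining budget could combine to pick a tier. The thresholds (0.75 and 0.45), the `Tier` enum, `choose_tier`, and the rule of downgrading until the top of a tier's cost range fits the remaining budget are all assumptions for illustration, not RouteIQ's actual logic. Only the per-query cost ceilings come from the ranges above.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = 0
    MID = 1
    PREMIUM = 2


# Upper end of each tier's approximate per-query cost range, in USD (from the ranges above).
MAX_COST_PER_QUERY = {Tier.PREMIUM: 0.060, Tier.MID: 0.010, Tier.CHEAP: 0.002}

# Hypothetical relevance cut-offs on a 0-1 scale; real thresholds would be calibrated per project.
PREMIUM_THRESHOLD = 0.75
MID_THRESHOLD = 0.45


class BudgetExhausted(Exception):
    """Raised when not even the cheapest tier fits in the remaining budget."""


def tier_for_relevance(score: float) -> Tier:
    """Map a relevance score to the highest tier it qualifies for."""
    if score >= PREMIUM_THRESHOLD:
        return Tier.PREMIUM
    if score >= MID_THRESHOLD:
        return Tier.MID
    return Tier.CHEAP


def choose_tier(score: float, remaining_budget: float) -> Tier:
    """Start from the relevance-based tier, then downgrade until its worst-case cost fits."""
    tier = tier_for_relevance(score)
    while MAX_COST_PER_QUERY[tier] > remaining_budget:
        if tier is Tier.CHEAP:
            raise BudgetExhausted(
                f"${remaining_budget:.4f} left, below the cheapest tier's "
                f"${MAX_COST_PER_QUERY[Tier.CHEAP]:.3f} ceiling"
            )
        tier = Tier(tier.value - 1)
    return tier
```

With $5.00 left, a high-relevance query (say 0.9) stays on Premium. With $0.03 left, the same query drops to Mid, because $0.060 no longer fits but $0.010 does. Budgeting against the top of each range is deliberately conservative: it may downgrade earlier than strictly necessary, but as long as actual costs stay within the stated ranges, no single query can push spend past the ceiling.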

Selection strategy

Within a tier, the router picks a model using the admin's chosen strategy:

Cheapest: lowest cost per input token.
Lowest latency: best p50 latency.
Round-robin: rotate across the tier's models, with availability-aware failover.
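The three strategies can be sketched as a single selector over one tier's pool. `ModelInfo`, `TierSelector`, and the strategy names are hypothetical. In practice, availability would come from health checks or provider errors; this sketch reduces it to a boolean flag.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ModelInfo:
    name: str
    cost_per_input_token: float  # USD
    p50_latency_ms: float
    available: bool = True  # stand-in for real health checks


class TierSelector:
    """Picks one model from a tier's pool using the admin's chosen strategy."""

    STRATEGIES = ("cheapest", "lowest_latency", "round_robin")

    def __init__(self, models: Sequence[ModelInfo], strategy: str):
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        if not models:
            raise ValueError("a tier needs at least one model")
        self.models: List[ModelInfo] = list(models)
        self.strategy = strategy
        self._cursor = 0  # position for round-robin

    def pick(self) -> ModelInfo:
        healthy = [m for m in self.models if m.available]
        if not healthy:
            raise RuntimeError("no available model in this tier")
        if self.strategy == "cheapest":
            return min(healthy, key=lambda m: m.cost_per_input_token)
        if self.strategy == "lowest_latency":
            return min(healthy, key=lambda m: m.p50_latency_ms)
        # Round-robin with failover: advance the cursor, skipping unavailable models.
        # At least one model is healthy, so this returns within len(self.models) steps.
        while True:
            model = self.models[self._cursor % len(self.models)]
            self._cursor += 1
            if model.available:
                return model
```

Consider a pool of three hypothetical models. `cheapest` and `lowest_latency` always return the same healthy model until prices, latencies, or availability change. `round_robin` cycles through the pool in order and skips any model currently marked unavailable, so its share of traffic moves to the others.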

The stack

RouteIQ is model-agnostic by design, spanning managed and self-hosted inference so you inherit better models without re-architecting.

Adapters. An Amazon Bedrock adapter (primarily Claude models and Nova Lite) and a vLLM adapter for open-weight models hosted on SageMaker (Qwen2.5-7B, Qwen3-8B, Llama-3.1-8B).
Embeddings. Amazon Titan; project documentation is embedded and refreshed on a weekly cadence.
Short-term memory. The last few conversation turns per session inform relevance scoring (in-process, no persistence layer).
Outcome logging. Query, model choice, cost, and outcome signals are captured in a schema designed to support future capability profiles.
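One reading of "measures how closely each query aligns with the project's documentation" is cosine similarity between the query's embedding and the embedded documentation chunks, with recent turns blended in so that a terse follow-up inherits its conversation's context. The sketch assumes the embeddings are already computed (for example by Titan). It keeps the last three query embeddings per session in memory and gives context a weight of 0.3. `RelevanceScorer` and all of its parameters are hypothetical, and RouteIQ's hybrid scoring may combine different signals.

```python
import math
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class RelevanceScorer:
    """Scores queries against pre-computed documentation-chunk embeddings (hypothetical)."""

    def __init__(self, doc_embeddings: Sequence[Vector], history_turns: int = 3,
                 context_weight: float = 0.3):
        if not doc_embeddings:
            raise ValueError("need at least one documentation embedding")
        self.docs = list(doc_embeddings)
        self.context_weight = context_weight
        # In-process short-term memory: the last few query embeddings per session.
        self.history: Dict[str, Deque[List[float]]] = defaultdict(
            lambda: deque(maxlen=history_turns))

    def best_match(self, vector: Vector) -> float:
        return max(cosine(vector, doc) for doc in self.docs)

    def score(self, session_id: str, query_embedding: Vector) -> float:
        turns = self.history[session_id]
        direct = self.best_match(query_embedding)
        if turns:
            # Blend in how well the recent conversation as a whole matches the docs.
            contextual = self.best_match(centroid(list(turns)))
            result = (1 - self.context_weight) * direct + self.context_weight * contextual
        else:
            result = direct
        turns.append(list(query_embedding))
        return result
```

Without the context term, a follow-up such as "and for the staging environment?" might score as off-topic and land on the Cheap tier, even though the conversation is squarely about the project.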
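An outcome log entry might look like the record below, written as one JSON line per routed query. The field names, including the outcome signals `retried` and `user_feedback`, are assumptions. What matters is keeping query, relevance, tier, model, cost, and outcome together in one record, which lets per-model capability profiles be learned from the log later.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RoutingRecord:
    """One routed query, in a hypothetical log schema."""
    project_id: str
    user_id: str
    query: str
    relevance_score: float
    tier: str
    model: str
    cost_usd: float
    retried: bool = False
    user_feedback: Optional[int] = None  # e.g. +1 / -1, if the app collects it
    timestamp: float = field(default_factory=time.time)

    def to_json_line(self) -> str:
        """Serialise the record as a single JSON line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```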

Phased rollout

Phase 1 delivers project-policy-based routing: the project registry, identity mapping, hybrid relevance scoring with percentile-calibrated thresholds, three-tier routing, budget tracking with auto-downgrade, per-user caps, retry escalation, and comprehensive logging.

Phase 2 adds complexity-aware model selection (combining query complexity with benchmark scores), learned per-model capability profiles, and calibrated confidence scoring.
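Percentile calibration can be sketched as deriving the tier cut-offs from the distribution of recent relevance scores instead of fixing them by hand. The 50th and 85th percentiles used here are illustrative assumptions, as are the function names. On the historical distribution, they would send roughly the top 15% of queries to Premium and the next 35% to Mid.

```python
import math
from typing import Sequence, Tuple


def percentile(scores: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile of `scores`, with p in [0, 100]."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    k = (len(ordered) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def calibrate_thresholds(history: Sequence[float], mid_pct: float = 50,
                         premium_pct: float = 85) -> Tuple[float, float]:
    """Return (mid_threshold, premium_threshold) derived from historical relevance scores."""
    if not 0 <= mid_pct < premium_pct <= 100:
        raise ValueError("expected 0 <= mid_pct < premium_pct <= 100")
    return percentile(history, mid_pct), percentile(history, premium_pct)
```

One motivation for this design is that raw similarity scores depend on the embedding model and the corpus, so a fixed cut-off like 0.75 can mean different things in different projects. Recomputing percentiles on a schedule (for example, after each weekly documentation refresh) keeps the intended traffic mix stable as the scores drift.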
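Budget tracking with per-user caps could be sketched as a tracker that reports the tighter of the project's and the user's remaining headroom. That figure is what a downgrade rule would compare against a query's estimated cost. `BudgetTracker` is hypothetical and assumes per-user caps are dollar spending limits. A production version would also need persistence and concurrency control, which this in-memory sketch omits.

```python
from collections import defaultdict
from typing import DefaultDict


class BudgetTracker:
    """In-memory spend tracking against a project ceiling and per-user caps (hypothetical)."""

    def __init__(self, project_ceiling: float, per_user_cap: float):
        self.project_ceiling = project_ceiling
        self.per_user_cap = per_user_cap
        self.project_spent = 0.0
        self.user_spent: DefaultDict[str, float] = defaultdict(float)

    def remaining_for(self, user: str) -> float:
        """Headroom for this user: the tighter of the project limit and the user's cap."""
        project_left = self.project_ceiling - self.project_spent
        user_left = self.per_user_cap - self.user_spent[user]
        return max(0.0, min(project_left, user_left))

    def record(self, user: str, cost: float) -> None:
        """Charge a completed query; refuse anything that would break either limit."""
        if cost > self.remaining_for(user):
            raise ValueError(f"charge of ${cost:.4f} would exceed the budget for {user}")
        self.project_spent += cost
        self.user_spent[user] += cost
```

Feeding `remaining_for(user)` into a downgrade rule pushes a heavy user onto cheaper tiers, and eventually to a refusal, without consuming the headroom other users rely on.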
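Retry escalation might work like this: try the chosen tier, and if the call fails or the answer fails a quality check, move up one tier and try again, stopping at Premium. The callables `call_model` and `is_acceptable` are hypothetical, and escalating on both errors and rejected answers is an assumption for illustration.

```python
from typing import Callable, Tuple

TIERS = ("cheap", "mid", "premium")  # ordered from least to most capable


def route_with_escalation(
    query: str,
    start_tier: str,
    call_model: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try `start_tier`, escalating one tier per failed attempt; return (tier, answer)."""
    last_error = None
    for tier in TIERS[TIERS.index(start_tier):]:
        try:
            answer = call_model(tier, query)
        except Exception as exc:  # e.g. a timeout or throttling error from the provider
            last_error = exc
            continue
        if is_acceptable(answer):
            return tier, answer
    raise RuntimeError("no tier produced an acceptable answer") from last_error
```

A real router would also re-check the budget before each escalation, so retries cannot bypass the ceiling.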

Where it fits

RouteIQ sits in front of your agents and LLM apps as the routing and cost-governance layer. It is governed by Responsible AI controls, its routing quality is measured by AI Eval Service, and it pairs naturally with the GenAI Delivery Factory for production rollout.

Related resources

RouteIQ is one of six production-ready accelerators we run in the open. See it routing live and explore the full accelerator suite.

RouteIQ — Intelligent model routing. Every query to the right model, at the right cost.