
RouteIQ — Intelligent model routing. Every query to the right model, at the right cost.

RouteIQ scores how relevant each query is to your project and routes it to the right model tier — cheap, mid, or premium — with per-team budgets, burn-rate downgrades, and full cost logging.

A project-policy router across Amazon Bedrock (Claude, Nova) and vLLM-hosted open-weight models (Qwen, Llama). Hybrid relevance scoring, three-tier routing, budget enforcement, and outcome logging — governed, model-agnostic, and built for the enterprise.

Sending every query to a frontier model is how AI budgets evaporate. RouteIQ measures how closely each query aligns with the project's documentation, then routes it to the cheapest tier that will do the job — premium only where it earns its cost, with a hard budget ceiling the system enforces on its own.

How routing decides

Every query arrives with the user's identity, which maps to a team or project in the registry. From there, RouteIQ makes a deterministic, auditable routing decision before any expensive model is called.

  1. Identity → project. The user maps to a default project, which carries its own allowed models, budget, documentation, and routing rules.
  2. Project-relevance score. A hybrid signal combines semantic similarity (cosine similarity between embeddings) with lexical overlap (keyword and term matches), computed against the project docs plus the last few conversation turns. This makes it robust to both paraphrasing and project-specific vocabulary.
  3. On-topic vs off-topic. On-topic queries go to mid- or premium-tier models; off-topic queries go to the cheap tier. The cut-off is a per-project, percentile-calibrated threshold (an optional Bayesian confidence score is on the roadmap).
  4. Ambiguous band. When relevance lands in an uncertain region, a lightweight classifier (Nova Lite) makes the final call rather than defaulting to the expensive tier.
  5. Model selection within the tier. The admin chooses the strategy — cheapest, lowest latency, round-robin, or availability-aware (skip recently failed models). Complexity-based selection arrives in Phase 2.
  6. Log everything. The query, model choice, cost, and outcome signals are recorded — with a schema designed to feed future capability profiles.
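
Steps 2 to 4 can be sketched as below. This is a minimal illustration under stated assumptions, not RouteIQ's implementation: the hashed bag-of-words `embed` stands in for a real embedding model, the weight `alpha=0.7`, the three-turn history window, the nearest-rank `percentile`, and the `premium_cut` threshold are all hypothetical, and `classify` is a placeholder callable for the Nova Lite call.

```python
import math
import re
import zlib
from typing import Callable, Sequence

TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokens(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashed bag-of-words vector; a stand-in for a real embedding model."""
    vec = [0.0] * dim
    for tok in tokens(text):
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    return vec

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def relevance(query: str, history: Sequence[str], docs: Sequence[str], alpha: float = 0.7) -> float:
    """Hybrid score = alpha * semantic + (1 - alpha) * lexical, over query plus recent turns."""
    text = " ".join([*history[-3:], query])  # assumed window: last three turns
    q_vec = embed(text)
    semantic = max((cosine(q_vec, embed(d)) for d in docs), default=0.0)
    vocab = {t for d in docs for t in tokens(d)}
    q_terms = set(tokens(text))
    lexical = len(q_terms & vocab) / len(q_terms) if q_terms else 0.0
    return alpha * semantic + (1 - alpha) * lexical

def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in 0..100) of a per-project calibration sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def route_tier(query: str, score: float, on_topic_cut: float, premium_cut: float,
               band: float, classify: Callable[[str], bool]) -> str:
    """Map a relevance score to a tier; scores near the cut-off go to a cheap classifier."""
    if abs(score - on_topic_cut) <= band:
        # Ambiguous band: a lightweight classifier decides instead of defaulting to premium.
        return "mid" if classify(query) else "cheap"
    if score < on_topic_cut:
        return "cheap"
    return "premium" if score >= premium_cut else "mid"
```

In this sketch, `on_topic_cut` would be taken as a low percentile of `relevance` scores over a project's known on-topic queries, so each project's threshold adapts to how its own documentation scores rather than using one global constant.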

Three model tiers

Each project defines its own model pool, organised by cost tier. The router only ever picks within the tier the relevance score and budget allow.

Premium

High-relevance, complex, project-critical work.

  • Typical models: Opus 4.8, GPT-4o.
  • Approx. cost: $0.015–$0.060 per query.

Mid

Medium-relevance or ambiguous queries.

  • Typical models: Sonnet 4.6, GPT-4o-mini.
  • Approx. cost: $0.003–$0.010 per query.

Cheap

Off-topic queries and simple in-scope work.

  • Typical models: Qwen-7B, Haiku 4.5, Nova Lite.
  • Approx. cost: $0.0005–$0.002 per query.
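
To see how the tier ranges translate into spend, the sketch below computes a blended cost per query. The per-tier ranges are copied from the lists above; the 70/20/10 traffic mix and the `blended_cost` helper are purely hypothetical and are not measured RouteIQ traffic.

```python
# Per-query cost ranges (USD) taken from the tier descriptions above.
TIER_COST = {
    "cheap": (0.0005, 0.002),
    "mid": (0.003, 0.010),
    "premium": (0.015, 0.060),
}

def blended_cost(mix: dict[str, float]) -> tuple[float, float]:
    """Return the (low, high) expected cost per query for a traffic mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    low = sum(share * TIER_COST[tier][0] for tier, share in mix.items())
    high = sum(share * TIER_COST[tier][1] for tier, share in mix.items())
    return low, high

# Hypothetical mix: 70% cheap, 20% mid, 10% premium.
low, high = blended_cost({"cheap": 0.7, "mid": 0.2, "premium": 0.1})
print(f"${low:.5f} to ${high:.4f} per query")
```

Under that assumed mix, the blended cost works out to roughly $0.0025 to $0.0094 per query, against $0.015 to $0.060 if every query went to the premium tier.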

Selection strategy

Within a tier, route by the admin's chosen lever.

  • Cheapest: lowest cost per input token.
  • Lowest latency: lowest median (p50) latency.
  • Round-robin, plus availability-aware failover that skips recently failed models.
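
The in-tier strategies can be sketched as a small selector. This is a hypothetical API, not RouteIQ's: `ModelInfo`, `TierSelector`, the 60-second `cooldown_s`, and the injectable `clock` are assumptions for illustration. Availability awareness is modelled as a filter applied before every strategy, so a recently failed model is skipped until its cooldown expires.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class ModelInfo:
    name: str
    cost_per_input_token: float           # USD per input token
    p50_latency_ms: float                 # median latency from recent traffic
    last_failure: Optional[float] = None  # clock time of the most recent failed call

class TierSelector:
    """Pick one model from a tier's pool using an admin-chosen strategy (hypothetical API)."""

    def __init__(self, models: Sequence[ModelInfo], strategy: str = "cheapest",
                 cooldown_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        if not models:
            raise ValueError("tier has no models")
        self.models = list(models)
        self.strategy = strategy
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._rr = itertools.cycle(range(len(self.models)))

    def _healthy(self) -> list[ModelInfo]:
        """Availability-aware filter: skip models that failed within the cooldown window."""
        now = self.clock()
        healthy = [m for m in self.models
                   if m.last_failure is None or now - m.last_failure >= self.cooldown_s]
        return healthy or self.models  # if every model failed recently, try them all anyway

    def pick(self) -> ModelInfo:
        pool = self._healthy()
        if self.strategy == "cheapest":
            return min(pool, key=lambda m: m.cost_per_input_token)
        if self.strategy == "lowest_latency":
            return min(pool, key=lambda m: m.p50_latency_ms)
        if self.strategy == "round_robin":
            healthy_names = {m.name for m in pool}
            # Walk the rotation, skipping unhealthy models; one full lap always finds one.
            for _ in range(len(self.models)):
                candidate = self.models[next(self._rr)]
                if candidate.name in healthy_names:
                    return candidate
        raise ValueError(f"unknown strategy: {self.strategy!r}")

    def record_failure(self, model: ModelInfo) -> None:
        model.last_failure = self.clock()
```

Injecting the clock keeps the cooldown logic testable without sleeping, and falling back to the full pool when every model is cooling down avoids turning a transient outage into a hard failure.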

Budget & cost governance

RouteIQ watches spend continuously and acts before the bill does. Team-level monthly budgets are tracked with burn-rate anomaly detection, and as each threshold is crossed, routing steps down to cheaper tiers automatically.
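
A minimal sketch of threshold-based downgrades follows. It assumes a simple linear burn-rate projection (real anomaly detection would be more sophisticated), and the thresholds are hypothetical: a hard ceiling at 100% of budget forces the cheap tier, while 90% spent or a projected month-end overrun of 20% caps routing at the mid tier. `BudgetState`, `max_allowed_tier`, and `apply_budget` are illustrative names.

```python
from dataclasses import dataclass

TIER_ORDER = ["cheap", "mid", "premium"]

@dataclass
class BudgetState:
    monthly_budget_usd: float
    spent_usd: float
    day_of_month: int   # 1-based day of the billing month
    days_in_month: int

    def __post_init__(self) -> None:
        if self.monthly_budget_usd <= 0 or not 1 <= self.day_of_month <= self.days_in_month:
            raise ValueError("invalid budget state")

    def projected_spend(self) -> float:
        """Linear burn-rate projection of month-end spend (a simplifying assumption)."""
        return self.spent_usd / self.day_of_month * self.days_in_month

def max_allowed_tier(state: BudgetState, soft: float = 0.9, projection_limit: float = 1.2) -> str:
    """Highest tier the budget currently permits; thresholds are hypothetical defaults."""
    spent_frac = state.spent_usd / state.monthly_budget_usd
    if spent_frac >= 1.0:
        return "cheap"  # hard ceiling: budget exhausted
    if spent_frac >= soft or state.projected_spend() >= projection_limit * state.monthly_budget_usd:
        return "mid"    # burn rate too high: no premium calls
    return "premium"

def apply_budget(requested: str, state: BudgetState) -> str:
    """Downgrade the tier chosen by relevance scoring so it never exceeds the budget cap."""
    return min(requested, max_allowed_tier(state), key=TIER_ORDER.index)
```

Because the budget only ever caps the tier chosen by relevance scoring, an off-topic query stays cheap regardless of budget, and a premium-worthy query degrades one step at a time instead of being rejected.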

Project policy rules

Each project in the registry exposes a set of levers to enterprise admins — routing behaviour is configuration, not code.
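
As an illustration of "configuration, not code", the sketch below loads and validates one project's policy from JSON. The field names, the model identifiers (labels that mirror the model names above, not real Bedrock model IDs), the S3 URI, and the per-user daily cap field are all hypothetical; the actual registry schema may differ.

```python
import json
from dataclasses import dataclass

TIERS = ("cheap", "mid", "premium")
STRATEGIES = {"cheapest", "lowest_latency", "round_robin", "availability_aware"}

@dataclass
class ProjectPolicy:
    project_id: str
    models_by_tier: dict[str, list[str]]
    monthly_budget_usd: float
    docs_uri: str                   # where the project documentation used for scoring lives
    on_topic_percentile: float      # calibration percentile for the on/off-topic cut-off
    ambiguous_band: float           # half-width of the band sent to the classifier
    selection_strategy: str
    per_user_daily_cap_usd: float

    @classmethod
    def from_json(cls, raw: str) -> "ProjectPolicy":
        # Unknown or missing keys raise TypeError, so a malformed config fails closed.
        policy = cls(**json.loads(raw))
        policy.validate()
        return policy

    def validate(self) -> None:
        missing = [t for t in TIERS if not self.models_by_tier.get(t)]
        if missing:
            raise ValueError(f"no models configured for tier(s): {missing}")
        if self.selection_strategy not in STRATEGIES:
            raise ValueError(f"unknown selection strategy: {self.selection_strategy!r}")
        if not 0 < self.on_topic_percentile < 100:
            raise ValueError("on_topic_percentile must be between 0 and 100")
        if self.monthly_budget_usd <= 0 or self.per_user_daily_cap_usd <= 0:
            raise ValueError("budgets and caps must be positive")

EXAMPLE_POLICY = """
{
  "project_id": "claims-assist",
  "models_by_tier": {"cheap": ["qwen-7b", "nova-lite"], "mid": ["sonnet-4.6"], "premium": ["opus-4.8"]},
  "monthly_budget_usd": 5000,
  "docs_uri": "s3://example-bucket/claims-assist/docs/",
  "on_topic_percentile": 10,
  "ambiguous_band": 0.05,
  "selection_strategy": "cheapest",
  "per_user_daily_cap_usd": 25
}
"""

policy = ProjectPolicy.from_json(EXAMPLE_POLICY)
```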

The stack

RouteIQ is model-agnostic by design, spanning managed and self-hosted inference so you inherit better models without re-architecting.

Phased rollout

Phase 1 delivers project-policy-based routing — the project registry, identity mapping, hybrid relevance scoring with percentile-calibrated thresholds, three-tier routing, budget tracking with auto-downgrade, per-user caps, retry escalation, and comprehensive logging. Phase 2 adds complexity-aware model selection (query complexity plus benchmark scores) and learned, per-model capability profiles, plus calibrated confidence scoring.
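
Retry escalation and per-attempt logging from Phase 1 could look roughly like the sketch below. It assumes that "failure" means the provider call raised an exception (outcome-quality signals are out of scope here), and that escalation moves one tier up but never past the ceiling set by the budget policy. `call_with_escalation`, `AttemptLog`, `pick_model`, and `invoke` are hypothetical names; the log record is a simplified stand-in for the full logging schema.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

TIER_ORDER = ["cheap", "mid", "premium"]

@dataclass
class AttemptLog:
    query_id: str
    tier: str
    model: str
    ok: bool
    cost_usd: float     # this sketch records 0.0 for failed attempts
    latency_ms: float

def call_with_escalation(
    query_id: str,
    start_tier: str,
    max_tier: str,                               # ceiling from the budget policy
    pick_model: Callable[[str], str],            # tier -> model name
    invoke: Callable[[str], Tuple[str, float]],  # model -> (answer, cost_usd); raises on failure
    log: Callable[[AttemptLog], None],
) -> str:
    """Try the routed tier first; on failure retry one tier up, never above max_tier."""
    last = TIER_ORDER.index(max_tier)
    first = min(TIER_ORDER.index(start_tier), last)
    for tier in TIER_ORDER[first:last + 1]:
        model = pick_model(tier)
        started = time.perf_counter()
        try:
            answer, cost = invoke(model)
        except Exception:  # sketch: treat any provider error as a failed attempt
            answer, cost, ok = "", 0.0, False
        else:
            ok = True
        # Every attempt is logged, so escalations stay visible in cost reporting.
        log(AttemptLog(query_id, tier, model, ok, cost, (time.perf_counter() - started) * 1000))
        if ok:
            return answer
    raise RuntimeError(f"query {query_id}: all tiers up to {max_tier!r} failed")
```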

Where it fits

RouteIQ sits in front of your agents and LLM apps as the routing and cost-governance layer. It is governed by Responsible AI controls, its routing quality is measured by AI Eval Service, and it pairs naturally with the GenAI Delivery Factory for production rollout.
