The pipeline
Each stage is a pluggable component. We swap implementations to fit the customer's document mix, volume, and compliance constraints.
- Ingestion. Documents arrive from anywhere — monitored email inbox, REST API, SFTP drop, scanner, or web upload. PDFs, images, scans, and office files are normalised to a common internal format.
- Classification. An agent identifies the document type (invoice, purchase order, contract, bank statement, ID document…) and routes it to the right extraction schema. Unknown types go to triage, not the bin.
- OCR & layout analysis. Amazon Textract, Azure Document Intelligence, Google Document AI, or self-hosted docTR — recovering text, tables, key-value pairs, and reading order from clean PDFs through to crumpled scans.
- Schema-constrained extraction. An LLM (Claude or GPT, or an open-weight model on-prem) maps the recognised content onto a defined target schema — constrained to the fields, types, and enums the schema allows. The model cannot invent a field.
- Validation. Business rules run on every record: arithmetic checks (line items sum to total), format checks (GSTIN / PAN / IBAN patterns), cross-references against master data, and per-field confidence thresholds.
- Human-in-the-loop exceptions. Only records that fail validation or fall below the confidence threshold land in a review queue, shown side-by-side with the source page. Reviewers correct or approve each record, and that feedback sharpens the schema and prompts.
- Structured output. Clean, validated records are pushed to the systems that need them — ERP, accounting, CRM, data warehouse — as JSON, via API, or as a governed table on your Data Platform.
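The schema-constraint and validation stages above can be sketched in a few lines. This is a minimal illustration, not Quillect's implementation: the `INVOICE_SCHEMA` layout, the field names, the function names, and the 0.9 confidence threshold are all assumptions made for the example. It shows an unknown field being rejected, an arithmetic check on line items, PAN and IBAN format checks, and a per-field confidence threshold. Cross-references against master data are left out.

```python
import re
from decimal import Decimal

# Hypothetical target schema: field name -> (expected type, allowed enum or None).
INVOICE_SCHEMA = {
    "invoice_number": (str, None),
    "currency": (str, {"INR", "USD", "EUR", "GBP"}),
    "supplier_pan": (str, None),
    "supplier_iban": (str, None),
    "line_items": (list, None),
    "total": (str, None),  # kept as a string so it parses exactly as a Decimal
}

# Indian PAN: five letters, four digits, one letter.
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def schema_errors(record: dict) -> list:
    """Reject fields outside the schema, missing fields, wrong types, bad enums."""
    errors = [f"unknown field: {name}" for name in record if name not in INVOICE_SCHEMA]
    for name, (expected_type, allowed) in INVOICE_SCHEMA.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type: {name}")
        elif allowed is not None and record[name] not in allowed:
            errors.append(f"value not in enum: {name}")
    return errors


def iban_is_valid(iban: str) -> bool:
    """Structural check plus the standard mod-97 checksum."""
    s = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    # Letters map to 10..35, digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def rule_errors(record: dict, confidences: dict, min_confidence: float = 0.9) -> list:
    """Business rules: arithmetic, format, and per-field confidence (threshold assumed)."""
    errors = []
    line_sum = sum(Decimal(item["amount"]) for item in record["line_items"])
    if line_sum != Decimal(record["total"]):
        errors.append("arithmetic: line items do not sum to total")
    if not PAN_RE.fullmatch(record["supplier_pan"]):
        errors.append("format: supplier_pan")
    if not iban_is_valid(record["supplier_iban"]):
        errors.append("format: supplier_iban")
    for name, score in confidences.items():
        if score < min_confidence:
            errors.append(f"low confidence: {name}")
    return errors
```

A record with empty results from both `schema_errors` and `rule_errors` would be a candidate for straight-through processing; anything else would go to review. Using `Decimal` rather than floats keeps the sum-to-total check exact for currency amounts.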
Why “human controls the exceptions” matters
Fully manual processing does not scale; fully automatic processing is not trustworthy on documents that move money or carry legal weight. Quillect's design point is straight-through automation for the confident majority, with human judgement concentrated where it earns its keep.
- Confidence-gated automation. High-confidence, validation-clean records flow straight through. Reviewers never see them.
- Exception queue, not inbox. Only ambiguous or failing records reach a human, ranked by value and risk — so attention goes where it matters.
- Every decision is auditable. The source page, extracted values, confidence, the rule that fired, and the approver are retained for every record, governed by Responsible AI controls.
- The system learns. Reviewer corrections feed back into the schema and prompts, so the share routed to humans falls over time — measured, not assumed, with AI Eval Service.
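The gating and ranking described above could look roughly like this. It is a sketch under stated assumptions: the `Extracted` record shape, the `risk_weight` field, the 0.9 threshold, and the "amount times risk weight" priority are all hypothetical stand-ins for whatever value and risk measures a deployment actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Extracted:
    record_id: str
    amount: float                 # monetary value of the document
    min_confidence: float         # lowest per-field confidence on the record
    failed_rules: list = field(default_factory=list)
    risk_weight: float = 1.0      # hypothetical multiplier, e.g. higher for new suppliers


def needs_review(rec: Extracted, threshold: float = 0.9) -> bool:
    """A record reaches a human only if a rule failed or confidence is low."""
    return bool(rec.failed_rules) or rec.min_confidence < threshold


def route(records: list, threshold: float = 0.9):
    """Split records into straight-through and a review queue ranked by value and risk."""
    straight_through = [r for r in records if not needs_review(r, threshold)]
    review_queue = [r for r in records if needs_review(r, threshold)]
    # Highest value-times-risk first, so reviewer attention goes where it matters.
    review_queue.sort(key=lambda r: r.amount * r.risk_weight, reverse=True)
    return straight_through, review_queue


def exception_rate(records: list, threshold: float = 0.9) -> float:
    """Share of records routed to humans: the number to track over time."""
    if not records:
        return 0.0
    return sum(needs_review(r, threshold) for r in records) / len(records)
```

Tracking `exception_rate` per document type over successive batches is one simple way to check that the share routed to humans is actually falling rather than assuming it.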
What it does well — and what it doesn't
Quillect is honest about its operating envelope.
It does well
- High-volume, repeating document types — invoices, POs, remittances, statements, forms.
- Mixed-quality inputs — clean digital PDFs through to photographed and scanned pages.
- Table and line-item extraction with arithmetic validation.
- Straight-through processing of the confident majority, with measured exception rates.
It doesn't pretend to do
- Legal interpretation — it extracts contract terms; it does not advise on them.
- Decisions on low-confidence records — those route to a human by design.
- One-off, never-seen-again document types where building a schema costs more than reading by hand.
- Anything involving sensitive personal data without the proper authorisation and residency context.
How it compares to single-vendor IDP
The cloud vendors ship capable document suites — Amazon Textract, Azure Document Intelligence, Google Document AI. They are strong OCR and form engines. Quillect uses them as components rather than as the whole stack.
Quillect is for organisations that want:
- BYOM (bring-your-own-model) — choose Claude, GPT, or open-weight models per document type, including on-prem deployment for regulated workloads.
- Vendor-agnostic OCR — the best recogniser per document type, not whichever one ships with the cloud you happen to use.
- Data-residency control — OCR, extraction, and storage can run in customer-controlled regions for DPDP and other regulated workloads.
- Governed human review — exception handling that integrates with your identity, audit, and approval flows rather than a SaaS vendor's console.
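As an illustration of choosing components per document type, a routing table might be expressed like this. Everything here is hypothetical: the `StageChoice` fields, the string labels for engines, models, and regions, and the pairings themselves are placeholders for the example, not Quillect's configuration or a recommendation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageChoice:
    ocr_engine: str        # label for the recogniser to use
    extraction_model: str  # label for the LLM to use
    region: str            # label for where processing and storage run


# Hypothetical per-document-type routing table.
ROUTES = {
    "invoice": StageChoice("textract", "claude", "customer-region"),
    "bank_statement": StageChoice("azure-document-intelligence", "gpt", "customer-region"),
    "id_document": StageChoice("doctr", "open-weight", "on-prem"),
}


def choose_components(doc_type: str) -> Optional[StageChoice]:
    """Return the configured components, or None so the caller can send the document to triage."""
    return ROUTES.get(doc_type)
```

Keeping the choice in data rather than code means a recogniser or model can be swapped for one document type without touching the others.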
Where it fits
Quillect is the document-to-data front door for the rest of the stack. Extracted records land on the Data Platform we build, feed AI-Powered BI and downstream agents, are governed by Responsible AI controls, and have their extraction quality assured by AI Eval Service. Start with a GenAI Readiness Assessment to scope your document mix and volumes.
Related resources
Quillect is one of six production-ready accelerators we run. See document AI live and explore the full accelerator suite.