The agent team
Four roles, working in sequence. Each agent owns a phase; each phase produces an artefact the next agent consumes.
Scanning Agent
Connects to the source data platform — relational, NoSQL, warehouse — and introspects everything: schemas, tables, columns, types, indexes, FKs, views, stored procedures, triggers, sequences, permissions, lineage. Output: a rich metadata graph.
Evaluation Agent
Reads the metadata graph and grades it against data-platform best practices. Detects cyclic dependencies, redundant objects, orphaned data, dead procedures, type-precision risks, and naming conflicts. Output: a scored issue inventory.
Data Agent
Applies platform best practices to resolve the identified issues, generates dialect translations where needed, and produces a sequenced step-by-step migration routine with risk levels, validation gates, and rollback paths. Output: a detailed work-breakdown plan.
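The sequencing step can be sketched with a topological sort over the dependency edges the Scanning Agent found. A minimal illustration — the object names and dependency map below are invented, not from a real scan:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each object lists the objects it depends on.
dependencies = {
    "orders": {"customers"},              # FK: orders.customer_id -> customers
    "order_items": {"orders", "products"},
    "v_revenue": {"order_items"},         # view reads order_items
    "customers": set(),
    "products": set(),
}

def migration_order(deps):
    """Return a creation order in which every object follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = migration_order(dependencies)
# Every object appears after everything it depends on.
assert order.index("customers") < order.index("orders")
assert order.index("orders") < order.index("v_revenue")
```

The real plan layers risk levels, validation gates, and rollback paths on top of this ordering; the sort only guarantees that nothing is created before its dependencies.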
DBA (human)
Reviews the plan, approves or amends each step, and runs execution under the customer's compliance and security guidelines. The DBA is final authority. Agents assist; the DBA decides.
The metadata graph
The Scanning Agent builds a typed, queryable graph of the source system. It is the substrate every later step works on.
What's in the graph
- Schema objects — tables, columns with types, defaults, nullability, check constraints; views, materialised views, sequences, custom types.
- Programmability — stored procedures, functions, triggers, packages (Oracle), assemblies (SQL Server), scheduled jobs.
- Relationships — foreign keys, view dependencies, procedure-to-table call edges, trigger fan-out.
- Performance shape — indexes (clustered, covering, filtered), partition strategy, row counts, table sizes, hot tables from execution-plan history.
- Security & access — roles, grants, row-level security, masking policies.
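In miniature, the graph is nodes (typed objects) plus labelled edges (relationships). A sketch of that shape — the classes and sample data here are illustrative; the real graph carries far more attributes (types, grants, partition strategy, and so on):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str            # "table", "view", "procedure", "index", ...
    name: str
    attrs: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # name -> Node
    edges: list = field(default_factory=list)   # (src, edge_kind, dst)

    def add(self, node):
        self.nodes[node.name] = node

    def link(self, src, edge_kind, dst):
        self.edges.append((src, edge_kind, dst))

    def neighbours(self, name, edge_kind=None):
        return [d for s, k, d in self.edges
                if s == name and (edge_kind is None or k == edge_kind)]

g = Graph()
g.add(Node("table", "customers"))
g.add(Node("table", "orders", {"rows": 1_200_000}))
g.link("orders", "fk", "customers")

assert g.neighbours("orders", "fk") == ["customers"]
```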
How the graph is built
- System catalogues — `information_schema`, `pg_catalog`, `sys.*` (SQL Server), `ALL_/USER_/DBA_*` (Oracle), `INFORMATION_SCHEMA` (Snowflake / BigQuery / Databricks).
- Static analysis of code — AST-based parsing of stored procedures and triggers to extract call graphs and dependencies LLMs alone miss.
- Profiling sample — row counts, NULL ratios, distinct counts, value distributions on configurable sample sizes (read-only, low-impact).
- Lineage — ingested from existing tools (OpenLineage, dbt manifests, Informatica metadata) where available.
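The catalogue-read step looks roughly like this. The sketch uses SQLite's `PRAGMA` functions as a stand-in for a production catalogue; real scans issue read-only queries against `information_schema`, `pg_catalog`, or `sys.*`:

```python
import sqlite3

# In-memory SQLite stands in for the source platform (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
""")

def scan_columns(conn, table):
    """Read column metadata from the engine's catalogue (PRAGMA in SQLite)."""
    return [{"name": r[1], "type": r[2], "not_null": bool(r[3])}
            for r in conn.execute(f"PRAGMA table_info({table})")]

def scan_foreign_keys(conn, table):
    """Read FK edges; these become relationship edges in the metadata graph."""
    return [{"from": r[3], "to_table": r[2], "to_col": r[4]}
            for r in conn.execute(f"PRAGMA foreign_key_list({table})")]

cols = scan_columns(conn, "customers")
fks = scan_foreign_keys(conn, "orders")
assert cols[1] == {"name": "email", "type": "TEXT", "not_null": True}
assert fks == [{"from": "customer_id", "to_table": "customers", "to_col": "id"}]
```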
What the framework resolves
Real data platforms accumulate decades of decisions. The Evaluation Agent surfaces them; the Data Agent proposes resolutions before they become migration blockers.
- Cyclic dependencies — FK loops between tables, view-on-view chains, procedure-call cycles. The agent identifies the cycle, proposes a break point, and re-orders dependent objects in the work-breakdown.
- Redundant objects — duplicate indexes, unused tables, dead stored procedures (no callers in 90+ days), orphan columns, denormalised duplicates of reference tables. Flagged for retire-before-migrate.
- Type and precision incompatibilities — Oracle `NUMBER` without precision, SQL Server `DATETIMEOFFSET`, Postgres-only types. Each gets a documented translation with a confidence score.
- Index gaps and bloat — missing indexes on frequent query paths, redundant indexes consuming write budget, fragmented indexes. The plan separates "carry as-is" from "rationalise during migration".
- Orphan data and FK violations — rows with foreign keys to non-existent parents (yes, this happens). The agent quantifies and proposes a clean-up path.
- Encoding and collation mismatches — Latin-1 columns inside otherwise-UTF-8 schemas, case-sensitivity collation drift. Surfaced before they break string comparisons in the target.
- Permissions and role drift — grants accumulated to users no longer with the company, role hierarchies that no one remembers. The plan documents what carries forward and what doesn't.
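The cyclic-dependency check in the first bullet reduces to cycle detection over the graph's edges. A minimal depth-first sketch — the schema with its deliberate FK loop is invented for the example:

```python
# FK edges: table -> tables it references. "departments" and "employees"
# reference each other, a classic migration-blocking loop.
fk_edges = {
    "employees": ["departments"],   # employees.dept_id -> departments
    "departments": ["employees"],   # departments.manager_id -> employees
    "projects": ["departments"],
}

def find_cycle(edges):
    """Depth-first search; returns one dependency cycle as a path, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in edges}
    path = []

    def visit(n):
        colour[n] = GREY
        path.append(n)
        for m in edges.get(n, []):
            if colour.get(m, BLACK) == GREY:        # back-edge: cycle found
                return path[path.index(m):] + [m]
            if colour.get(m, BLACK) == WHITE:
                found = visit(m)
                if found:
                    return found
        colour[n] = BLACK
        path.pop()
        return None

    for n in edges:
        if colour[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None

cycle = find_cycle(fk_edges)
assert cycle == ["employees", "departments", "employees"]
```

Once a cycle is reported, the break-point proposal (for example, dropping one FK and re-adding it post-load) is a separate planning step.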
Migration patterns supported
Heterogeneous (engine change)
- Oracle → PostgreSQL / Aurora PostgreSQL
- SQL Server → PostgreSQL / Aurora / MySQL
- Oracle → Snowflake / Databricks (analytical re-platform)
- SAP HANA → Snowflake / Databricks
Homogeneous (lift & shift)
- On-prem Oracle / SQL Server / PostgreSQL → cloud-managed equivalents (RDS, Aurora, Cloud SQL, Azure SQL).
- Major-version upgrades with deprecated-feature audit.
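A deprecated-feature audit can be as simple as pattern-matching DDL and procedure source against the target version's deprecation list. The two entries below are illustrative (`WITH OIDS` was removed in PostgreSQL 12); a real audit would load the engine's published deprecation notes:

```python
import re

# Illustrative deprecation list — real audits use the target version's
# official deprecation notes, not a hand-picked dict.
DEPRECATED = {
    r"\bWITH\s+OIDS\b": "PostgreSQL: WITH OIDS removed in v12",
    r"\bCOMPUTE\b": "SQL Server: COMPUTE clause removed in 2012",
}

def audit(sql):
    """Return the deprecation messages triggered by a piece of SQL."""
    return [msg for pat, msg in DEPRECATED.items()
            if re.search(pat, sql, re.IGNORECASE)]

findings = audit("CREATE TABLE t (id int) WITH OIDS;")
assert findings == ["PostgreSQL: WITH OIDS removed in v12"]
```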
SQL → NoSQL (selective)
- Suitability assessment first — not every relational schema fits NoSQL.
- Access-pattern-driven design for DynamoDB, Cassandra.
- Document modelling for MongoDB.
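Document modelling follows access patterns: child rows that are always read with their parent get embedded in one document rather than joined at query time. A sketch with invented table data:

```python
# Relational shape: parent table plus child table joined by order_id.
orders = [{"id": 1, "customer_id": 7, "total": 120.0}]
order_items = [
    {"order_id": 1, "sku": "A-1", "qty": 2},
    {"order_id": 1, "sku": "B-9", "qty": 1},
]

def to_documents(orders, order_items):
    """Embed each order's items in the order document, MongoDB-style."""
    by_order = {}
    for item in order_items:
        child = {k: v for k, v in item.items() if k != "order_id"}
        by_order.setdefault(item["order_id"], []).append(child)
    return [{**o, "items": by_order.get(o["id"], [])} for o in orders]

docs = to_documents(orders, order_items)
assert docs[0]["items"] == [{"sku": "A-1", "qty": 2}, {"sku": "B-9", "qty": 1}]
```

This is exactly why the suitability assessment comes first: embedding only pays off when the access pattern reads parent and children together.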
Data warehouse migrations
- Teradata / Netezza → Snowflake / BigQuery / Databricks.
- Includes BI-layer redirection (Tableau, Power BI, Looker).
Compliance & security at execution
Migrations are also compliance events. The DBA executes the plan within the customer's regulatory and security framework; RekonAIDe surfaces what each step touches so the controls travel with the work.
- Read-only scanning — the Scanning Agent never writes to the source; permissions are scoped to `SELECT` on system catalogues plus optional sample reads.
- PII detection in the graph — sensitive columns (PAN, Aadhaar, email, phone) flagged automatically; masking and tokenisation gates added to the work-breakdown.
- Regulatory mapping — DPDP Act 2023 and GDPR alignment captured in the plan; data-residency constraints respected (Indian regions for DPDP-aligned workloads, etc.).
- Audit trail — every agent action, every artefact, every approval logged via Eval@Core. Compliance teams can replay any decision.
- Customer guidelines as code — encryption-at-rest, key rotation, network isolation, change-management approvals encoded as plan-level checks.
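The PII-flagging step can be sketched as a name-based heuristic that emits masking gates for the work-breakdown. Patterns and column names below are invented; production detection also profiles sampled values, not just names:

```python
import re

# Illustrative name patterns for the PII categories the text lists.
PII_PATTERNS = {
    "email": re.compile(r"e.?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile|msisdn", re.IGNORECASE),
    "pan": re.compile(r"\bpan\b|pan_number", re.IGNORECASE),
    "aadhaar": re.compile(r"aadha?ar", re.IGNORECASE),
}

def flag_pii(columns):
    """Return (column, category) flags and the masking gates to add to the plan."""
    flags = [(c, cat) for c in columns
             for cat, pat in PII_PATTERNS.items() if pat.search(c)]
    gates = [f"mask-or-tokenise:{c}" for c, _ in flags]
    return flags, gates

flags, gates = flag_pii(["id", "cust_email", "phone_no", "pan_number"])
assert ("cust_email", "email") in flags
assert "mask-or-tokenise:pan_number" in gates
```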
Where it fits
RekonAIDe sits under the Data Platform practice and ships through the GenAI Delivery Factory, with the same human-in-the-loop discipline applied to all our AI-assisted work. The infrastructure underneath is what we describe in Infrastructure & Engineering.