Agentic AI in CAD-driven AEC: from Revit elements to Monte Carlo risk — and the guardrails that keep it honest
In the last eighteen months, every architect, engineer and contractor I've spoken to has asked a version of the same question: "Where does AI actually pay off in our workflows — and how do we stop it making things up?"
This is the right question. The wrong one is "should we use AI". The wrong answer is "let's bolt a chatbot onto Revit". What's emerging — and what we ship to clients today — is something more interesting: agentic AI that orchestrates an entire CAD-driven stack. It doesn't replace the engineer. It doesn't replace the planner. It steers a hybrid of deterministic rules, classical machine learning, and reasoning LLMs at the messy parts where a human used to spend three days a week.
This article is about where that actually wins, where it fails, and the architectural pattern that decides which.
The wins are concrete
01 Monte Carlo risk analysis driven directly off the model
Traditional Monte Carlo simulations on a construction programme require someone to (a) gather all the input distributions, (b) keep them current as the design evolves, and (c) maintain the model itself. In practice, simulations are run once at tender stage and never again.
When you connect an agent to the federated Revit model, the supplier feed, and the historical lead-time database, that changes. The agent re-reads the model overnight, refreshes the distributions for every supplier and every activity, runs 10,000 trials against the live critical path, and surfaces:
- P50 / P80 / P95 completion dates per phase
- Per-supplier slip contribution to the overall programme
- The activities most worth de-risking — highest variance × highest float-loss
The simulation kernel is just Python. The AI's contribution is keeping the inputs fresh and reasoning about which assumptions to challenge — "the curtain-wall supplier's lead-time variance has doubled in the last six weeks; should we re-weight the south-façade tail?" This wasn't possible before because nobody had time to maintain the inputs. Now an agent does it nightly.
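To make "just Python" concrete, here is a minimal sketch of such a kernel. It assumes a single sequential chain of activities, each with a triangular duration distribution; the activity names and the (low, mode, high) figures are hypothetical, and a real programme would sample across the full dependency network rather than one chain.

```python
import math
import random

# Hypothetical activities on one sequential chain: (low, mode, high) durations in days.
ACTIVITIES = {
    "steel_frame": (20, 25, 40),
    "curtain_wall": (30, 35, 70),
    "fit_out": (40, 45, 60),
}

def simulate(activities, trials=10_000, seed=42):
    """Return the sorted total duration of the chain, one value per trial."""
    rng = random.Random(seed)  # seeded so a nightly run is reproducible
    totals = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode), in that order
        total = sum(rng.triangular(low, high, mode)
                    for low, mode, high in activities.values())
        totals.append(total)
    return sorted(totals)

def percentile(sorted_values, p):
    """Nearest-rank percentile on a pre-sorted list."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]

totals = simulate(ACTIVITIES)
p50, p80, p95 = (percentile(totals, p) for p in (50, 80, 95))
print(f"P50={p50:.1f}  P80={p80:.1f}  P95={p95:.1f} days")
```

The agent's job in this picture is to rewrite the ACTIVITIES inputs from the live feeds before each run; the kernel itself stays deterministic given a seed.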
02 Classifying Revit elements — position, properties, supplier data, BOQ
Revit elements arrive with native properties: type, family, dimensions, level, host. They don't arrive with context. Is this beam on the critical span? Is this wall on the south façade? Is this column carrying the atrium roof? We classify every element through a hybrid pipeline:
- Deterministic geometric rules — perimeter vs interior (computed from outer ring), zone (boundary functions), grid intersection (snap to nearest grid line), façade orientation (azimuth from face normal), span-length category.
- Native Revit properties — type, family, finish, fire rating, structural usage, custom parameters.
- External joins — supplier lead time, stock status, unit price, sustainability data, certification status.
- LLM review — for the ~5% of elements where the rules are ambiguous (e.g. "is this hybrid beam-column STR_BEAM or STR_COL?"), the model reads the context and commits a classification with citation to the geometric rule and the source property.
The output is a single materialised view: every element with its derived classes, its supplier, its cost, and its tie-back to the Bill of Quantities. That last point matters.
An element classified as STR_BEAM · UB 305×165 · L2 · Span ≥ 8 m · Severfield · stock automatically maps to a BOQ line. The cost engineer stops manually reconciling drawings to the bill. Procurement stops chasing what the structural engineer changed last week. The programme refers to elements, not abstract activities.
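Two of the deterministic geometric rules in the pipeline, façade orientation and span-length category, can be sketched as below. This is an illustration under stated assumptions, not the production rule set: it assumes project north is +Y and east is +X, and the function names are hypothetical. The 8 m threshold follows the "Span ≥ 8 m" class in the example above.

```python
import math

def facade_orientation(normal_x, normal_y):
    """Map the horizontal part of an outward face normal to a compass façade.

    Assumes +Y is project north and +X is east.
    """
    # Azimuth measured clockwise from north, in [0, 360)
    azimuth = math.degrees(math.atan2(normal_x, normal_y)) % 360
    labels = ["N", "E", "S", "W"]
    # Each façade owns a 90-degree sector centred on its compass direction
    return labels[int(((azimuth + 45) % 360) // 90)]

def span_category(length_m, threshold_m=8.0):
    """Bucket a span by length against a single threshold."""
    if length_m >= threshold_m:
        return f"Span ≥ {threshold_m:g} m"
    return f"Span < {threshold_m:g} m"

print(facade_orientation(0.0, -1.0))  # a wall whose outward normal points south
print(span_category(9.2))
```

Rules like these are cheap to test exhaustively, which is why they handle the bulk of elements and leave only the ambiguous remainder for LLM review.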
03 Critical path that reflows on every change
Static MS Project files are obsolete a week after they're saved. When you classify every element, join the supplier data, and tie elements to activities, the critical path becomes a function:
CP = solve(activities × dependencies × supplier_lead_times × float)
Re-run it nightly. Re-run it on every supplier feed update. Re-run it on every design change. The PMO dashboard goes from a quarterly artefact to a live system. When the curtain-wall supplier flags a delay, the affected activities, the days of remaining float, and the costed mitigation options surface in minutes — not the day before despatch.
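The "critical path as a function" idea can be sketched with a standard critical path method (CPM) forward and backward pass. The activities, durations, and dependencies below are hypothetical; in this sketch supplier lead times are assumed to be already folded into the durations, and total float falls out of the backward pass rather than being an input.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical activities: name -> (duration_days, predecessors)
ACTIVITIES = {
    "foundations": (10, []),
    "steel_frame": (15, ["foundations"]),
    "curtain_wall": (20, ["steel_frame"]),
    "roofing": (8, ["steel_frame"]),
    "fit_out": (12, ["curtain_wall", "roofing"]),
}

def critical_path(activities):
    """Return (project duration, total float per activity, critical activities)."""
    graph = {name: preds for name, (_, preds) in activities.items()}
    order = list(TopologicalSorter(graph).static_order())  # predecessors first

    # Forward pass: earliest finish of each activity
    early_finish = {}
    for name in order:
        duration, preds = activities[name]
        early_start = max((early_finish[p] for p in preds), default=0)
        early_finish[name] = early_start + duration
    project_end = max(early_finish.values())

    # Backward pass: latest finish that does not delay the project
    late_finish = {}
    for name in reversed(order):
        successors = [s for s, (_, preds) in activities.items() if name in preds]
        late_finish[name] = min(
            (late_finish[s] - activities[s][0] for s in successors),
            default=project_end,
        )

    total_float = {n: late_finish[n] - early_finish[n] for n in order}
    critical = [n for n in order if total_float[n] == 0]
    return project_end, total_float, critical

project_end, total_float, critical = critical_path(ACTIVITIES)
print(project_end, critical, total_float)
```

Re-running on a change is then just a matter of updating one duration or dependency and calling critical_path again.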
04 Quantity take-off and BOQ generation
Every element classified → every element costed → every change re-priced. The agent doesn't replace the quantity surveyor; it gives them a live, model-grounded BOQ they can validate and adjust. A change order goes from "we'll get back to you in three days" to "here's the priced delta with citations to which elements changed".
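A priced delta with element-level citations might look like the sketch below. The element records, class codes, and unit rates are hypothetical, and quantities are assumed to be in a single unit (metres) to keep the example short.

```python
# Hypothetical unit rates per metre, keyed by element class
UNIT_RATES = {"STR_BEAM": 120.0, "STR_COL": 200.0}

def price(elements, unit_rates):
    """Cost per element id: quantity multiplied by the unit rate of its class."""
    return {el["id"]: el["quantity"] * unit_rates[el["class"]] for el in elements}

def priced_delta(before, after, unit_rates):
    """Return (per-element cost changes, total change) between two model states."""
    old, new = price(before, unit_rates), price(after, unit_rates)
    changes = []
    for element_id in sorted(old.keys() | new.keys()):
        delta = new.get(element_id, 0.0) - old.get(element_id, 0.0)
        if delta != 0:
            # The element id is the citation back to the model
            changes.append({"element_id": element_id, "delta": delta})
    return changes, sum(c["delta"] for c in changes)

before = [
    {"id": "B1", "class": "STR_BEAM", "quantity": 8.0},
    {"id": "B2", "class": "STR_BEAM", "quantity": 6.0},
]
after = [
    {"id": "B1", "class": "STR_BEAM", "quantity": 9.0},  # lengthened
    {"id": "B2", "class": "STR_BEAM", "quantity": 6.0},  # unchanged
    {"id": "C1", "class": "STR_COL", "quantity": 4.0},   # added
]

changes, total = priced_delta(before, after, UNIT_RATES)
print(changes, total)
```

The quantity surveyor still validates the rates and the result; the sketch only shows why the delta can be produced in seconds once every element carries a class.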
Where it fails
These wins don't come from typing prompts into a chatbot. The naive approaches I see most often:
- Single big LLM for everything. The model is asked to classify, retrieve, reason, and write back. It hallucinates element classes, fabricates supplier names, and burns tokens at scale. When something goes wrong, you can't tell which step failed.
- No citations. The model returns a confident answer with no traceable source. A junior engineer takes it as truth. Three weeks later the error compounds into a programme slip.
- Free-text everywhere. The model returns prose; downstream code parses it with regex; the regex breaks on every model update.
- No structured outputs, no schemas. The model returns "24 panels approximately" instead of {count: 24, unit: "panels"}. Half your codebase becomes a parser.
- Training on client data without consent. A model fine-tuned on confidential designs leaks them, or trips GDPR / IP clauses you didn't notice.
- Auto-commit without human review for high-impact actions. The agent reclassifies 200 elements and updates the BOQ silently. Nobody sees it until the procurement order goes wrong.
The architecture that keeps it honest
Six principles we apply on every engagement.
01 Hybrid stack: rules + ML + LLM
Deterministic SQL and rules narrow the haystack — from 10 million rows to ten thousand. ML scores and ranks — top forty-seven candidates. The LLM reasons over a few dozen items at the end. The deterministic plumbing is cheap, fast, and predictable. The ML layer brings pattern recognition. The LLM brings synthesis. The right tool per task, never a hammer for every nail.
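The funnel shape can be sketched in a few lines. Everything here is a stand-in: the rows are hypothetical, the rule is a plain filter where production would use SQL, and the score function is a hand-written heuristic in place of a trained model. Only the final shortlist would be handed to an LLM.

```python
# Hypothetical delivery rows
ROWS = [
    {"id": 1, "category": "structural", "days_late": 5, "value": 1000},
    {"id": 2, "category": "finishes", "days_late": 9, "value": 5000},
    {"id": 3, "category": "structural", "days_late": 0, "value": 8000},
    {"id": 4, "category": "structural", "days_late": 2, "value": 9000},
    {"id": 5, "category": "structural", "days_late": 1, "value": 500},
]

def rule_filter(rows):
    """Deterministic stage: keep only late structural deliveries."""
    return [r for r in rows if r["category"] == "structural" and r["days_late"] > 0]

def score(row):
    """Stand-in for an ML ranking model: exposure = lateness x value."""
    return row["days_late"] * row["value"]

def shortlist(rows, k=2):
    """Narrow with rules, rank with the scorer, keep the top k for the LLM."""
    candidates = rule_filter(rows)
    return sorted(candidates, key=score, reverse=True)[:k]

top = shortlist(ROWS)
print([r["id"] for r in top])
```

The design point is that each stage is independently testable, so when an answer is wrong you can tell whether the filter, the ranking, or the reasoning step failed.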
02 RAG with citations
The LLM doesn't recall facts from training. It retrieves from your data (your model, your supplier feed, your BOQ, your historical projects) and cites the row, document, or chunk every claim came from. If the data isn't there, the agent says so. With every claim tied to a retrieved source, a fabricated one has nothing to cite and is easy to catch.
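The retrieve-then-cite contract can be sketched as follows. The keyword match is a toy stand-in for a vector search, and the index contents and source identifiers are hypothetical. What matters is the shape of the result: every answer carries its citations, and an empty retrieval yields an explicit "no data" response rather than a guess.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    source: str  # where the text came from, e.g. table and row id
    text: str

# Hypothetical index contents
INDEX = [
    Chunk("supplier_feed#row-118", "Curtain wall panels: lead time 14 weeks"),
    Chunk("boq#line-42", "UB 305x165 beams: 24 no."),
]

def retrieve(query, index):
    """Toy keyword retrieval standing in for a vector search."""
    terms = set(query.lower().split())
    return [
        chunk for chunk in index
        if terms & set(chunk.text.lower().replace(":", " ").split())
    ]

def answer(query, index):
    """Return retrieved text with citations, or an explicit 'no data' result."""
    hits = retrieve(query, index)
    if not hits:
        return {"answer": None, "citations": [], "note": "no supporting data found"}
    return {
        "answer": [chunk.text for chunk in hits],
        "citations": [chunk.source for chunk in hits],
    }

found = answer("curtain wall lead time", INDEX)
missing = answer("roof membrane warranty", INDEX)
print(found["citations"], missing["note"])
```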
03 Two-pass cost optimisation
A cheap model triages every incoming request (email, RFI, change order, query) — intent, sentiment, urgency, classification. Only the ~20% that need deep reasoning go to the expensive model. Per-inbound cost: pennies. Accuracy: indistinguishable from single-pass.
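A sketch of the two-pass routing, with both models replaced by plain functions so the example runs on its own. The keyword rules in cheap_triage and the escalation criteria are assumptions for illustration; in practice the first pass would be a small model returning the same fields.

```python
def cheap_triage(message):
    """Stand-in for the cheap model: classify intent and decide on escalation."""
    text = message.lower()
    if "change order" in text:
        intent = "change_order"
    elif "rfi" in text:
        intent = "rfi"
    else:
        intent = "query"
    # Assumed escalation rule: change orders and anything mentioning a delay
    needs_deep = intent == "change_order" or "delay" in text
    return {"intent": intent, "needs_deep": needs_deep}

def deep_reasoning(message, triage):
    """Stand-in for the expensive model call."""
    return f"[deep] {triage['intent']}: {message}"

def handle(message):
    """Triage every message cheaply; escalate only when flagged."""
    triage = cheap_triage(message)
    if triage["needs_deep"]:
        return {"model": "expensive", "triage": triage,
                "result": deep_reasoning(message, triage)}
    return {"model": "cheap", "triage": triage, "result": None}

inbox = [
    "RFI 12: confirm fire rating of door D-14",
    "Change order: add two panels to the south elevation",
    "Supplier reports a delay on glazing",
]
for message in inbox:
    print(handle(message)["model"], "<-", message)
```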
04 Structured outputs and function calling
JSON schemas enforced at the model boundary. The model returns tool invocations, not prose. Downstream code never parses free-text. Validation runs before commit. Schema-shaped data is composable; prose isn't.
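Validation at the model boundary can be as small as the sketch below, which uses only the standard library. The two-field schema mirrors the count/unit example from earlier in the article; the function name and the non-negative rule are assumptions. A production system would more likely use a JSON Schema validator or the provider's structured-output mode.

```python
import json

# Field name -> required Python type
SCHEMA = {"count": int, "unit": str}

def parse_quantity(raw):
    """Parse and validate a model response; raise ValueError on anything off-schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(data)}")
    for field, expected in SCHEMA.items():
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    if data["count"] < 0:
        raise ValueError("count must be non-negative")
    return data

print(parse_quantity('{"count": 24, "unit": "panels"}'))

try:
    parse_quantity("24 panels approximately")
except ValueError as error:
    print("rejected:", error)
```

Because rejection happens before anything is written, a malformed response costs one retry instead of a corrupted BOQ line.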
05 Hallucination guards
Confidence thresholds on every output. Citation requirement for every claim. Validation against the source data before write-back. Low-confidence outputs route to a human for review — never auto-commit. Drift detection on output distributions surfaces the moment the model starts behaving differently.
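Three of these guards (confidence threshold, citation requirement, validation against source data) can be chained into one gate, sketched below. The 0.9 threshold, the field names, and the row identifiers are hypothetical; drift detection is out of scope for this example.

```python
def guard(output, source_rows, min_confidence=0.9):
    """Decide whether a proposed write-back commits or goes to a human.

    output: {"value": ..., "confidence": float, "citations": [row ids]}
    source_rows: row id -> value held in the source data
    """
    if output["confidence"] < min_confidence:
        return ("human_review", "confidence below threshold")
    if not output["citations"]:
        return ("human_review", "no citation")
    missing = [c for c in output["citations"] if c not in source_rows]
    if missing:
        return ("human_review", f"cited rows not found: {missing}")
    if all(source_rows[c] != output["value"] for c in output["citations"]):
        return ("human_review", "value not supported by cited rows")
    return ("commit", "ok")

# Hypothetical source data and model outputs
SOURCE = {"boq#line-42": 24, "boq#line-43": 12}

good = {"value": 24, "confidence": 0.97, "citations": ["boq#line-42"]}
wrong_value = {"value": 26, "confidence": 0.97, "citations": ["boq#line-42"]}

print(guard(good, SOURCE))
print(guard(wrong_value, SOURCE))
```

Note that the second output is confident and cited, yet still blocked, because the cited row does not contain the claimed value.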
06 AI as orchestrator, not replacement
The LLM steers the stack. It chooses which deterministic rule to apply, which ML model to consult, which dataset to query, which tool to invoke. It reasons over outputs and decides what to do next. It does not replace the deterministic plumbing underneath. The plumbing does the heavy lifting; the reasoning steers it.
Data governance — the part people skip
A few non-negotiables.
- Never train on client data without a written, scoped agreement. Most production work doesn't require fine-tuning at all — RAG and few-shot are usually enough.
- All retrieval is per-tenant scoped. An agent acting for Client A cannot retrieve from Client B's index. Ever.
- Audit logs on every read and every write. What the agent saw, what it decided, what it wrote, when, with which model version.
- Reversibility. Every action the agent commits should be undoable. Drafts before publishes. Stage before commit. The factory floor and the site should never trust an AI action that can't be rolled back.
- Right-to-deletion. When a client offboards, every embedding, every cached output, every fine-tuned adapter goes with them. Not in theory — in production, with verification.
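Per-tenant scoping, audit logging, and verified deletion from the list above can be sketched together in one small class. This is an in-memory illustration with hypothetical names; a real deployment would enforce the same boundaries in the vector store and the database, not in application code alone.

```python
from datetime import datetime, timezone

class TenantScopedIndex:
    """In-memory index where every operation is bound to one tenant and logged."""

    def __init__(self):
        self._docs = {}      # tenant_id -> list of documents
        self.audit_log = []  # append-only record of reads, writes, deletions

    def _log(self, tenant_id, action, detail, model_version):
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "tenant": tenant_id,
            "action": action,
            "detail": detail,
            "model_version": model_version,
        })

    def add(self, tenant_id, text, model_version="n/a"):
        self._docs.setdefault(tenant_id, []).append(text)
        self._log(tenant_id, "write", text, model_version)

    def search(self, tenant_id, term, model_version):
        # Only this tenant's documents are ever considered
        hits = [d for d in self._docs.get(tenant_id, []) if term.lower() in d.lower()]
        self._log(tenant_id, "read", term, model_version)
        return hits

    def delete_tenant(self, tenant_id):
        """Remove everything for a tenant and return True once verified gone."""
        self._docs.pop(tenant_id, None)
        self._log(tenant_id, "delete", "all documents", "n/a")
        return tenant_id not in self._docs

index = TenantScopedIndex()
index.add("client_a", "Atrium roof steel schedule")
index.add("client_b", "Hospital wing steel schedule")
hits_a = index.search("client_a", "steel", model_version="model-v1")
print(hits_a)
```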
What this means for AEC clients
You don't need to retrain your team to use AI. You don't need to migrate Revit. You don't need to abandon your existing software stack.
What you need is the layer that connects them — the agentic pipeline that reads your model, joins your supplier data, applies your rules, runs your simulations, and surfaces the answers your project director, your QS, and your site supervisor actually want, on demand.
When it's built properly — with the hybrid architecture, the citations, the guardrails, the governance — it works. Quietly. Every night. On every change. Reducing the amount of repetitive, error-prone work your senior people are still doing by hand.
When it's built badly, it hallucinates a beam in the wrong place and someone catches it in a meeting. Both outcomes are happening in the industry right now. The difference is the architecture.
Leveret Systems
We build production AI agents and CAD-driven custom software for AEC and manufacturing. Talk to us if you have a Revit model, a supplier feed, and a programme that could be working harder for you.