leveretsystems.co.uk · AI engineering · AEC · 16 May 2026 · 9 min read

Agentic AI in CAD-driven AEC: from Revit elements to Monte Carlo risk — and the guardrails that keep it honest

In the last eighteen months, every architect, engineer and contractor I've spoken to has asked a version of the same question: "Where does AI actually pay off in our workflows — and how do we stop it making things up?"

[Figure: federated CAD model (STR_BEAM · L2, 4,203 elements, 38 derived classes) → hybrid agentic pipeline (L1 deterministic: SQL, rules, validators, scope guards; L2 ML / statistics: XGBoost, embeddings, ranking; L3 LLM reasoning: RAG, function calling, cited output) → structured decisions (bill of quantities: 440+ items, live cost; critical path: 84 d, reflowed nightly; Monte Carlo risk: P80 = +6 d, 3 drivers; quote draft: 30 s, cited line items).]
Fig. 01. From the federated CAD model on the left, through the hybrid agentic pipeline, to structured business decisions on the right: the shape of every project we ship.

This is the right question. The wrong one is "should we use AI". The wrong answer is "let's bolt a chatbot onto Revit". What's emerging, and what we ship to clients today, is something more interesting: agentic AI that orchestrates an entire CAD-driven stack. It doesn't replace the engineer. It doesn't replace the planner. It steers a hybrid of deterministic rules, classical machine learning, and reasoning LLMs through the messy parts of the workflow, the ones where a human used to spend three days a week.

This article is about where that actually wins, where it fails, and the architectural pattern that decides which.

The wins are concrete

01 Monte Carlo risk analysis driven directly off the model

Traditional Monte Carlo simulations on a construction programme require someone to (a) gather all the input distributions, (b) keep them current as the design evolves, and (c) maintain the model itself. In practice, simulations are run once at tender stage and never again.

When you connect an agent to the federated Revit model, the supplier feed, and the historical lead-time database, that changes. The agent re-reads the model overnight, refreshes the distributions for every supplier and every activity, runs 10,000 trials against the live critical path, and surfaces:

The simulation kernel is just Python. The AI's contribution is keeping the inputs fresh and reasoning about which assumptions to challenge — "the curtain-wall supplier's lead-time variance has doubled in the last six weeks; should we re-weight the south-façade tail?" This wasn't possible before because nobody had time to maintain the inputs. Now an agent does it nightly.

[Figure: completion-date distribution over 10,000 trials, with markers at P50 = 84 d, P80 = +6 d and P95 = +14 d on an axis from Wk 10 to Wk 16 (plan at Wk 12); beside it, a tornado chart of slip contribution per supplier (days saved | days added) for Permasteel (curtain), Severfield (steel), Holcim (concrete), Daikin (MEP), Mitsubishi (MEP) and ConCast (structural).]
Fig. 02. Probabilistic completion distribution after 10,000 Monte Carlo trials with P50/P80/P95 markers, and a tornado chart attributing slip risk to specific suppliers.
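To make "the kernel is just Python" concrete, here is a minimal sketch of such a kernel. The activity names, the three-point duration estimates and the single serial path are assumptions for illustration, not project data; a real kernel would sample against the full activity network, with distributions fitted to the lead-time history.

```python
import random

# Hypothetical inputs: (optimistic, most likely, pessimistic) durations in days
# for activities on one serial path. Real values would come from the supplier
# feed and the historical lead-time database.
ACTIVITIES = {
    "steel_frame_L2": (18, 20, 30),
    "curtain_wall_south": (25, 30, 50),
    "mep_first_fix": (28, 32, 40),
}

def simulate_completion(activities, trials=10_000, seed=42):
    """Return the sorted total duration of each trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode), so reorder the tuple.
        total = sum(rng.triangular(low, high, mode)
                    for low, mode, high in activities.values())
        totals.append(total)
    totals.sort()
    return totals

def percentile(sorted_values, p):
    """Nearest-rank style percentile on an already sorted list."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]

totals = simulate_completion(ACTIVITIES)
p50, p80, p95 = (percentile(totals, p) for p in (50, 80, 95))
print(f"P50={p50:.1f} d  P80={p80:.1f} d  P95={p95:.1f} d")
```

The fixed seed keeps nightly runs comparable: when P80 moves, it moved because an input changed, not because the random draws did.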

02 Classifying Revit elements: position, properties, supplier data, BOQ

Revit elements arrive with native properties: type, family, dimensions, level, host. They don't arrive with context. Is this beam on the critical span? Is this wall on the south façade? Is this column carrying the atrium roof? We classify every element through a hybrid pipeline:

The output is a single materialised view: every element with its derived classes, its supplier, its cost, and its tie-back to the Bill of Quantities. That last point matters.

An element classified as STR_BEAM · UB 305×165 · L2 · Span ≥ 8 m · Severfield · stock automatically maps to a BOQ line.

The cost engineer stops manually reconciling drawings to the bill. Procurement stops chasing what the structural engineer changed last week. The programme refers to elements, not abstract activities.

[Figure: Revit element Beam #1024 (UB 305×165 · L2 · 8.5 m span) feeding four lanes. Geometry rules: perimeter / interior, façade orientation, span-length class, grid intersection. Native properties: type, family, fire rating, structural usage, custom params. External data: supplier, lead time, stock, unit price, certifications, sustainability. LLM review (~5%): ambiguous cases, cited reasoning, schema-validated, audit-logged. The lanes converge in the classifier's materialised view (deterministic + ML + LLM → single BigQuery row, schema-validated, cited) and map to Bill of Quantities line 142: STR_BEAM · UB 305×165 · L2 · Span ≥ 8 m · Severfield · stock · £82/m, tied to "Steel frame L2" (critical).]
Fig. 03. One Revit element, four parallel input lanes (geometric rules, native Revit properties, external supplier data, and LLM review for the ambiguous 5%) converging into a single, schema-validated BOQ row.
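The deterministic lane of that classifier can be sketched in a few lines. This is an illustration only: the `Element` fields, the `span_class` rule and the `BOQ_LINES` lookup are hypothetical names, and the ML and LLM lanes are left out. The beam and the BOQ line number are the ones used in the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    element_id: int
    category: str   # e.g. "STR_BEAM"
    section: str    # e.g. "UB 305×165"
    level: str      # e.g. "L2"
    span_m: float

def span_class(span_m):
    """Geometry rule: bucket the span length into a derived class."""
    return "Span ≥ 8 m" if span_m >= 8.0 else "Span < 8 m"

def boq_key(element):
    """Build the key that ties an element back to a BOQ line."""
    return " · ".join(
        [element.category, element.section, element.level, span_class(element.span_m)]
    )

# Hypothetical lookup from derived key to BOQ line number.
BOQ_LINES = {"STR_BEAM · UB 305×165 · L2 · Span ≥ 8 m": 142}

def classify(element):
    """Return one row of the materialised view; unmatched elements go to review."""
    key = boq_key(element)
    line = BOQ_LINES.get(key)
    return {
        "element_id": element.element_id,
        "boq_key": key,
        "boq_line": line,
        "needs_review": line is None,
    }

row = classify(Element(1024, "STR_BEAM", "UB 305×165", "L2", 8.5))
```

Anything the rules cannot place comes back with `needs_review` set, which is where the ambiguous cases would be handed to the later lanes rather than guessed at.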

03 Critical path that reflows on every change

Static MS Project files are obsolete a week after they're saved. When you classify every element, join the supplier data, and tie elements to activities, the critical path becomes a function:

CP = solve(activities × dependencies × supplier_lead_times × float)

Re-run it nightly. Re-run it on every supplier feed update. Re-run it on every design change. The PMO dashboard goes from a quarterly artefact to a live system. When the curtain-wall supplier flags a delay, the affected activities, the days of remaining float, and the costed mitigation options surface in minutes — not the day before despatch.
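One plausible reading of `solve` is the classic critical path method: a forward pass for earliest finishes, a backward pass for latest starts, and float as the difference. The sketch below assumes supplier lead times are already folded into each activity's duration; the activity names and durations are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical activities: name -> (duration in days, predecessors).
ACTIVITIES = {
    "foundations": (20, []),
    "steel_frame_L2": (25, ["foundations"]),
    "curtain_wall": (30, ["steel_frame_L2"]),
    "mep_first_fix": (20, ["steel_frame_L2"]),
    "fit_out": (9, ["curtain_wall", "mep_first_fix"]),
}

def critical_path(activities):
    """Return (project duration, total float per activity, critical activities)."""
    graph = {name: preds for name, (_, preds) in activities.items()}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest finish of each activity.
    early_finish = {}
    for name in order:
        duration, preds = activities[name]
        early_start = max((early_finish[p] for p in preds), default=0)
        early_finish[name] = early_start + duration
    project_duration = max(early_finish.values())

    # Backward pass: latest start that does not delay the project.
    late_start = {}
    for name in reversed(order):
        duration = activities[name][0]
        successors = [s for s, (_, preds) in activities.items() if name in preds]
        late_finish = min((late_start[s] for s in successors), default=project_duration)
        late_start[name] = late_finish - duration

    total_float = {
        n: late_start[n] - (early_finish[n] - activities[n][0]) for n in order
    }
    critical = [n for n in order if total_float[n] == 0]
    return project_duration, total_float, critical

duration, total_float, critical = critical_path(ACTIVITIES)

# A supplier delay is just a new input: stretch the curtain wall and re-run.
delayed = dict(ACTIVITIES, curtain_wall=(40, ["steel_frame_L2"]))
delayed_duration, delayed_float, _ = critical_path(delayed)
```

Because the whole thing is a pure function of its inputs, "reflowing" the programme is nothing more than calling it again with the updated feed.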

04 Quantity take-off and BOQ generation

Every element classified → every element costed → every change re-priced. The agent doesn't replace the quantity surveyor; it gives them a live, model-grounded BOQ they can validate and adjust. A change order goes from "we'll get back to you in three days" to "here's the priced delta with citations to which elements changed".
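A priced delta with citations can be sketched as a diff between two snapshots of the classified model. The snapshot shape, the `UNIT_RATES` table and the element quantities are assumptions for illustration; only the £82/m rate echoes the beam example used earlier.

```python
# Hypothetical rate table: BOQ key -> GBP per metre.
UNIT_RATES = {"STR_BEAM · UB 305×165": 82.0}

def priced_delta(before, after, unit_rates):
    """Diff two snapshots of {element_id: (boq_key, length_m)}.

    Returns the total cost change and one entry per changed element,
    so every pound of the delta cites the element that caused it.
    """
    changes = []
    for element_id in sorted(set(before) | set(after)):
        old, new = before.get(element_id), after.get(element_id)
        if old == new:
            continue  # unchanged element, nothing to re-price
        old_cost = unit_rates[old[0]] * old[1] if old else 0.0
        new_cost = unit_rates[new[0]] * new[1] if new else 0.0
        changes.append(
            {"element_id": element_id, "delta_gbp": round(new_cost - old_cost, 2)}
        )
    total = round(sum(c["delta_gbp"] for c in changes), 2)
    return total, changes

KEY = "STR_BEAM · UB 305×165"
before = {1024: (KEY, 8.5), 1025: (KEY, 6.0)}
after = {1024: (KEY, 9.0), 1026: (KEY, 4.0)}  # 1024 lengthened, 1025 removed, 1026 added

total, changes = priced_delta(before, after, UNIT_RATES)
```

The quantity surveyor still validates the result; the point is that the delta arrives already itemised by element rather than as a single unexplained number.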

Where it fails

These wins don't come from typing prompts into a chatbot. The naive approaches I see most often:

The architecture that keeps it honest

Six principles we apply on every engagement.

01 Hybrid stack: rules + ML + LLM

Deterministic SQL and rules narrow the haystack from roughly 10 million rows to about ten thousand. ML scores and ranks what remains, keeping the top candidates (forty-seven, say). The LLM reasons over that shortlist at the end. The deterministic plumbing is cheap, fast, and predictable. The ML layer brings pattern recognition. The LLM brings synthesis. The right tool per task, never a hammer for every nail.

[Figure: funnel from a haystack of 10.4 M rows, through L1 deterministic (SQL, rules, validators, scope guards; 10,432 candidates out), L2 ML / statistics (XGBoost, embeddings, ranking, classifiers; 47 candidates out) and L3 LLM reasoning (RAG, function calling, cited reasoning; 1 candidate out), to a single cited needle.]
Fig. 04. The funnel: deterministic filters narrow the haystack of millions to thousands; ML ranks those down to dozens; the LLM reasons over the shortlist and commits a cited answer.
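The shape of the funnel fits in one function. In this sketch the rule, the scoring function and the reasoning step are plain callables standing in for SQL filters, a trained model and an LLM call respectively; the purchase-order rows are made up.

```python
def funnel(rows, rule, score, reason, top_k=3):
    """Run the three layers in order; return None if nothing survives."""
    candidates = [row for row in rows if rule(row)]                   # L1: deterministic filter
    shortlist = sorted(candidates, key=score, reverse=True)[:top_k]  # L2: score and rank
    if not shortlist:
        return None
    return reason(shortlist)                                          # L3: reason over the shortlist

# Hypothetical rows: purchase orders with a lead-time variance statistic.
ROWS = [
    {"po": "PO-1", "status": "open", "lead_time_var": 4.0},
    {"po": "PO-2", "status": "closed", "lead_time_var": 9.0},
    {"po": "PO-3", "status": "open", "lead_time_var": 7.5},
    {"po": "PO-4", "status": "open", "lead_time_var": 1.2},
]

def is_open(row):
    return row["status"] == "open"

def variance_score(row):
    # Stand-in for an ML model's score.
    return row["lead_time_var"]

def pick_top(shortlist):
    # Stand-in for the LLM: commits to one answer and cites what it saw.
    return {"answer": shortlist[0]["po"], "citations": [r["po"] for r in shortlist]}

result = funnel(ROWS, is_open, variance_score, pick_top, top_k=2)
```

The expensive step only ever sees `top_k` rows, which is what keeps both cost and the room for error small.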

02 RAG with citations

The LLM isn't asked to recall facts from training. It retrieves from your data (your model, your supplier feed, your BOQ, your historical projects) and cites the row, document, or chunk every claim came from. If the data isn't there, the agent says so. With nothing uncited allowed through, there is very little room left to fabricate.

[Figure: indexed project data (Revit model L2-FRAME-v18, BCSA supplier feed, BOQ lines 142–168, RFI 2026-04 #7, project log wk 11) → LLM (Claude Sonnet: retrieve → synthesise → cite) → grounded answer "L2 beams (Severfield): slip prob 84%, 5 d add", cited from model L2-FRAME-v18, feed BCSA 2026-05 and log wk 11 / variance; schema-validated and audit-logged.]
Fig. 05. Retrieval-augmented generation in practice. Every claim the model produces carries a citation back to the indexed source: model file, supplier feed, BOQ line, project log.
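The "if the data isn't there, the agent says so" behaviour is mostly control flow around the model, not something the model is trusted to do on its own. A minimal sketch, assuming a toy keyword-overlap retriever in place of a real embedding index and a `generate` callable in place of the LLM; the index entries and source labels are invented.

```python
import re

# Hypothetical index: each chunk carries the source label it will be cited by.
INDEX = [
    {"source": "boq:line-142", "text": "STR_BEAM UB 305×165 L2 supplied by Severfield"},
    {"source": "log:wk-11", "text": "Steel lead time variance increased in week 11"},
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, index, min_overlap=2):
    """Toy retriever: rank chunks by shared tokens, drop weak matches."""
    question_tokens = tokens(question)
    scored = [(len(question_tokens & tokens(chunk["text"])), chunk) for chunk in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for overlap, chunk in scored if overlap >= min_overlap]

def answer(question, index, generate):
    """Only call the generator when there is something to ground it in."""
    chunks = retrieve(question, index)
    if not chunks:
        return {"status": "no_supporting_data", "answer": None, "citations": []}
    return {
        "status": "grounded",
        "answer": generate(question, chunks),
        "citations": [chunk["source"] for chunk in chunks],
    }

def echo_generate(question, chunks):
    # Stand-in for the LLM call: returns the best chunk verbatim.
    return chunks[0]["text"]

grounded = answer("Who supplies the L2 steel beam?", INDEX, echo_generate)
missing = answer("What is the crane hire rate?", INDEX, echo_generate)
```

The refusal path never reaches the generator at all, so an empty retrieval cannot turn into a confident-sounding answer.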

03 Two-pass cost optimisation

A cheap model triages every incoming request (email, RFI, change order, query) — intent, sentiment, urgency, classification. Only the ~20% that need deep reasoning go to the expensive model. Per-inbound cost: pennies. Accuracy: indistinguishable from single-pass.
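The routing logic itself is small. In this sketch both models are stand-in callables: `keyword_triage` plays the cheap model and `fake_deep_model` the expensive one. The intent labels and the sample inbox are hypothetical.

```python
def route(message, triage, deep_model):
    """First pass labels everything; only flagged items reach the deep model."""
    label = triage(message)
    if label["needs_reasoning"]:
        return {"tier": "deep", "label": label, "result": deep_model(message)}
    return {"tier": "triage", "label": label, "result": None}

def keyword_triage(message):
    # Stand-in for the cheap model: a crude intent classifier.
    text = message.lower()
    intent = "change_order" if "change order" in text else "general"
    return {"intent": intent, "needs_reasoning": intent == "change_order"}

def fake_deep_model(message):
    # Stand-in for the expensive model.
    return f"deep analysis of: {message}"

inbox = [
    "Please confirm delivery slot",
    "Change order: move riser R3",
    "Thanks, received",
]
routed = [route(message, keyword_triage, fake_deep_model) for message in inbox]
deep_share = sum(r["tier"] == "deep" for r in routed) / len(routed)
```

Tracking `deep_share` over time is a cheap way to notice when the triage pass starts escalating more, or less, than expected.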

04 Structured outputs and function calling

JSON schemas enforced at the model boundary. The model returns tool invocations, not prose. Downstream code never parses free-text. Validation runs before commit. Schema-shaped data is composable; prose isn't.
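What "enforced at the model boundary" can look like, as a sketch using only the standard library. The `update_boq_line` tool and its fields are hypothetical, and a production system would more likely use a full JSON Schema validator than this hand-rolled type check.

```python
import json

# Hypothetical tool registry: tool name -> required fields and their types.
TOOL_SCHEMAS = {
    "update_boq_line": {
        "line": int,
        "unit_price_gbp": (int, float),
        "citations": list,
    },
}

def parse_tool_call(raw):
    """Parse and validate one model output; raise ValueError on any mismatch."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ValueError("model output is not a JSON object")
    schema = TOOL_SCHEMAS.get(call.get("tool"))
    if schema is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    args = call.get("arguments")
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError("arguments do not match the schema fields")
    for field, expected in schema.items():
        value = args[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for {field!r}")
    return call["tool"], args

good = (
    '{"tool": "update_boq_line", "arguments": '
    '{"line": 142, "unit_price_gbp": 82.0, "citations": ["boq:142"]}}'
)
name, args = parse_tool_call(good)
```

Free text fails at the first step, so nothing downstream ever has to guess what the model meant.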

05 Hallucination guards

Confidence thresholds on every output. Citation requirement for every claim. Validation against the source data before write-back. Low-confidence outputs route to a human for review — never auto-commit. Drift detection on output distributions surfaces the moment the model starts behaving differently.
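These guards compose into a single gate in front of write-back, plus a separate drift check. The sketch below is one way to arrange them: the output shape, the 0.9 threshold and the use of total variation distance for drift are all assumptions, not a description of a specific system.

```python
from collections import Counter

def gate(output, source_rows, min_confidence=0.9):
    """Return 'commit', 'human_review' or 'reject' for one model output."""
    citations = output.get("citations", [])
    # Citation requirement: every citation must point at a known source row.
    if not citations or any(c not in source_rows for c in citations):
        return "reject"
    # Validation against source: a cited row must actually hold the claimed value.
    field, value = output["field"], output["value"]
    if not any(source_rows[c].get(field) == value for c in citations):
        return "reject"
    # Confidence threshold: low confidence goes to a person, never auto-commit.
    if output["confidence"] < min_confidence:
        return "human_review"
    return "commit"

def label_drift(baseline, recent):
    """Total variation distance between two label samples (0 = identical mix)."""
    base, new = Counter(baseline), Counter(recent)
    labels = set(base) | set(new)
    return 0.5 * sum(
        abs(base[label] / len(baseline) - new[label] / len(recent)) for label in labels
    )

# Hypothetical source data and model claim.
SOURCE = {"boq:142": {"unit_price_gbp": 82.0}}
claim = {
    "field": "unit_price_gbp",
    "value": 82.0,
    "confidence": 0.95,
    "citations": ["boq:142"],
}
decision = gate(claim, SOURCE)
```

Note the ordering: an output that fails citation or source validation is rejected regardless of how confident the model says it is.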

06 AI as orchestrator, not replacement

The LLM steers the stack. It chooses which deterministic rule to apply, which ML model to consult, which dataset to query, which tool to invoke. It reasons over outputs and decides what to do next. It does not replace the deterministic plumbing underneath. The plumbing does the heavy lifting; the reasoning steers it.
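Stripped down, orchestration is a loop: the reasoning step proposes the next tool call, ordinary code executes it, and the result feeds the next decision. In this sketch a scripted `planner` function stands in for the LLM; the tools, the supplier lead time and the 14-day threshold are invented for illustration.

```python
def run_agent(planner, tools, max_steps=5):
    """Let the planner choose tools until it returns None or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        step = planner(history)
        if step is None:
            break
        name, args = step["tool"], step["args"]
        if name not in tools:
            # The planner may only steer; it cannot invent plumbing.
            raise KeyError(f"unknown tool: {name}")
        history.append({"tool": name, "args": args, "result": tools[name](**args)})
    return history

# Hypothetical deterministic tools.
def query_lead_time(supplier):
    return {"Severfield": 21}.get(supplier)

def flag_for_review(reason):
    return f"flagged: {reason}"

TOOLS = {"query_lead_time": query_lead_time, "flag_for_review": flag_for_review}

def scripted_planner(history):
    # Stand-in for the LLM: look up a lead time, escalate if it is long, then stop.
    if not history:
        return {"tool": "query_lead_time", "args": {"supplier": "Severfield"}}
    if len(history) == 1 and history[0]["result"] > 14:
        return {"tool": "flag_for_review", "args": {"reason": "lead time above 14 d"}}
    return None

trace = run_agent(scripted_planner, TOOLS)
```

The returned `trace` doubles as an audit log: every decision the planner made is recorded next to the deterministic result it was based on.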

Data governance — the part people skip

A few non-negotiables.

What this means for AEC clients

You don't need to retrain your team to use AI. You don't need to migrate Revit. You don't need to abandon your existing software stack.

What you need is the layer that connects them — the agentic pipeline that reads your model, joins your supplier data, applies your rules, runs your simulations, and surfaces the answers your project director, your QS, and your site supervisor actually want, on demand.

When it's built properly — with the hybrid architecture, the citations, the guardrails, the governance — it works. Quietly. Every night. On every change. Reducing the amount of repetitive, error-prone work your senior people are still doing by hand.

When it's built badly, it hallucinates a beam in the wrong place and someone catches it in a meeting. Both outcomes are happening in the industry right now. The difference is the architecture.

Leveret Systems

We build production AI agents and CAD-driven custom software for AEC and manufacturing. Talk to us if you have a Revit model, a supplier feed, and a programme that could be working harder for you.
