Most enterprise AI projects fail at the same place: the gap between a working model and a running operation. The model is built. The demo is impressive. Then the project enters a months-long phase that everyone calls "deployment" and nobody can describe.
ARMOR exists to make that phase boring.
ARMOR is NSigma's five-phase methodology for taking an enterprise workflow from manual operation to a live agent fleet in 90 days. It is the same methodology that runs underneath every NSigma engagement across commercial real estate, utilities, waste management, and investment management. This post explains what happens in each phase, what the deliverables are, and what mistakes the methodology is designed to prevent.
The shape of the methodology
Five letters. Five phases. Each phase has a fixed duration, a defined output, and a hand-off to the next.
| Letter | Phase | Duration | Output |
|---|---|---|---|
| A | Audit | Weeks 1–2 | Workflow map, data inventory, scoped pilot definition |
| R | Refine | Weeks 3–4 | Agent topology, escalation matrix, observability plan |
| M | Mobilize | Weeks 5–8 | Built, integrated, pre-production fleet |
| O | Operate | Weeks 9–12 | Live fleet in production with managed-service operation |
| R | Reinforce | Ongoing | Continuous improvement, expansion, retraining |
The 90-day window is the time from kickoff to live operations. Reinforce continues indefinitely.
What the shape gets right that most "AI consulting" frameworks get wrong: the integration and operations work is designed before anything gets built, not after. By the time the agent fleet is touching production data in week 9, the systems it talks to, the people it escalates to, and the metrics it reports against have already been defined.
A — Audit (weeks 1–2)
The Audit phase answers four questions before any code is written:
- Which workflow is the right one to start with? Not the most exciting one. The one with the right combination of volume, structure, recoverability, and business sponsor.
- What data exists, and where? Not "what data is available in theory." What is actually accessible, in what format, at what frequency, with what quality.
- What decisions in this workflow should never be automated? Regulatory filings, safety overrides, financial commitments above defined thresholds, customer-facing exceptions. These become the human-checkpoint list before the architecture is designed.
- Who owns the workflow, the data, and the budget? If you cannot name those three people in a single sentence, the engagement isn't ready to start.
The output is a written audit document — typically 15–25 pages — that includes the workflow map, the data inventory, the decision-rights matrix, and a signed-off pilot scope. Nothing in Refine or Mobilize is allowed to proceed without it.
The mistake this prevents: Most AI projects start by building. Audit forces a two-week pause that surfaces every assumption before money is committed to engineering. Roughly one in five Audits ends with the recommendation "don't proceed with this workflow," and that is a successful outcome, not a failure.
R — Refine (weeks 3–4)
Refine designs the system that Mobilize will build.
The deliverables:
- Agent topology — which agents exist, what each one does, how they communicate, what shared state lives where
- Escalation matrix — for every decision class in the workflow, the exact condition that triggers a human checkpoint, the approver, the evidence package
- Observability plan — what logs are captured, where they are stored, how long they are retained, who has read access, what the dashboard looks like
- Integration map — every system the fleet reads from and writes to, with protocols, authentication, rate limits, and failure behavior
- Success metrics — what counts as the pilot working, defined in language that ops, engineering, and finance all agree to
Refine is where the team commits to specifics. Vagueness gets expensive in Mobilize.
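As one way to picture what "committing to specifics" looks like, here is a minimal sketch of an escalation matrix as data rather than prose. The rule fields mirror the deliverable described above (decision class, trigger condition, approver, evidence package); the class names, thresholds, and roles are hypothetical placeholders, not NSigma's actual schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EscalationRule:
    decision_class: str              # e.g. "capital_expenditure"
    trigger: Callable[[dict], bool]  # exact condition for a human checkpoint
    approver: str                    # role that approves or denies
    evidence: tuple[str, ...]        # fields packaged for the approver


# Hypothetical rules; real thresholds and roles would come out of Refine.
ESCALATION_MATRIX = [
    EscalationRule(
        decision_class="capital_expenditure",
        trigger=lambda d: d.get("estimated_cost_usd", 0) > 10_000,
        approver="asset_manager",
        evidence=("asset_id", "estimated_cost_usd", "anomaly_summary"),
    ),
    EscalationRule(
        decision_class="safety_override",
        trigger=lambda d: True,  # always goes to a human
        approver="safety_officer",
        evidence=("asset_id", "sensor_trace"),
    ),
]


def route(decision: dict) -> Optional[dict]:
    """Return an escalation package if a human checkpoint applies, else None."""
    for rule in ESCALATION_MATRIX:
        if rule.decision_class == decision["decision_class"] and rule.trigger(decision):
            return {
                "approver": rule.approver,
                "evidence": {key: decision.get(key) for key in rule.evidence},
            }
    return None
```

Writing the matrix this way means an edge case either matches a rule or visibly does not, which is easier to review than a judgment call made at runtime.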
The mistake this prevents: Skipping straight from "we'll use AI for predictive maintenance" to writing code. Without an escalation matrix, every edge case becomes a real-time judgment call. Without an observability plan, no one knows whether the system is working after it goes live. Without a defined integration map, the agent ships in a sandbox and cannot connect to anything real.
M — Mobilize (weeks 5–8)
Four weeks of focused build. By the time Mobilize starts, every architectural question has been resolved; the work is execution.
Inside Mobilize:
- Week 5 — environment setup, credentials, integration plumbing
- Week 6 — agent build-out using the 43-agent library plus any custom components
- Week 7 — end-to-end integration testing in staging
- Week 8 — pre-production rehearsal with real data, dry-run escalations, observability validation
The 43-agent library is what makes Mobilize fit into four weeks. Most agent fleets are 60–80% composition of pre-built components and 20–40% custom work for the client's specific systems. Starting from a blank repo would make the timeline twice as long.
Mobilize ends with a go/no-go gate: a written readiness checklist signed by NSigma, the client's engineering lead, the operations sponsor, and (where relevant) compliance. Anything failing the checklist gets fixed before Operate begins.
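The gate logic is simple enough to sketch. The version below assumes the checklist is a set of named pass/fail items plus a set of collected signatures; the signer identifiers and item names are illustrative, not the actual NSigma checklist.

```python
from dataclasses import dataclass, field

# Signers named in the text; compliance is added only where relevant.
REQUIRED_SIGNERS = {"nsigma", "engineering_lead", "operations_sponsor"}


@dataclass
class ReadinessChecklist:
    items: dict[str, bool]  # check name -> passed?
    signatures: set[str] = field(default_factory=set)
    compliance_required: bool = False

    def failing_items(self) -> list[str]:
        return sorted(name for name, ok in self.items.items() if not ok)

    def missing_signers(self) -> list[str]:
        required = set(REQUIRED_SIGNERS)
        if self.compliance_required:
            required.add("compliance")
        return sorted(required - self.signatures)

    def go(self) -> bool:
        # "Go" only when every item passes and every required party has signed.
        return not self.failing_items() and not self.missing_signers()
```

A single failing item or missing signature keeps the result at no-go, which matches the rule that anything failing the checklist is fixed before Operate begins.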
The mistake this prevents: Treating "build" as the whole project. Mobilize is one of five phases, not the entire engagement. The teams that try to compress the methodology by cutting Audit and Refine end up spending Mobilize negotiating the things that should have been settled by week 4.
O — Operate (weeks 9–12)
The fleet goes live in production. NSigma operates it under a managed-service agreement.
What "operate" means concretely:
- 24/7 monitoring of the agent fleet, with on-call escalation paths for genuine system failures
- Daily review of escalations — what was raised, who approved or denied, how long it took, what evidence accompanied it
- Weekly review of metrics against the success criteria defined in Refine — is the fleet hitting the targets, where is it underperforming, what needs adjustment
- Continuous handling of edge cases that surface in production — these feed directly into Reinforce
The client retains full visibility. Every decision the fleet makes is logged. Every escalation is reviewable. The operations sponsor has dashboard access from day one of Operate.
The reason Operate exists as a distinct phase (rather than being lumped into "deployment") is that the first weeks of live operation are when the model meets reality. The team handling that meeting is not the team that built the system in Mobilize — it is an operating team trained on running the fleet, with explicit responsibility for surfacing what needs to change.
The mistake this prevents: Treating live deployment as the end of the project. The first four weeks of operation are the most valuable diagnostic period of the entire engagement; treating them as routine maintenance wastes the data.
R — Reinforce (ongoing)
Reinforce is the part of the methodology that runs forever.
Three workstreams:
- Model retraining and drift management. Production data accumulates. Distributions shift. A model that performed at 96% precision in month one can drift to 91% by month six if no one is watching. Reinforce captures the drift signal, retrains on a defined cadence, and validates each new model against historical edge cases before promotion.
- Edge-case backlog. Every escalation that didn't need to be one — and every routine decision that should have been escalated — feeds into a backlog. The backlog drives architectural updates, prompt revisions, and new agents.
- Expansion. Once a fleet is running reliably on one workflow, the adjacent workflows that share data, systems, or operators become candidate expansions. The methodology re-enters Audit at a reduced scope (typically 1 week rather than 2) for each new workflow.
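The drift workstream above can be sketched as two small checks: one that flags when precision has fallen too far below its baseline, and one that gates promotion of a retrained model on historical edge cases. The 3-point tolerance, the 100% pass rate, and the function names are assumptions chosen for illustration; a real engagement would set them during Refine.

```python
from typing import Callable, Sequence


def needs_retraining(baseline: float, current: float, max_drop: float = 0.03) -> bool:
    """Flag drift when precision falls more than max_drop below the baseline."""
    return (baseline - current) > max_drop


def can_promote(
    candidate_predict: Callable[[dict], bool],
    edge_cases: Sequence[tuple[dict, bool]],
    min_pass_rate: float = 1.0,
) -> bool:
    """Validate a retrained model against historical edge cases before promotion.

    Each edge case is (features, expected_label). With no edge cases on file
    there is nothing to validate against, so promotion is refused.
    """
    if not edge_cases:
        return False
    passed = sum(
        1 for features, expected in edge_cases if candidate_predict(features) == expected
    )
    return passed / len(edge_cases) >= min_pass_rate
```

With the figures from the text, `needs_retraining(0.96, 0.91)` returns `True`, because a five-point drop exceeds the assumed three-point tolerance.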
Reinforce is what separates a deployed fleet from a compounding capability. A fleet without reinforcement decays. A fleet with reinforcement expands.
The mistake this prevents: Declaring victory at week 12. The companies that treat operation as a steady state and skip the reinforcement workstream watch their fleets degrade over six to twelve months. The ones that fund it see expanding ROI year over year.
What ARMOR is not
ARMOR is not a stage-gate process for slideware. There are no "phase exit reviews" that produce 80-slide decks. The deliverables are working artifacts: integration maps, escalation matrices, signed checklists, running systems.
ARMOR is not a sequence of meetings. Each phase has tight inputs and outputs; meetings are kept to the minimum required to surface decisions and assign owners.
ARMOR is not negotiable on order or completeness. Skipping Audit to "save time" is the single most common reason agent fleet projects fail. Skipping Refine produces fragile architectures. Skipping Reinforce kills the ROI curve. Each phase exists because removing it has been observed to break the result.
ARMOR is not a substitute for cross-functional sponsorship. The methodology coordinates engineering, operations, risk, and the executive sponsor; it cannot create that sponsorship if it doesn't exist. The ownership question in Audit (who owns the workflow, the data, and the budget) is usually the hardest one in the engagement.
A worked example — predictive maintenance for a CRE portfolio
To make the phases concrete, here is what a typical 90-day deployment looks like for a commercial real estate portfolio adopting predictive maintenance via ReMI.
Audit (weeks 1–2):
- Data inventory: BMS coverage across 47 buildings, telemetry frequency, gaps
- Workflow map: who currently handles maintenance dispatch, average time-to-response, exception classes
- Decision-rights matrix: what value of capital expenditure requires human approval, who approves
- Scoped pilot: 12 representative buildings, three equipment classes (chillers, AHUs, elevators)
Refine (weeks 3–4):
- Agent topology: telemetry ingestion → anomaly detection → triage → dispatch → confirmation → learning
- Escalation: any predicted failure with capital implications > $25K escalates to the asset manager
- Observability: per-asset dashboard, daily anomaly summary, weekly precision/recall scorecard
- Integration: BACnet read from BMS, write to CMMS via REST API
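For the weekly precision/recall scorecard in the observability plan, a minimal sketch might compare the set of assets the fleet flagged against the set that actually failed in the same week. The asset identifiers and the convention of reporting 0.0 when a denominator is empty are assumptions, not part of ReMI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    precision: float  # share of flagged assets that really failed
    recall: float     # share of real failures that were flagged


def weekly_scorecard(predicted: set[str], actual: set[str]) -> Scorecard:
    """predicted: asset IDs flagged this week; actual: asset IDs that failed."""
    true_positives = len(predicted & actual)
    # Assumed convention: report 0.0 rather than divide by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return Scorecard(precision, recall)
```

For example, flagging four assets of which two failed, while a third failure went unflagged, gives a precision of 0.5 and a recall of two thirds.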
Mobilize (weeks 5–8):
- BMS read pipeline live by week 5
- Anomaly detection models trained on 18 months of historical data, validated against known failure events
- CMMS integration tested in staging
- End-to-end rehearsal week 8 with the operations team present
Operate (weeks 9–12):
- Go-live week 9 on 12 buildings
- First escalations within the first 48 hours
- Daily review meeting for the first two weeks, then weekly
- Week 12: report on 8 successfully predicted failures, $340K in avoided emergency calls, 22% reduction in reactive work orders
Reinforce (ongoing):
- Expand to remaining 35 buildings over months 4–6
- Add boiler and electrical equipment classes in month 5
- Retrain anomaly models quarterly, with held-out test set
- Backlog driven by escalations that didn't need to be raised and routine cases that should have been
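The two backlog sources in the last bullet can be sketched as a classification over reviewed decisions. This assumes each review records whether the fleet escalated and whether a reviewer later judged that a human was actually needed; the field names and category labels are hypothetical.

```python
from collections import Counter
from typing import Optional


def backlog_category(escalated: bool, needed_human: bool) -> Optional[str]:
    """Classify one reviewed decision; None means the fleet got it right."""
    if escalated and not needed_human:
        return "unnecessary_escalation"
    if not escalated and needed_human:
        return "missed_escalation"
    return None


def build_backlog(reviews: list[dict]) -> Counter:
    """Count backlog items by category across a batch of reviewed decisions."""
    counts: Counter = Counter()
    for review in reviews:
        category = backlog_category(review["escalated"], review["needed_human"])
        if category is not None:
            counts[category] += 1
    return counts
```

Tracking the two categories separately matters because they call for different fixes: unnecessary escalations suggest a trigger is too loose, while missed escalations suggest a rule is absent or too narrow.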
Where to go next
If you want to evaluate ARMOR against your own operation, four places to start:
- The Agent Fleet Engineering page — full methodology overview plus the 43-agent library
- The glossary — concise definitions for every term used here
- What Is Agent Fleet Engineering? — the foundational definition this post builds on
- 90-Day AI Pilot vs Traditional POC — why the 90-day window beats the 6-month POC
The fastest concrete step is an ARMOR Audit — two weeks, fixed fee, written deliverable. If the workflow turns out not to be a good fit, you get a written diagnosis. If it is, you have the artifact you need to fund Refine and the rest.