Strategic Overview
By May 2026, every logistics CIO I spoke with had the same paradox: operations teams wanted autonomous agents on ERP, warehouse, and ticketing systems, while security teams refused any architecture that punched inbound holes through the perimeter. The Global Logistics Enterprise in this case study—12 countries, roughly 8,000 staff, SAP at the core plus a custom warehouse platform—was stuck in that gap for eighteen months. They had piloted chatbots that could describe a delayed shipment but could not resolve one, because no production path existed for an external model host to call internal tools safely.
What actually changed the program was treating enterprise MCP integration not as “expose APIs to Claude,” but as private MCP tunnel architecture: outbound-only connectivity from on-prem MCP servers through a governed gateway to an orchestrator, with human-in-the-loop (HITL) gates on write operations. Once the Linux Foundation's Agentic AI Foundation (AAIF) had placed MCP under neutral governance as an open standard, and Anthropic’s MCP tunnels had made outbound SSE viable for regulated environments, the platform team had both vendor momentum and a security narrative auditors would accept.
Over a six-month program (90-day pilot plus phased hub rollout), we connected 14 internal systems across ERP, WMS, transport management, and ITSM without publishing a single public MCP endpoint. Order exception resolution time fell from 4.2 hours to 38 minutes (median, measured on 2,400 exceptions in the pilot lane). Manual copy-paste hours in dispatch and customer operations dropped from ~1,200/month to ~310/month. Security’s headline metric: zero critical policy violations in the pilot window, with every tool call attributed to an agent identity, a human supervisor, and an immutable audit record.
This case study is written for CTOs, platform leads, and transformation directors who must ship agentic automation on legacy logistics stacks, not for teams still debating whether MCP is “real.” It pairs technical depth (mesh topology, tunnel semantics, identity) with program depth (RACI, phased rollout, union-agreed ticket categories) so you can defend both the architecture review and the steering committee deck.
Client Context & The Integration Deadlock
The client—referred here as Global Logistics Enterprise—runs hub-and-spoke distribution across North America, Western Europe, and Southeast Asia. Their operational backbone is SAP S/4HANA for order and finance, a custom WMS for high-velocity fulfillment lanes, a transport management system (TMS) for carrier events, and ServiceNow-class ITSM for internal and B2B exception tickets. Dispatch supervisors, inventory controllers, and customer operations analysts collectively touched six to eleven screens to close a single “stuck shipment” exception: ERP status, WMS pick confirmation, TMS event log, ticketing notes, email threads, and sometimes a spreadsheet sidecar.
In 2024 the enterprise stood up an “AI innovation” workstream. Early copilots helped analysts draft emails. They did not reduce cycle time on exceptions because no governed tool execution existed. Integrations proposed in 2025 followed a familiar pattern: SaaS agent host + inbound webhook into the DMZ + brittle service account with read/write on production tables. InfoSec rejected that pattern twice—correctly—citing lateral movement, non-repudiation gaps, and the impossibility of per-tool revocation at agent speed.
Citation anchor: In regulated logistics environments, the blocker is rarely model quality—it is provable containment. Auditors ask whether an agent can exfiltrate bulk shipment data, modify tariffs, or open tickets under another user’s identity. Without outbound-only tunnels, per-tool schemas, and HITL on writes, the answer is “we don’t know,” and the program stops in architecture review.
The breakthrough constraint we adopted: agents may plan and read broadly, but may only write through approved tools, under supervisor approval, with session-scoped credentials. That constraint maps cleanly onto MCP’s tool manifest model and onto the client’s existing OIDC workforce identity—details in the architecture sections below.

Why Inbound MCP Exposure Failed Security Review
Before we designed the mesh, the client’s SI proposed inbound MCP servers in the DMZ: public HTTPS endpoints that forwarded tool calls to internal APIs. Security’s red-team exercise in March 2026 surfaced three show-stoppers that are now textbook for AI agent ERP integration programs.
First, perimeter expansion. Every inbound route is another TLS termination point, WAF rule set, and certificate lifecycle. MCP tool hosts are not static microservices; they evolve weekly as operations adds tools (e.g., “rebook carrier slot,” “release credit hold”). Security could not tie each change to a formal penetration retest cadence.
Second, confused deputy risk. A shared gateway API key let any compromised agent session invoke any registered tool. Without binding agent identity → tool allow-list → prompt hash at the gateway—as we later implemented in line with our agentic zero-trust case study—an indirect prompt injection in a ticket description could escalate to a funds release or inventory adjustment.
Third, observability theater. Logs showed “gateway called WMS API” but not which agent plan step, which human approved the write, or which policy version applied. That gap fails SOC 2-style traceability and makes EU AI Act–style deployer accountability uncomfortable for legal.
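The binding named in the second finding (agent identity → tool allow-list → prompt hash) can be sketched as a single gateway check. This is a minimal illustration, not the client's implementation: `AgentSession`, `hash_prompt`, and `authorize_call` are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSession:
    """Hypothetical per-run session record held by the gateway."""
    agent_run_id: str
    tool_allow_list: frozenset
    prompt_hash: str  # hash of the prompt approved for this run


def hash_prompt(prompt: str) -> str:
    # SHA-256 is an assumption; any collision-resistant hash would do.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def authorize_call(session: AgentSession, tool: str, prompt: str) -> tuple:
    """Return (allowed, reason). Deny unless both bindings hold."""
    if tool not in session.tool_allow_list:
        return False, "tool_not_in_allow_list"
    if hash_prompt(prompt) != session.prompt_hash:
        # The prompt changed after approval, e.g. via injected ticket text.
        return False, "prompt_hash_mismatch"
    return True, "ok"
```

With a shared API key, neither check is possible, because the gateway cannot tell one agent session from another.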
We documented these findings in a one-page decision memo comparing inbound MCP exposure vs outbound MCP tunnel architecture. The table below is the summary the CIO used to unblock budget.
| Dimension | Inbound MCP (Rejected) | Outbound Private MCP Tunnel (Selected) |
|---|---|---|
| Firewall posture | New inbound allow rules per environment | Egress-only from MCP zone; no inbound agent paths |
| Blast radius | DMZ compromise may pivot to internal APIs | Tunnel broker enforces tool schema + OIDC per call |
| Audit attribution | Shared service account in logs | Per-session agent JWT + supervisor HITL record |
| Operational drift | Public DNS + cert churn per tool | On-prem MCP manifests versioned in Git |
| Time to security sign-off | Est. 9–14 weeks (re-test each tool) | Pilot approved in 4 weeks with scoped tools |
For background on why outbound tunnels entered the enterprise mainstream in 2026, see our analysis of Anthropic MCP tunnels and self-hosted sandboxes and the AAIF / Linux Foundation MCP donation—both shaped the client’s vendor shortlist and auditor talking points.
Target Architecture: Private MCP Mesh
The private MCP mesh has four planes that must not collapse into one “magic gateway”:
- Orchestration plane — plans multi-step exception workflows, selects tools, enforces budgets and timeouts.
- Tunnel plane — maintains outbound SSE (or mutually authenticated long poll) from the enterprise MCP broker to the orchestrator; no inbound initiation.
- Tool plane — on-prem MCP servers wrapping SAP OData, WMS REST, TMS events, ITSM tickets; each server publishes JSON schemas and read/write classification.
- Governance plane — OIDC for humans, machine identities for agents, immutable audit store, HITL console for write approval.

The orchestrator runs in the client’s preferred hybrid zone (Azure in this engagement). It never holds SAP passwords. Instead, the MCP broker inside the logistics VPC terminates tunnels and forwards tool invocations to local MCP servers bound to localhost or private RFC1918 addresses. Tool calls that mutate state—credit release, inventory adjustment, ticket closure—route to the HITL console where a dispatch supervisor sees the proposed JSON payload diff before commit.

Citation anchor: The Agentic AI Foundation’s consolidation of MCP under neutral governance matters operationally: the client could standardize tool manifests once and reuse them across orchestrator vendors. That reduced vendor lock-in fear—the platform lead’s #1 political risk—and accelerated procurement.
We aligned manifest design with our Enterprise MCP & Private Agent Integration solution patterns (catalog governance, schema versioning, outbound-only broker)—adapted here for logistics-specific tools and union-negotiated change windows.
Exception Workflow: From Stuck Shipment to Resolved Ticket
The pilot’s killer use case was shipment exception resolution: a customer-visible delay, an unclear internal root cause, and six systems involved. The median time from ticket open to resolved state had been 4.2 hours; the target was under 45 minutes without sacrificing control.

- Trigger: TMS publishes SHIPMENT_DELAYED to Kafka with order ID, lane, and reason code.
- Plan: Orchestrator agent decomposes into: (1) fetch ERP order holds, (2) fetch WMS pick status, (3) search ITSM for open incidents, (4) propose carrier rebook or customer comms draft.
- Read tools: erp.get_order, wms.get_pick_wave, itsm.search_tickets (auto-approved, sub-second budgets).
- Write tools: erp.release_credit_hold, wms.reallocate_inventory, itsm.update_ticket (HITL required).
- Outcome: Ticket moves to resolved with linked audit IDs; supervisor sees before/after field diff.
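The workflow above can be sketched as a plan whose steps are routed by read/write class. The tool names come from the pilot description. `PlanStep`, `build_plan`, and `route` are hypothetical names, and the specific decomposition is an assumption for illustration.

```python
from dataclasses import dataclass

# Read/write classes as listed for the pilot tools.
TOOL_ACCESS = {
    "erp.get_order": "read",
    "wms.get_pick_wave": "read",
    "itsm.search_tickets": "read",
    "erp.release_credit_hold": "write",
    "wms.reallocate_inventory": "write",
    "itsm.update_ticket": "write",
}


@dataclass(frozen=True)
class PlanStep:
    tool: str
    purpose: str


def build_plan(event: dict) -> list:
    """Hypothetical decomposition of a SHIPMENT_DELAYED event."""
    order_id = event["order_id"]
    return [
        PlanStep("erp.get_order", f"fetch holds for {order_id}"),
        PlanStep("wms.get_pick_wave", f"fetch pick status for {order_id}"),
        PlanStep("itsm.search_tickets", f"find open incidents for {order_id}"),
        PlanStep("itsm.update_ticket", f"propose resolution for {order_id}"),
    ]


def route(step: PlanStep) -> str:
    """Reads run automatically; writes wait for a supervisor; unknown tools are denied."""
    access = TOOL_ACCESS.get(step.tool)
    if access == "read":
        return "auto_approved"
    if access == "write":
        return "hitl_required"
    return "denied"
```

Routing from a static table, rather than from anything the model emits, keeps the read/write decision out of the agent's hands.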

The sequence diagram became the training slide for dispatch supervisors. It made concrete what “agent autonomy” meant in their shift: the agent does the lookup legwork; humans still own commits.
Discovery: Mapping the Swivel-Chair Tax
Before writing a line of broker code, we ran a two-week shadow study in Rotterdam and Chicago hubs. Analysts wore session loggers (with union agreement) that counted application switches per exception. The average stuck shipment touched 8.4 applications and 23 manual copy-paste events. Not all of that is automatable—some conversations require carrier relationships—but 61% of keystrokes were pure data movement between systems the client already owned.
That finding reframed the business case. Leadership had been quoted $4.8M for a “logistics AI platform” SaaS bundle. We showed that $1.1M in year-one integration and governance—scoped to MCP mesh + HITL—could address the majority of swivel-chair cost without transferring shipment rows to a third-party multi-tenant datastore. FinOps still modeled orchestrator token spend (~$38K/month at pilot scale); even with that line item, the program cleared their 18-month payback hurdle on operational savings alone (scenario, not audited).
We also cataloged integration anti-patterns to kill explicitly:
- Screen-scraping RPA on SAP GUI—brittle, non-auditable, hated by basis teams.
- Monolithic “integration agent” with one super-user SAP account—fast demo, catastrophic review.
- Email-as-API—operations forwarding CSV extracts; GDPR and customer contract risk.
The private MCP mesh was the first option that scored high on automation and high on containment in the same matrix.
Implementation Blueprint (Technical)
Step 1: MCP Server Wrappers per System
Each MCP server is a thin adapter: validate input schema, map to internal API, redact PII from responses, attach correlation ID. Example pattern for read-only ERP order fetch:

    from mcp.server import Server
    from pydantic import BaseModel, Field

    server = Server("erp-readonly")


    class GetOrderInput(BaseModel):
        # Accept only 8-14 character uppercase alphanumeric order IDs.
        order_id: str = Field(..., pattern=r"^[A-Z0-9]{8,14}$")


    # Illustrative pattern: the decorator form and read_only flag are schematic;
    # sap_odata, ALLOWED_FIELDS, and redact_pii are helpers not shown here.
    @server.tool("erp.get_order", read_only=True)
    async def get_order(inp: GetOrderInput, ctx) -> dict:
        # ctx carries agent_jwt and correlation_id (enforced by broker middleware)
        row = await sap_odata.fetch_order(inp.order_id, fields=ALLOWED_FIELDS)
        return redact_pii(row)

Write tools register read_only=False and are rejected at the broker unless hitl_token is present in context—issued only after supervisor approval in the console.
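A minimal sketch of that broker-side admission rule, assuming the HITL token is bound to one tool and carries an expiry. `HitlToken` and `admit` are hypothetical names, and the token fields are illustrative rather than the client's actual format.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HitlToken:
    """Hypothetical token issued by the console after supervisor approval."""
    approval_id: str
    tool: str
    expires_at: float  # Unix seconds


def admit(tool: str, read_only: bool, token: Optional[HitlToken],
          now: Optional[float] = None) -> tuple:
    """Return (admitted, reason) for one tool call at the broker."""
    now = time.time() if now is None else now
    if read_only:
        return True, "read_auto_approved"
    if token is None:
        return False, "missing_hitl_token"
    if token.tool != tool:
        # An approval for one write must not unlock a different write.
        return False, "token_tool_mismatch"
    if now >= token.expires_at:
        return False, "hitl_token_expired"
    return True, "write_approved"
```

Binding the token to a tool and an expiry means a stale or redirected approval fails closed instead of committing a write nobody reviewed.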
Step 2: Outbound Tunnel Broker
The broker maintains registered SSE streams initiated from inside the VPC. Pseudocode for connection policy:

    // Broker registers a tunnel scope only when the outbound-only policy is in force.
    // Validation of the orchestrator client cert is not shown in this pseudocode.
    func RegisterTunnel(orgID string, allowedTools []string) (tunnelID string, err error) {
        if !security.OutboundOnlyNetworkPolicy() {
            return "", errors.New("outbound-only network policy not enforced; refusing tunnel")
        }
        // Tunnels are short-lived: 15-minute maximum TTL.
        return tunnels.Create(orgID, allowedTools, 15*time.Minute)
    }

This mirrors the connectivity model described in our MCP tunnels news analysis—adapted for multi-hub logistics.
Step 3: Identity & Audit
Humans authenticate via OIDC (existing workforce IdP). Each agent run receives a machine identity JWT with: agent_run_id, supervisor_pool, tool_allow_list, prompt_hash. Audit events append to an immutable store (WORM bucket + SIEM forward). Security’s 90-day pilot exit criterion was zero critical violations—defined as write without HITL, tool outside allow-list, or export of >500 shipment records in one call. We hit zero.
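The exit criterion is precise enough to express as a check over audit events. The sketch below is an assumption about how such a classifier might look. `ToolCallEvent` and its fields are hypothetical, and only the three violation definitions and the 500-record threshold come from the pilot criterion.

```python
from dataclasses import dataclass
from typing import Optional

BULK_EXPORT_LIMIT = 500  # shipment records per call, from the pilot exit criterion


@dataclass(frozen=True)
class ToolCallEvent:
    """Hypothetical audit record for one executed tool call."""
    agent_run_id: str
    tool: str
    is_write: bool
    hitl_approved: bool
    tool_allow_list: frozenset
    shipment_records_returned: int


def critical_violation(event: ToolCallEvent) -> Optional[str]:
    """Return the violation type, or None if the call was within policy."""
    if event.tool not in event.tool_allow_list:
        return "tool_outside_allow_list"
    if event.is_write and not event.hitl_approved:
        return "write_without_hitl"
    if event.shipment_records_returned > BULK_EXPORT_LIMIT:
        return "bulk_export"
    return None
```

Running a check like this over the immutable store gives auditors a reproducible count instead of an assertion.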

Program Management: Phased Rollout & RACI
Technology was half the battle. The other half was change management across dispatch unions, regional hub managers, and a security organization that had been burned by RPA projects with opaque bots.
- Phase 0 (4 weeks): Architecture review, tool manifest baseline, red-team on tunnel broker.
- Phase 1 (90 days): Pilot lane (Rotterdam + Chicago hubs), 3 read tools + 2 write tools, 40 supervisors trained.
- Phase 2 (8 weeks): Second wave (Singapore + Dallas), +6 MCP servers (TMS deep integration, returns portal).
- Phase 3: Global catalog governance board, monthly schema semver, integration with FinOps showback per tool call.
| Role | Responsible (R) | Accountable (A) | Consulted (C) | Informed (I) |
|---|---|---|---|---|
| Tool manifests & MCP servers | Platform engineering | VP Platform | Security architecture | Hub ops leads |
| HITL policy & supervisor training | Operations excellence | COO logistics | Union delegates | Customer ops |
| Tunnel broker & network egress | Cloud networking | CISO | Vendor management | Internal audit |
| Agent orchestration & prompts | AI product team | CTO | Legal/privacy | Finance (FinOps) |
We agreed ticket categories with union delegates before go-live: agents could auto-close only Category A (informational delays); Category B/C (financial or SLA-impacting) always required HITL. That single policy cut union pushback more than any UI polish.
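The category rule is small enough to state as code. This is a sketch under the assumption that unknown or malformed categories fail closed to HITL; `closure_route` is a hypothetical name.

```python
def closure_route(category: str) -> str:
    """Only Category A may auto-close; everything else needs a supervisor."""
    normalized = category.strip().upper()
    if normalized == "A":
        return "auto_close"
    # Categories B and C, plus anything unrecognized, require human approval.
    return "hitl_required"
```

Failing closed on unrecognized values means a new ticket category cannot become auto-closable by accident.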

Technology Stack (Reference)
| Layer | Component | Purpose |
|---|---|---|
| Integration | MCP servers (ERP, WMS, TMS, ITSM) | Schema-first tool surface to agents |
| Connectivity | Outbound tunnel broker (SSE) | Egress-only orchestrator link |
| Messaging | Kafka (existing) | Exception triggers & async fan-out |
| Identity | OIDC workforce IdP + agent JWT issuer | Human + machine attribution |
| Governance | HITL console + policy engine (OPA) | Write approval & tool allow-lists |
| Audit | Immutable log store + SIEM | 90-day pilot compliance evidence |
| Orchestration | Multi-agent state graph (vendor-neutral) | Plan/decompose exception workflows |

Measured Outcomes & Before/After

| Metric | Before | After (Pilot + Wave 2) | Measurement Notes |
|---|---|---|---|
| Median exception resolution | 4.2 hours | 38 minutes | n=2,400 pilot exceptions; P50 |
| Manual data-entry hours (ops) | ~1,200 / month | ~310 / month | Self-reported + keystroke proxy sample |
| Critical policy violations | Not instrumented | 0 (90-day pilot) | Defined: write w/o HITL, allow-list breach, bulk exfil |
| Systems connected via MCP | 0 production | 14 | Phased catalog through Wave 2 |
| Inbound firewall rules for agents | Proposed: 12+ | 0 | Outbound-only tunnel policy |
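
For readers who prefer relative figures, the two headline rows convert to percentage reductions as follows. The inputs are the table values, and `pct_reduction` is a hypothetical helper.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


# Median resolution: 4.2 hours is 252 minutes, down to 38 minutes.
resolution_cut = pct_reduction(252, 38)    # 84.9

# Manual data-entry hours: ~1,200 per month down to ~310 per month.
manual_hours_cut = pct_reduction(1200, 310)  # 74.2
```

The manual-hours figure inherits the uncertainty of its self-reported inputs, so it is best read as roughly three quarters.
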
Financially, the client modeled $2.1M annualized productivity and penalty avoidance from faster exception closure—scenario-based, not audited savings. I label it scenario because logistics margins swing with fuel and lane rates; the operational metrics above are what engineering and ops signed.

Citation anchor: Zero critical violations in 90 days did not mean zero denials—the broker denied ~3.8% of tool calls (mostly scope and rate limits). That distinction mattered to auditors: active enforcement, not luck.

Human-in-the-Loop Console & Supervisor Experience
If supervisors hate the UI, HITL becomes a rubber stamp. We co-designed three interactions with Rotterdam dispatch leads:
- Diff-first approval — write proposals show field-level before/after, not raw JSON.
- Explain plan — collapsible agent reasoning chain with links to source ticket IDs.
- One-click reject + teach — rejection codes feed prompt/policy tuning; no shame language in UI copy.
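The diff-first view reduces to a field-level comparison between the current record and the proposed write. A minimal sketch, assuming flat records; `field_diff` is a hypothetical name and the field names in the usage note are invented for illustration.

```python
def field_diff(before: dict, after: dict) -> list:
    """Return (field, old, new) for every field that would change, sorted by field name."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes.append((key, old, new))
    return changes
```

For example, comparing `{"credit_hold": True, "status": "BLOCKED"}` with `{"credit_hold": False, "status": "BLOCKED", "note": "released"}` yields two rows, one for `credit_hold` and one for the new `note` field. The unchanged `status` is omitted.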


Training was role-based: 90-minute workshop for supervisors, 30-minute e-learning for analysts, separate security office hours for auditors who wanted to see immutable logs.
Wave 2 Expansion: TMS Depth, Returns, and Catalog Governance
After the pilot, the steering committee approved Wave 2 only because audit exports were boring—in a good way. Supervisors were using HITL daily; denials were explainable; tunnel uptime stayed above target. Wave 2 added:
- TMS event-sourced tools — subscribe to delay, customs hold, and customs-release events without polling.
- Returns portal MCP — read RMA status, propose label reissue (HITL).
- Credit & billing read-only — separate server so agents could explain invoice lines without touching AR write APIs.
Catalog governance became a monthly ritual: platform engineering, security, and one rotating hub ops lead. Agenda fixed—(1) new tool proposals, (2) semver bumps, (3) retirement of unused tools, (4) review of denial spikes. Proposals required: JSON schema, owner team, read/write class, data classification, rollback plan. Tools without rollback plans did not ship—full stop.
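The proposal requirements lend themselves to an automated pre-check before the board meets. The sketch below assumes proposals arrive as dictionaries; the key names and the functions `missing_fields` and `can_ship` are hypothetical.

```python
# Field names are assumed; the five requirements come from the governance agenda.
REQUIRED_FIELDS = (
    "json_schema",
    "owner_team",
    "access_class",         # "read" or "write"
    "data_classification",
    "rollback_plan",
)


def missing_fields(proposal: dict) -> list:
    """List required fields that are absent or empty in a tool proposal."""
    return [name for name in REQUIRED_FIELDS if not proposal.get(name)]


def can_ship(proposal: dict) -> bool:
    """A proposal ships only when every required field is filled in."""
    return not missing_fields(proposal)
```

Treating an empty `rollback_plan` the same as a missing one matches the board's rule that tools without rollback plans do not ship.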
We reused patterns from the client’s earlier self-healing supply chain event mesh for Kafka triggers, but rejected merging those autonomous write paths with agent writes. Self-healing flows kept their own service accounts and circuit breakers; agents stayed in the MCP governance lane to avoid a blended blast radius.
Failure Modes We Planned For (And Hit Once)
Tunnel partition (Week 11): A misconfigured egress proxy dropped SSE keepalives for 22 minutes. The orchestrator queued plans, and supervisors saw a yellow banner in the HITL console. No write proposed during the partition was committed after recovery, because the HITL tokens had expired. Post-incident actions: redundant broker paths and a quarterly automatic failover drill.
Schema drift (Week 14): WMS team shipped a field rename without semver bump. MCP server returned validation errors; broker denied calls; alerts fired. Fix: manifest CI gate blocking deploy without version increment—same discipline as public REST APIs.
Over-eager read scope: An agent loop requested paginated inventory for an entire region, which was within policy but expensive. The rate limiter throttled the loop, and ops tuned max_rows per tool. The lesson: read tools need FinOps guardrails too, not just writes.
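The manifest CI gate from the schema-drift incident can be sketched as a comparison of the old and new schema alongside their versions. This is an illustration, not the client's pipeline. `gate` and `parse_semver` are hypothetical, and treating a removed or renamed property as a breaking change that needs a major bump is an assumed policy.

```python
def parse_semver(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def gate(old_schema: dict, new_schema: dict, old_version: str, new_version: str) -> tuple:
    """Return (deploy_allowed, reason) for a manifest change."""
    if old_schema == new_schema:
        return True, "schema_unchanged"
    old_v, new_v = parse_semver(old_version), parse_semver(new_version)
    if new_v <= old_v:
        return False, "schema_changed_without_version_bump"
    removed = set(old_schema.get("properties", {})) - set(new_schema.get("properties", {}))
    if removed and new_v[0] == old_v[0]:
        # A rename shows up as a removed property, which breaks existing callers.
        return False, "breaking_change_requires_major_bump"
    return True, "ok"
```

Under this policy, the Week 14 field rename would have been blocked at deploy time rather than surfacing as validation errors in production.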
Compliance, Data Residency, and Union Alignment
Legal cared about deployer accountability under emerging EU AI Act framing—even though this client’s pilot lanes were primarily US/EU mix. We documented: human oversight on consequential writes, logging, incident playbooks, and literacy training for supervisors (what agents can/cannot do). The HITL console’s diff view became the exhibit auditors photographed.
Union engagement focused on category rules, not technology mystique. Agents would not discipline workers, score individuals, or auto-close grievance-class tickets. We published a one-page “automation boundary” memo co-signed by operations and union reps—cheap politically, invaluable at go-live.
Data residency: tool responses with EU customer PII stayed in EU VPC MCP servers; orchestrator calls used region-sticky routing. Redaction removed direct identifiers before cloud reasoning on cross-border lanes.
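Region-sticky routing and redaction can be sketched together. Everything named here is an assumption for illustration: the broker hostnames, the list of direct-identifier fields, and the functions `redact` and `broker_for`.

```python
# Assumed field names for direct identifiers; the real list is client-specific.
DIRECT_IDENTIFIERS = {"consignee_name", "consignee_address", "phone", "email"}

# Hypothetical broker hostnames, one per residency region.
REGION_BROKERS = {
    "EU": "mcp-broker.eu.internal.example",
    "US": "mcp-broker.us.internal.example",
}


def redact(record: dict) -> dict:
    """Replace direct identifiers before a record leaves its region."""
    return {key: ("[REDACTED]" if key in DIRECT_IDENTIFIERS else value)
            for key, value in record.items()}


def broker_for(customer_region: str) -> str:
    """Pick the in-region broker; refuse to guess for unknown regions."""
    try:
        return REGION_BROKERS[customer_region]
    except KeyError:
        raise ValueError(f"no broker configured for region {customer_region!r}")
```

Raising on an unknown region, instead of defaulting to some broker, avoids silently routing a record across a residency boundary.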
Lessons for Platform & Engineering Leaders
Do not conflate copilot with agent. Drafting emails saved minutes; closing exceptions required tools. Until MCP (or equivalent) is production-governed, stay honest about ROI.
Manifests are contracts. Treat breaking schema changes like API semver; the client’s monthly governance board prevented the “shadow tool” sprawl that kills integration programs.
Outbound-only is a story security can retell. It aligned with 2026 tunnel narratives and shrank pen-test scope—faster than arguing inbound WAF rules per tool.
Pair with zero-trust agent identity. Mesh connectivity without agentic zero-trust identity patterns would have left confused-deputy risk on the table.

Performance, SLOs, and Cost Controls
Platform teams asked fair questions: Will agents DDoS our SAP? We answered with SLOs and budgets, not hope.
| SLO | Target | Pilot Actual |
|---|---|---|
| Broker added latency (P95) | < 120 ms | 94 ms |
| HITL queue time (P50) | < 3 min supervisor wait | 2.1 min |
| Tunnel availability | 99.9% | 99.95% |
| Tool denial rate (policy) | < 5% | 3.8% |
Each tool carried per-minute call caps and row limits. erp.get_order could not fan out into list endpoints; wms.get_pick_wave was scoped to a single wave ID. Orchestrator plans had token and step budgets—runaway loops terminated with a supervisor alert, not a silent SAP job storm.
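A per-tool budget combining a per-minute call cap with a row limit might look like the sliding-window sketch below. `ToolBudget` is a hypothetical class, and the cap values in the usage note are invented; the pilot's actual limits are not stated.

```python
import time
from collections import deque
from typing import Optional


class ToolBudget:
    """Sliding-window call cap plus a hard row limit for one tool."""

    def __init__(self, calls_per_minute: int, max_rows: int):
        self.calls_per_minute = calls_per_minute
        self.max_rows = max_rows
        self._calls = deque()  # timestamps of admitted calls

    def allow(self, requested_rows: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self._calls and now - self._calls[0] >= 60:
            self._calls.popleft()
        if requested_rows > self.max_rows:
            return False
        if len(self._calls) >= self.calls_per_minute:
            return False
        self._calls.append(now)
        return True
```

With `ToolBudget(calls_per_minute=2, max_rows=100)`, a third call inside the same minute is refused, and so is any single call asking for more than 100 rows. Rejected calls do not consume budget.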
Token spend was tracked per hub and charged back via FinOps tags (cost_center, lane, agent_workflow). That transparency prevented the “AI black hole” narrative finance teams fear after copilot rollouts.
Observability: Distributed traces linked agent_run_id from orchestrator → broker → MCP server → internal API correlation ID. When a supervisor asked “why did the agent suggest a credit release?”, we could replay the tool outputs—not the raw model weights—in under two minutes. That replay capability was the difference between trust and folklore.
Testing strategy: We ran three layers before production pilot—(1) contract tests on every MCP schema against recorded SAP/WMS fixtures, (2) red-team prompts attempting tool escalation and bulk export, (3) shadow mode where agents proposed writes but HITL auto-rejected into a log for two weeks. Shadow mode surfaced two false-positive credit-hold suggestions; we tightened prompts and added a business-rule check on order value thresholds before any human saw a diff.
What we would do differently: Start catalog governance in Week 1, not Week 10. The first wave of “quick tools” created semver debt. Second lesson: embed a hub ops lead in platform standups daily during pilot—not weekly—because WMS field renames do not respect sprint boundaries.
Frequently Asked Questions
Can we run private MCP without sending shipment data to a public model API?
Yes—hybrid designs keep tool execution and PII-heavy reads on-prem while only sending redacted plan summaries or truncated context to the orchestrator. The client redacted consignee names and full addresses from tool responses before any cloud reasoning step.
How is this different from traditional RPA bots?
RPA often uses brittle UI scripting and shared credentials. MCP tools are schema-defined APIs with per-call identity, policy denial logs, and HITL on writes—closer to microservice governance than screen-scraping bots.
What breaks if the outbound tunnel drops?
The broker fails secure: agents can finish read-only steps already in flight but cannot open new write paths. Supervisors fall back to manual screens; alerts page platform on-call. Active-active broker nodes kept pilot availability at 99.95%.
Does SAP licensing block MCP wrappers?
Licensing is a customer-specific legal question. Technically, OData/BAPI wrappers are standard integration paths. The client used existing PI/PO entitlements where required and documented interfaces in their integration catalog.
How long from architecture review to pilot go-live?
Four weeks to security sign-off on scoped tools; ninety days of pilot execution including training. Wave 2 added eight weeks for six more MCP servers, run as parallel workstreams rather than serial discovery.
Similar Mandate? Next Steps
If your organization is stuck between agent ambition and firewall reality, you do not need another copilot slide deck. You need a scoped mesh: outbound tunnels, manifests, HITL, and audit evidence that satisfies security on day one.