
Fault Intelligence as Code

AI Agents, RAG, and MCP for Network Ops - DEVNET-3171, Cisco Live US 2026

This project demonstrates how network fault knowledge can be captured as version-controlled artifacts and executed by an agentic troubleshooting system. Splunk detects a known fault, the webhook relay opens an OpenCode session, the network-troubleshooter agent retrieves the right intelligence artifacts and KB context, and RADKit MCP provides controlled access to live network devices.

This is a Cisco Live demo/reference implementation. Run the local quickstart first, then adapt the endpoints, credentials, and device values for your own lab before connecting the workflow to live infrastructure.

The demo is intentionally small enough to understand on stage, but the architecture mirrors the larger operating model: detection logic, repair workflows, human approval, and operational context are all explicit and reviewable.
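
The handoff from detection to an agent session can be sketched roughly as follows. This is an illustration only, not the repository's webhook relay: it assumes a Splunk-style webhook payload whose first result row sits under `result`, and the field names `fault_id`, `device`, and `detail`, the `FaultEvent` type, and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultEvent:
    """Normalized fault alert. Every field name here is hypothetical."""
    fault_id: str
    device: str
    detail: str


def parse_alert(payload: dict) -> FaultEvent:
    """Extract a fault event from a Splunk-style webhook payload.

    Assumes the saved search emits fault_id, device, and detail fields
    in the first result row (an assumption, not the demo's real schema).
    """
    result = payload.get("result") or {}
    missing = [key for key in ("fault_id", "device") if not result.get(key)]
    if missing:
        raise ValueError(f"alert is missing required fields: {', '.join(missing)}")
    return FaultEvent(
        fault_id=result["fault_id"],
        device=result["device"],
        detail=result.get("detail", ""),
    )


def build_session_prompt(event: FaultEvent, agent: str = "network-troubleshooter") -> str:
    """Render the opening prompt for an agent session (layout is illustrative)."""
    lines = [
        f"Agent: {agent}",
        f"Fault: {event.fault_id}",
        f"Device: {event.device}",
    ]
    if event.detail:
        lines.append(f"Detail: {event.detail}")
    return "\n".join(lines)
```

Keeping parsing and prompt construction as pure functions means the relay can reject malformed alerts before any agent session is opened, and both steps can be tested without Splunk or OpenCode running.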

What This Demonstrates

This repository shows a path from fault intelligence as code to fault response as an auditable agentic workflow. The operating model is simple:

  • Fault Signatures, Repair Action Workflows, and Remediation Guides provide reviewed technical ground truth.
  • The Markdown KB adds business rules, maintenance windows, known issues, and escalation policy.
  • OpenCode agents and skills execute repeatable procedures against scoped context.
  • MCP connects the harness to live systems such as RADKit.
  • Human approval gates service-impacting actions.

The point is not to give the agent every document and every possible tool. The point is to give it the right context, in the right structure, at the right time.

Documentation Approach

These docs are OpenCode-first. Install OpenCode, fork and clone the repository, open OpenCode in the repository folder, and use agent prompts as the primary workflow. Manual commands are included as backup or reference for people who want to see exactly what the agent is doing.

Use OpenCode's Builder agent for general setup and customization, network-troubleshooter for fault runs and Repair Action Workflow (RAW) test bundles, ia-curator for fault-intelligence artifacts, and kb-curator for knowledge-base maintenance.

Start by Job

| Job | Start here |
| --- | --- |
| Run the local demo | Local Agent Test |
| Connect a lab | Lab Environment |
| Create new fault intelligence | Artifact Authoring |
| Maintain operational context | KB Curator |
| Understand the system | Architecture Overview |
| Troubleshoot a failed run | Troubleshooting |

Current Demo Path

The current simulator default is AD000002: BGP Neighbor Administrative Shutdown on IOS XR.

| Item | Value |
| --- | --- |
| Default device | xr-43 |
| Default neighbor | 172.20.20.18 |
| Default VRF | default |
| Default neighbor AS | 3334 |
| Artifact group | intelligence-artifacts/AD000002-bgp-neighbor-admin-shutdown-xr/ |

The Repair Action Workflow (RAW) for this fault confirms that the BGP state is Idle (Admin), verifies that shutdown is present in the neighbor configuration, requests approval, applies no shutdown, and verifies that the BGP session returns to Established.
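
The sequence of RAW steps can be sketched as a small function with an explicit approval gate. This is a minimal sketch, not the artifact's actual format or the RADKit API: the `Device` protocol, its method names, the `approve` callback, the outcome strings, and `FakeDevice` are all hypothetical stand-ins.

```python
from typing import Callable, Protocol


class Device(Protocol):
    """Hypothetical device interface; the real demo reaches devices through RADKit MCP."""

    def bgp_state(self, neighbor: str) -> str: ...
    def neighbor_is_shutdown(self, neighbor: str) -> bool: ...
    def apply_no_shutdown(self, neighbor: str) -> None: ...


def run_raw(device: Device, neighbor: str, approve: Callable[[str], bool]) -> str:
    """Walk the diagnose, approve, repair, verify steps and return an outcome label."""
    # Step 1: confirm the fault signature still matches the live state.
    if device.bgp_state(neighbor) != "Idle (Admin)":
        return "not-applicable"
    # Step 2: confirm the root cause is present in the configuration.
    if not device.neighbor_is_shutdown(neighbor):
        return "signature-mismatch"
    # Step 3: a human must approve the service-impacting change.
    if not approve(f"Apply 'no shutdown' to BGP neighbor {neighbor}?"):
        return "approval-denied"
    # Step 4: apply the repair.
    device.apply_no_shutdown(neighbor)
    # Step 5: verify the session recovered (a real check would poll with a timeout).
    if device.bgp_state(neighbor) == "Established":
        return "repaired"
    return "verification-failed"


class FakeDevice:
    """In-memory stand-in so the sketch runs without a lab."""

    def __init__(self) -> None:
        self.shutdown = True

    def bgp_state(self, neighbor: str) -> str:
        return "Idle (Admin)" if self.shutdown else "Established"

    def neighbor_is_shutdown(self, neighbor: str) -> bool:
        return self.shutdown

    def apply_no_shutdown(self, neighbor: str) -> None:
        self.shutdown = False
```

With the fake device, `run_raw(FakeDevice(), "172.20.20.18", approve=lambda question: True)` returns `"repaired"`, while an `approve` callback that returns `False` leaves the configuration untouched and returns `"approval-denied"`.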

Key Runtime Boundaries

network-troubleshooter can diagnose live faults, call RADKit MCP tools, and invoke read-only sub-agents. It cannot edit repository files and cannot call curator agents. ia-curator and kb-curator are separate author-time agents for maintaining artifacts and the KB vault.
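
These boundaries can be pictured as an allow-list per agent. The sketch below is an assumption-laden illustration rather than OpenCode's actual permission configuration: the action names and the `is_allowed` helper are hypothetical, and only the agent names come from this project.

```python
# Hypothetical action names; only the agent names come from the project.
PERMISSIONS: dict[str, frozenset[str]] = {
    "network-troubleshooter": frozenset(
        {"diagnose_fault", "call_radkit_mcp", "invoke_readonly_subagent"}
    ),
    "ia-curator": frozenset({"edit_artifacts"}),
    "kb-curator": frozenset({"edit_kb"}),
}


def is_allowed(agent: str, action: str) -> bool:
    """Return True only if the action is explicitly granted to the agent."""
    # Unknown agents get an empty allow-list, so the default is deny.
    return action in PERMISSIONS.get(agent, frozenset())
```

A default-deny lookup keeps the run-time agent from editing repository files or calling curator agents, because those actions never appear in its allow-list.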

Next: start with the Quickstart, or read the Architecture Overview for the system map.