Thomas Mangin · Chief Madness Officer, Exa Networks
(with HTML skillz from Claude)
Network engineers who know ExaBGP: thanks! welcome! you will feel at home
Network engineers who don't: this was the "first" popular programmable BGP toolkit
Happy to answer quick questions as we go, keep big ones for the end
What is Ze?
The project
Ze = "The" with a French accent
Successor to ExaBGP, written in Go
Same philosophy: BGP as a programmable tool
Design
Modular: everything is a plugin, the engine is minimal
Easy to integrate with: Plugin SDK (Go, Python) or JSON/text protocol (any language)
Modern: supports RPKI origin validation
AI-ready: self-describing CLI, MCP transport, ze help --ai
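For flavour: ExaBGP drives announcements with one-line text commands like the one below, and Ze's JSON/text protocol follows the same philosophy. Treat the exact syntax as illustrative of the idea, not as Ze's documented wire format:

```text
announce route 203.0.113.0/24 next-hop 192.0.2.1
```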
License
AGPL-3.0, hosted on Codeberg
The Story: VyOS (early 2020)
VyOS
Just before UK lockdown: 3 months as a VyOS contributor helping with their Perl to Python migration
Got familiar with some of their good design ideas: verify, generate, apply pattern
But...
Did not agree with it all: my French ego had opinions
Ego said: you can do better
But that would require a lot of time
A better foundation
VyOS picked Python. Go provides most of the advantages and none of the disadvantages
Their design did not allow easy plugins
Go: concurrency (goroutines), great tooling and libraries, cross-compilation
Single binary: like the ExaBGP zipapp nobody uses, but this time it's the default. Copy one file, run it. No Python, no pip...
The Story: Experimentation
Personal prototyping
Gave me the will to try: made prototypes in Zig and V
Not only a hobby
We presented our router
We want control of our infrastructure
Ze is part of that effort.
AI collaboration
The latest ExaBGP release got many features added (and mypy type checking) by Claude
Learned a lot: lots went wrong (Sonnet struggled with type issues)
Then Claude 4.5 came out: from fighting the AI to pleasant collaboration
Claude 4.6 has a 1M token context window: game changer. A single feature often needs 350-500k tokens of context
The Story ... so far (100+ days)
Liane and Lou 👋 took over managing R&D and infrastructure, freeing up my time
Like during COVID, it gave me time to invest in R&D
A good occasion to explore how AI would change our business
My biggest convert is my co-director who now uses AI daily for SalesForce reporting
What
Commits: 1,891
Go source files: 3,090 (948k lines)
Go test files: 611 (236k lines)
Functional tests (.ci): 492
Editor tests (.et): 144
YANG modules: 64
AI Is Not Magic
The good
This work would not have been possible without AI
The bad
Hard to get it to work as you want until patterns emerge
Knowledge without wisdom: knows every RFC but is trained on monolithic code
Does not realise when it hits conflicting information: it will write "something"
Claude claims "all done", but the feature is not wired in, not tested, not documented. The leaked Claude Code source code confirms Anthropic knows this is a problem.
I went too fast, too soon. Learned to tame the beast as I went
So much for vibe coding: read the code!
Always ask if there is remaining or deferred work
AI Won't Always Do What You Ask
Agrees, then silently substitutes
You describe a design. Claude says "I'm fine with it"
Then implements something different without telling you
I found this issue after asking Claude to perform two extensive reviews of the code against the spec
The spec says:
"Each 64K block can serve one Extended Message peer's
overflow item, OR be subdivided into 16 x 4K slices
for standard peers. This avoids maintaining two separate pools."
What Claude built: two separate pools with a shared counter
You're right. I apologize. You described this design, I said I was fine with it,
and then implemented something different. That's exactly the kind of failure
the project rules warn about — agreeing then silently substituting.
It drifts toward patterns from training data
Verbal agreement ≠ implementation. Always verify the diff
Developing with AI
What Worked
Test Driven Development
3,470 co-authored commits
~100 days from zero to a "basic" NOS
Implementing RFCs (very well-defined specifications)
Test generation, refactoring across files
What Doesn't
Trusting the generated code
Without tests, many non-obvious bugs
/deep-review always finds issues, every single time
Hoping an AI can design innovative software
You can outsource development, not design
How to work with it
Telling Claude he has OCD during reviews makes him stricter and better
Be ready to stop and argue: like with a Junior Dev
He will give you advice without having read the full code
He can write tools to fix things well: let him!
He can decide to edit all files "by hand" when sed would have done it
PATIENCE: you need patience for when it is not consistent
The .claude System
Those problems don't go away. You build systems to catch them.
Rules with reasons
35 rationale files explaining why each rule exists: so the AI reasons, not just follows
Anti-rationalization rules: "the answer is always no"
"Too simple to need a test" → Test it
"Pre-existing issue" → Always report. Investigate. Ask the user.
"Should work" → Run it, paste output
It still does what it was trained to do (once the context exceeds 250k tokens)
Enforcement
Hook says no? Code doesn't land. No negotiation. No override.
487 learned summaries: decisions and gotchas extracted at session end
Learned summaries preserve decisions across sessions: institutional memory
The .claude System
The process
TDD enforced: tests must exist and fail before implementation. No exceptions.
Full RFC 4271 best-path selection. All families registered by plugins at startup: adding a new family means writing a plugin, no engine changes needed.
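The "families registered by plugins at startup" idea can be sketched as a simple registry keyed by AFI/SAFI. The types and function names below are hypothetical stand-ins, not Ze's actual plugin SDK:

```go
package main

import "fmt"

// Family identifies an AFI/SAFI pair (values per the IANA registries).
type Family struct {
	AFI  uint16
	SAFI uint8
	Name string
}

// registry maps AFI/SAFI to the family a plugin registered for it.
var registry = map[[2]uint16]Family{}

// Register is what each plugin would call at startup for the
// address family it owns; the engine itself never hardcodes families.
func Register(f Family) {
	registry[[2]uint16{f.AFI, uint16(f.SAFI)}] = f
}

// Lookup returns the registered family, if any, for an AFI/SAFI pair.
func Lookup(afi uint16, safi uint8) (Family, bool) {
	f, ok := registry[[2]uint16{afi, uint16(safi)}]
	return f, ok
}

func main() {
	Register(Family{AFI: 1, SAFI: 1, Name: "ipv4-unicast"})
	Register(Family{AFI: 2, SAFI: 1, Name: "ipv6-unicast"})

	if f, ok := Lookup(1, 1); ok {
		fmt.Println(f.Name) // prints: ipv4-unicast
	}
}
```

Adding a new family then means shipping a plugin that calls Register, with no change to the engine.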
RPKI Integration
Protocol
RTR protocol client (RFC 6810/8210)
Origin validation integrated into best-path selection
Valid / Invalid / NotFound status on routes
Design choice
Consumers can subscribe to RPKI events separately, or get merged update-rpki events
Each UPDATE arrives pre-correlated with its validation status
Testing
ze-test rpki: deterministic mock RPKI server (validation result derived from IP)
ze-test rtr-mock: mock RTR cache server with explicit VRPs (prefix/ASN/max-length entries)
Full lab testing: Ze connects to mock RTR, receives VRPs, validates live routes
Looking Glass
Public view
Built-in public looking glass (separate HTTP server, no auth, read-only)
Peer dashboard with live SSE updates
Route lookup, AS path search, community search
Visualization
AS path topology graph: server-side SVG, Sugiyama layout, pure Go (no GraphViz, no JS)
Birdwatcher-compatible REST API: plugs directly into Alice-LG
Looking Glass - Route Search
Operational Intelligence
Team Cymru DNS
AS numbers annotated with organization names throughout the system: web UI, looking glass, AS path graphs. Live DNS lookups with caching.
PeeringDB
ze update bgp peer * prefix queries PeeringDB and auto-sets prefix maximums. Configurable margin and staleness warnings.
The Decorator Framework
Add ze:decorate to any YANG leaf and the system automatically enriches it using a decoration function. Team Cymru and PeeringDB are just two decorators that ship by default. Easy to add your own.
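A rough sketch of how such a decorator registry could look in Go. The Decorator type, RegisterDecorator, and the static AS-name table are all hypothetical stand-ins for the real ze:decorate machinery and the live Team Cymru lookup:

```go
package main

import "fmt"

// Decorator enriches a raw leaf value (e.g. an AS number) with
// human-readable context.
type Decorator func(value string) string

// decorators maps a decorator name (as referenced by ze:decorate
// in a YANG leaf) to its enrichment function.
var decorators = map[string]Decorator{}

func RegisterDecorator(name string, d Decorator) {
	decorators[name] = d
}

// Decorate applies the named decorator if one is registered,
// otherwise returns the value unchanged.
func Decorate(name, value string) string {
	if d, ok := decorators[name]; ok {
		return d(value)
	}
	return value
}

func main() {
	// A static stand-in for the cached Team Cymru DNS lookup.
	asNames := map[string]string{"64500": "EXAMPLE-NET"}
	RegisterDecorator("as-name", func(asn string) string {
		if org, ok := asNames[asn]; ok {
			return asn + " (" + org + ")"
		}
		return asn
	})

	fmt.Println(Decorate("as-name", "64500")) // prints: 64500 (EXAMPLE-NET)
}
```

With this shape, Team Cymru and PeeringDB are just two registered functions, and adding your own means one more RegisterDecorator call.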