Ze: Redoing and improving on ExaBGP

Zeledon

Extensible, plugin-driven network software

Thomas Mangin · Chief Madness Officer, Exa Networks

(with HTML skillz from Claude)

Network engineers who know ExaBGP: thanks, and welcome! You will feel at home

Network engineers who don't: this was the "first" popular programmable BGP toolkit

Happy to answer quick questions as we go; keep the big ones for the end

What is Ze?

The project

Design

License

The Story: VyOS (early 2020)

The Story: Experimentation

Personal prototyping

Not only a hobby

AI collaboration

The Story ... so far (100+ days)

  • Liane and Lou 👋 took over managing R&D and infrastructure, freeing up my time
  • Like during COVID, it gave me time to invest in R&D
  • A good occasion to explore how AI would change our business
  • My biggest convert is my co-director, who now uses AI daily for Salesforce reporting
What                     Count
Commits                  1,891
Go source files          3,090 (948k lines)
Go test files            611 (236k lines)
Functional tests (.ci)   492
Editor tests (.et)       144
YANG modules             64
Codeberg activity graph showing Dec 2025 - Mar 2026 burst

AI Is Not Magic

The good

The bad

I went too quick, too fast. Learned to tame the beast as I went

So much for Vibe coding, read the code!

Always ask if there is remaining or deferred work

AI Won't Always Do What You Ask

Agrees, then silently substitutes

The spec says:
"Each 64K block can serve one Extended Message peer's
 overflow item, OR be subdivided into 16 x 4K slices
 for standard peers. This avoids maintaining two separate pools."

What Claude built: two separate pools with a shared counter

You're right. I apologize. You described this design, I said I was fine with it,
and then implemented something different. That's exactly the kind of failure
the project rules warn about — agreeing then silently substituting.
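For contrast with what was built, here is a minimal sketch of the single-pool idea the spec describes: one list of 64K blocks, where a block is either handed out whole or cut into 16 x 4K slices on demand. All names (Pool, GetBlock, GetSlice) are made up for this example and are not Ze's actual code; returning buffers to the pool and locking are left out.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	blockSize      = 64 * 1024             // large enough for one Extended Message
	sliceSize      = 4 * 1024              // large enough for one standard BGP message
	slicesPerBlock = blockSize / sliceSize // 16
)

var errExhausted = errors.New("pool exhausted")

// Pool is a hypothetical single pool: every buffer comes from a 64K block.
type Pool struct {
	free   [][]byte // whole blocks not yet handed out
	slices [][]byte // 4K slices cut from blocks that were subdivided
}

func NewPool(blocks int) *Pool {
	p := &Pool{}
	for i := 0; i < blocks; i++ {
		p.free = append(p.free, make([]byte, blockSize))
	}
	return p
}

// GetBlock hands out a whole 64K block (Extended Message peer).
func (p *Pool) GetBlock() ([]byte, error) {
	if len(p.free) == 0 {
		return nil, errExhausted
	}
	b := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return b, nil
}

// GetSlice hands out a 4K slice, subdividing one more block when needed.
func (p *Pool) GetSlice() ([]byte, error) {
	if len(p.slices) == 0 {
		b, err := p.GetBlock()
		if err != nil {
			return nil, err
		}
		for i := 0; i < slicesPerBlock; i++ {
			lo, hi := i*sliceSize, (i+1)*sliceSize
			// The three-index form caps capacity so an append cannot spill
			// into the neighbouring slice.
			p.slices = append(p.slices, b[lo:hi:hi])
		}
	}
	s := p.slices[len(p.slices)-1]
	p.slices = p.slices[:len(p.slices)-1]
	return s, nil
}

func main() {
	p := NewPool(2)
	s, _ := p.GetSlice() // subdivides the first block
	b, _ := p.GetBlock() // takes the second block whole
	fmt.Println(len(s), len(b)) // prints "4096 65536"
}
```

Both kinds of peer draw from the same free list, so there is a single capacity limit to reason about, which is the point the spec was making.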

Developing with AI

What Worked

  • Test Driven Development
  • 3,470 co-authored commits
  • ~100 days from zero to a "basic" NOS
  • RFC (very well defined) implementation
  • Test generation, refactoring across files

What Doesn't

  • Trusting the generated code
  • Without tests, many non-obvious bugs
  • /deep-review always finds issues, every single time
  • Hoping an AI can design innovative software
  • You can outsource development, not design

How to work with it

The .claude System

Those problems don't go away. You build systems to catch them.

Rules with reasons

"Too simple to need a test" → Test it

"Pre-existing issue" → Always report. Investigate. Ask the user.

"Should work" → Run it, paste output

It still does what it was trained to do (once the context exceeds 250k tokens)

Enforcement

The .claude System

The process

The system is as much a deliverable as the code itself

"simple" review

Critical Review Summary showing 7 issues found and fixed

ExaBGP Compatibility: Your Scripts Still Work

Migration tools

The promise

The code should not even be called alpha: very untested, run at your own risk.
But: all ExaBGP compatibility tests pass and you can play with it.

AI-First Design: Ze Goes Further

One interface

Machine integration

Human interface

The Plugin Architecture

Minimal engine

Self-contained plugins

Ze Engine Core (event bus for components and plugins)
|-- BGP Component (FSM, wire, reactor)
|-- <future components>
|-- Plugin Infrastructure (registry, process manager, hub)
|-- bgp-rib (route storage + best-path)
|-- bgp-rs (route server, RFC 7947)
|-- bgp-gr (graceful restart, RFC 4724/9494)
|-- bgp-rpki (origin validation, RFC 6811)
|-- bgp-nlri-evpn (L2VPN EVPN, RFC 7432)
|-- bgp-nlri-flowspec (RFC 8955/8956)
|-- filter-community (tag/strip communities)
|-- ... 18 more

YANG-Modeled Configuration

Schema-driven

Runtime

One YANG schema drives

CLI, SSH, and ZeFS

ZeFS: config as a blob store

Network OS workflow (via built-in SSH server)

ze config edit                   Opens interactive editor session
set bgp peer 10.0.0.1 as 65001   Modify config in draft
diff                             Review pending changes
commit / commit confirmed 5      Apply (with optional auto-revert in N minutes)
rollback 1                       Restore previous revision
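The draft / diff / commit / rollback cycle above can be modelled in a few lines. This is only a sketch of the idea, assuming a flat path-to-value config; Store, Set, Diff, Commit and Rollback are hypothetical names, deletions are ignored, and the "commit confirmed" timer is not modelled.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Config is a simplified flat config: path -> value (an assumption for this sketch).
type Config map[string]string

// Store keeps every committed revision plus one editable draft.
type Store struct {
	revisions []Config // the last entry is the running config
	draft     Config
}

func clone(c Config) Config {
	out := make(Config, len(c))
	for k, v := range c {
		out[k] = v
	}
	return out
}

func NewStore() *Store {
	return &Store{revisions: []Config{{}}, draft: Config{}}
}

// Running returns the active (last committed) config.
func (s *Store) Running() Config { return s.revisions[len(s.revisions)-1] }

// Set changes the draft only; the running config is untouched.
func (s *Store) Set(path, value string) { s.draft[path] = value }

// Diff lists draft entries that are new or changed compared to running.
func (s *Store) Diff() []string {
	var out []string
	run := s.Running()
	for path, value := range s.draft {
		if old, ok := run[path]; !ok || old != value {
			out = append(out, "+ "+path+" "+value)
		}
	}
	sort.Strings(out)
	return out
}

// Commit makes the draft the new running revision.
func (s *Store) Commit() { s.revisions = append(s.revisions, clone(s.draft)) }

// Rollback restores the revision n commits back as a new revision.
func (s *Store) Rollback(n int) error {
	idx := len(s.revisions) - 1 - n
	if n < 1 || idx < 0 {
		return errors.New("no such revision")
	}
	s.revisions = append(s.revisions, clone(s.revisions[idx]))
	s.draft = clone(s.Running())
	return nil
}

func main() {
	s := NewStore()
	s.Set("bgp peer 10.0.0.1 as", "65001")
	fmt.Println(s.Diff()) // pending change, not yet applied
	s.Commit()
	s.Set("bgp peer 10.0.0.1 as", "65002")
	s.Commit()
	_ = s.Rollback(1)
	fmt.Println(s.Running()["bgp peer 10.0.0.1 as"]) // prints "65001"
}
```

Recording the rollback as a new revision keeps the history append-only, which is one possible choice and not necessarily what Ze does.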

Extras

CLI

Ze CLI with tab completion and config editing

Web Interface

YANG-driven UI

Collaboration

Extras

Web Interface

Ze web UI

Plugin Modes and Performance

No performance compromise

Four invocation modes

Mode                   How it works
In-process goroutine   Zero-copy, DirectBridge hot path
Forked subprocess      TLS connect-back, per-plugin token
Direct call            Sync in-process
Remote                 External binary over TLS

Plugin SDKs for Go and Python. Any language via JSON/text over stdin/stdout.
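To give a feel for the "any language over stdin/stdout" path, here is a sketch of a line-oriented JSON plugin loop. The message shapes (Event, Reply, the "type" and "action" fields) are invented for this example and are not Ze's wire format; check the SDK documentation for the real one.

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"os"
)

// Event and Reply are hypothetical message shapes, NOT Ze's actual protocol.
type Event struct {
	Type   string `json:"type"`
	Peer   string `json:"peer"`
	Prefix string `json:"prefix,omitempty"`
}

type Reply struct {
	Action string `json:"action"`
	Prefix string `json:"prefix,omitempty"`
}

// handle decodes one JSON line and decides what to answer.
func handle(line []byte) ([]byte, error) {
	var ev Event
	if err := json.Unmarshal(line, &ev); err != nil {
		return nil, err
	}
	reply := Reply{Action: "ignore"}
	if ev.Type == "update" && ev.Prefix != "" {
		reply = Reply{Action: "accept", Prefix: ev.Prefix}
	}
	return json.Marshal(reply)
}

// serve reads one event per line from r and writes one reply per line to w.
func serve(r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r)
	out := bufio.NewWriter(w)
	for sc.Scan() {
		resp, err := handle(sc.Bytes())
		if err != nil {
			continue // skip malformed lines rather than crash the plugin
		}
		out.Write(resp)
		out.WriteByte('\n')
		// Flush per reply so the engine is never left waiting on a buffer.
		if err := out.Flush(); err != nil {
			return err
		}
	}
	return sc.Err()
}

func main() {
	if err := serve(os.Stdin, os.Stdout); err != nil {
		os.Exit(1)
	}
}
```

The same loop is a dozen lines in Python or shell, which is why the stdin/stdout mode works for any language.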

Plugin Filters and Developer Tools

Route filters

Three filter categories (always applied in order)

Category    Who controls it               Example
Mandatory   RFC compliance, always runs   RFC 9234 OTC
Default     Engine, can be overridden     Loop prevention
User        Operator chooses              rpki:validate
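The "always applied in order" rule can be sketched as a stable sort on the category, followed by a first-reject-wins walk. The types and helper names below (Filter, Chain, Apply, LoopPrevention, AcceptAll) are illustrative assumptions, not Ze's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Category order mirrors the table: Mandatory, then Default, then User.
type Category int

const (
	Mandatory Category = iota
	Default
	User
)

type Route struct {
	Prefix string
	ASPath []uint32
}

// Filter is a hypothetical filter: a name, a category and an accept function.
type Filter struct {
	Name     string
	Category Category
	Accept   func(Route) bool
}

// AcceptAll is a placeholder filter that never rejects.
func AcceptAll(name string, c Category) Filter {
	return Filter{Name: name, Category: c, Accept: func(Route) bool { return true }}
}

// LoopPrevention rejects routes whose AS path already contains localAS.
func LoopPrevention(localAS uint32) Filter {
	return Filter{Name: "loop-prevention", Category: Default, Accept: func(r Route) bool {
		for _, as := range r.ASPath {
			if as == localAS {
				return false
			}
		}
		return true
	}}
}

// Chain returns a copy ordered by category; the stable sort keeps the
// operator's ordering within one category.
func Chain(filters []Filter) []Filter {
	out := append([]Filter(nil), filters...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Category < out[j].Category })
	return out
}

// Apply runs the chain and reports the first filter that rejected the route.
func Apply(chain []Filter, r Route) (bool, string) {
	for _, f := range chain {
		if !f.Accept(r) {
			return false, f.Name
		}
	}
	return true, ""
}

func main() {
	chain := Chain([]Filter{
		AcceptAll("rpki:validate", User),
		LoopPrevention(65000),
		AcceptAll("otc", Mandatory),
	})
	ok, by := Apply(chain, Route{Prefix: "192.0.2.0/24", ASPath: []uint32{65001, 65000}})
	fmt.Println(ok, by) // prints "false loop-prevention"
}
```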

Developer tools

Not everything is AI-centric. You too can be a human-powered BGP route filter.

Protocol Coverage

21 Address Families

AFI           Families
IPv4 / IPv6   unicast, multicast, VPN, FlowSpec, MPLS, MUP, MVPN
IPv4 only     RTC
L2VPN         EVPN, VPLS
BGP-LS        BGP-LS, BGP-LS-VPN (40 TLVs)

13 Capabilities

Category     Capabilities
Core         4-byte ASN, Extended Messages, Route Refresh, Enhanced Route Refresh
Routing      Add-Path, Extended Next-Hop, GR, Long-Lived GR
Operations   BGP Roles, RPKI, Software Version, Hostname, Link-Local NH

Full RFC 4271 best-path selection. All families registered by plugins at startup: adding a new family means writing a plugin, no engine changes needed.
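A registry keyed by AFI/SAFI is one way to get "new family, no engine change". The sketch below assumes a plugin registers a decoder at startup; Registry, Register, Lookup and the Decoder signature are hypothetical, only the AFI/SAFI code points (25/70 for L2VPN EVPN) are real.

```go
package main

import (
	"fmt"
	"sync"
)

// Family identifies an address family by its AFI and SAFI code points.
type Family struct {
	AFI  uint16
	SAFI uint8
}

// Decoder turns raw NLRI bytes into text (a stand-in for a real parser).
type Decoder func(nlri []byte) (string, error)

// Registry is a hypothetical engine-side table filled by plugins at startup.
type Registry struct {
	mu       sync.RWMutex
	decoders map[Family]Decoder
}

func NewRegistry() *Registry {
	return &Registry{decoders: make(map[Family]Decoder)}
}

// Register adds a family; registering the same family twice is an error.
func (r *Registry) Register(f Family, d Decoder) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.decoders[f]; dup {
		return fmt.Errorf("family %d/%d already registered", f.AFI, f.SAFI)
	}
	r.decoders[f] = d
	return nil
}

// Lookup is what the engine calls when an UPDATE carries this family.
func (r *Registry) Lookup(f Family) (Decoder, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	d, ok := r.decoders[f]
	return d, ok
}

func main() {
	reg := NewRegistry()
	evpn := Family{AFI: 25, SAFI: 70} // L2VPN EVPN
	_ = reg.Register(evpn, func(nlri []byte) (string, error) {
		return fmt.Sprintf("evpn nlri, %d bytes", len(nlri)), nil
	})
	if d, ok := reg.Lookup(evpn); ok {
		s, _ := d([]byte{0x02, 0x21})
		fmt.Println(s) // prints "evpn nlri, 2 bytes"
	}
}
```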

RPKI Integration

Protocol

Design choice

Testing

Looking Glass

Public view

Visualization

Looking Glass

Looking Glass - Route Search

Operational Intelligence

Team Cymru DNS

AS numbers annotated with organization names throughout the system: web UI, looking glass, AS path graphs. Live DNS lookups with caching.
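As an illustration of the lookup-and-cache idea: Team Cymru's public DNS service answers a TXT query for AS<number>.asn.cymru.com with a pipe-separated record whose last field is the AS name. The sketch below assumes that record layout and uses an injectable lookup function so it can run offline; Resolver and its methods are invented names, and there is no cache expiry.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// LookupTXT has the same signature as net.LookupTXT, which can be passed
// in directly for live queries.
type LookupTXT func(name string) ([]string, error)

// Resolver is a hypothetical caching AS-name resolver (no TTL handling).
type Resolver struct {
	mu     sync.Mutex
	cache  map[uint32]string
	lookup LookupTXT
}

func NewResolver(l LookupTXT) *Resolver {
	return &Resolver{cache: make(map[uint32]string), lookup: l}
}

// parseASName extracts the name from "ASN | CC | registry | date | name".
func parseASName(txt string) (string, bool) {
	fields := strings.Split(txt, "|")
	if len(fields) < 5 {
		return "", false
	}
	name := strings.TrimSpace(fields[4])
	return name, name != ""
}

// Name returns the organization name for an ASN, querying DNS only on a miss.
func (r *Resolver) Name(asn uint32) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if name, ok := r.cache[asn]; ok {
		return name, nil
	}
	records, err := r.lookup(fmt.Sprintf("AS%d.asn.cymru.com", asn))
	if err != nil {
		return "", err
	}
	for _, txt := range records {
		if name, ok := parseASName(txt); ok {
			r.cache[asn] = name
			return name, nil
		}
	}
	return "", errors.New("no AS name in TXT answer")
}

func main() {
	// Canned answer for a documentation ASN so the example runs offline.
	r := NewResolver(func(string) ([]string, error) {
		return []string{"64496 | ZZ | example | 2000-01-01 | EXAMPLE-NET, ZZ"}, nil
	})
	name, _ := r.Name(64496)
	fmt.Println(name) // prints "EXAMPLE-NET, ZZ"
}
```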

PeeringDB

ze update bgp peer * prefix queries PeeringDB and auto-sets prefix maximums. Configurable margin and staleness warnings.
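The arithmetic behind an auto-set prefix maximum is simple: take the prefix counts a network publishes in PeeringDB (the info_prefixes4 and info_prefixes6 fields of its net record) and add headroom. The sketch below is an assumption about how this could be done, not Ze's implementation; ParseNet and PrefixMaximum are invented names and the margin is expressed in percent.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// netRecord holds the fields of a PeeringDB "net" object used here.
type netRecord struct {
	ASN       uint32 `json:"asn"`
	Prefixes4 int    `json:"info_prefixes4"`
	Prefixes6 int    `json:"info_prefixes6"`
}

// netResponse matches the {"data": [...]} envelope of the PeeringDB API.
type netResponse struct {
	Data []netRecord `json:"data"`
}

// ParseNet extracts the first network record from an API response body.
func ParseNet(body []byte) (netRecord, error) {
	var resp netResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return netRecord{}, err
	}
	if len(resp.Data) == 0 {
		return netRecord{}, errors.New("no network record")
	}
	return resp.Data[0], nil
}

// PrefixMaximum adds marginPercent of headroom, rounding up.
func PrefixMaximum(advertised, marginPercent int) int {
	return (advertised*(100+marginPercent) + 99) / 100
}

func main() {
	body := []byte(`{"data":[{"asn":64496,"info_prefixes4":1000,"info_prefixes6":15}]}`)
	rec, err := ParseNet(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With a 10% margin: 1000 -> 1100 and 15 -> 17 (16.5 rounded up).
	fmt.Println(PrefixMaximum(rec.Prefixes4, 10), PrefixMaximum(rec.Prefixes6, 10))
}
```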

The Decorator Framework

Add ze:decorate to any YANG leaf and the system automatically enriches it using a decoration function. Team Cymru and PeeringDB are just two decorators that ship by default. Easy to add your own.

Prometheus metrics, structured JSON logging, streaming route events

Fleet Management

(highly experimental and untested)

Distribution

Resilience

What Ships

Binaries

Tool         Purpose
ze           Daemon, CLI, config editor, SSH server, web UI, looking glass
ze-test      Test runner and mock servers (13 subcommands)
ze-perf      Cross-implementation propagation latency benchmark
ze-chaos     Chaos testing orchestrator with web dashboard
ze-analyse   MRT dump analysis: attribute stats, community density, route counts

What Ships: ze-test

ze-test subcommands

Subcommand    Purpose
bgp           Functional tests: encode, plugin, decode, parse, reload, chaos-web
editor        Editor TUI tests (.et files)
ui            UI functional tests (completion, CLI)
mcp           MCP client (send commands to daemon via MCP)
managed       Managed config tests (hub, auth, fleet)
web           Web browser functional tests (.wb files)
peer          BGP test peer (sink/echo/check modes)
rtr-mock      Mock RTR cache server (explicit VRPs)
rpki          Deterministic RPKI mock server (IP modulo)
peeringdb     Deterministic PeeringDB mock server (ASN-derived)
syslog        Syslog server for testing
text-plugin   Minimal text-mode plugin (for .ci tests)

What Ships: ze-analyse

Learn from real Internet BGP data (RIPE RIS, RouteViews)

Command       What we learned
density       72% of UPDATE messages carry a single prefix
              Measures burst rates to size per-peer buffers
attributes    55M routes: 789 unique NEXT_HOP
              344K unique COMMUNITY
              7M unique AS_PATH
              Bundle dedup without AS_PATH: 97% hit rate
communities   Finds communities attached to 95%+ of an ASN's routes
              Calculates per-ASN wire byte savings
count-attrs   90% of routes carry 3 to 5 attributes
              No route in the full table has more than 10
download      Fetches RIPE RIS and RouteViews data
              RIB snapshots and live update streams

AS_PATH   Unique bundles   Dedup rate
With      9M / 55M         84%
Without   1.7M / 55M       97%

Still need to investigate how AS_PATH is handled
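The dedup rate in the table is 1 - unique bundles / total routes: 1 - 9/55 is about 84% and 1 - 1.7/55 is about 97%. A small sketch of how such a figure can be measured, assuming a simplified attribute bundle of three string fields (Attrs and DedupRate are names made up for this example, not ze-analyse code):

```go
package main

import "fmt"

// Attrs is a simplified attribute bundle; real bundles carry more attributes.
type Attrs struct {
	NextHop     string
	ASPath      string
	Communities string
}

// DedupRate returns the share of routes that could reuse an already stored
// bundle: 1 - unique/total. With withASPath false, AS_PATH is left out of
// the bundle key, so routes differing only by path share one bundle.
func DedupRate(routes []Attrs, withASPath bool) float64 {
	if len(routes) == 0 {
		return 0
	}
	unique := make(map[Attrs]struct{})
	for _, a := range routes {
		if !withASPath {
			a.ASPath = "" // a is a copy, the caller's slice is untouched
		}
		unique[a] = struct{}{}
	}
	return 1 - float64(len(unique))/float64(len(routes))
}

func main() {
	routes := []Attrs{
		{NextHop: "192.0.2.1", ASPath: "65001 65010", Communities: "65001:100"},
		{NextHop: "192.0.2.1", ASPath: "65001 65010", Communities: "65001:100"},
		{NextHop: "192.0.2.1", ASPath: "65001 65020", Communities: "65001:100"},
		{NextHop: "192.0.2.1", ASPath: "65001 65030", Communities: "65001:100"},
	}
	// 3 unique bundles with AS_PATH, 1 without.
	fmt.Println(DedupRate(routes, true), DedupRate(routes, false)) // prints "0.25 0.75"
}
```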

Performance

Two use cases pull in opposite directions

Zero-copy architecture

Everything Is Registered, Everything Is Discoverable

Registration

Auto-generated

Verification tools

Command                     What it checks
make ze-spec-status         Spec inventory with progress tracking
make ze-inventory           Plugins, families, RPCs, YANG modules, tests, packages
make ze-validate-commands   Every CLI command matches its YANG schema
make ze-doc-drift           Documentation matches reality

Testing Infrastructure

Coverage

Specialized testing

Gate

Ze Chaos

Ze Chaos dashboard - Families view
Ze Chaos dashboard - Convergence histogram

What's Next

Status

Roadmap

Until release, development is at:

codeberg.org/thomas-mangin/ze

Questions?

Codeberg repo

Only 40MB of code ATM

Only 20MB of vendored code

(and no RSI! 😊)

Release will be at: github.com/ze-software/ze

Thank you