Architecture

sajou is a visual choreographer for AI agents. It translates agent events -- signals -- into animated visual scenes via declarative choreographies. Think of it as a stage director that watches a stream of machine events and orchestrates a live visual performance in response.

sajou is not a dashboard. It is not a monitoring tool. It is a scene engine where signals are the music, choreographies are the score, and the stage is where the performance happens.

The 3-layer architecture

The entire system is built on three layers. This separation is sacred -- never shortcut from signal directly to render. The choreography layer is the product.

Signals (data)  -->  Choreographer (sequences)  -->  Stage (render)

Signals

Typed JSON events emitted by AI agent backends: task dispatches, tool calls, state changes, streaming text, token usage, errors, and user interactions.

The signal protocol is open: any string is a valid signal type. There are 14 well-known types with typed payloads (task_dispatch, tool_call, tool_result, token_usage, agent_state_change, error, completion, text_delta, thinking, and 5 user.* interaction types), but the bus accepts arbitrary types with Record<string, unknown> payloads. This means sajou can consume events from any backend without protocol changes.
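
As a rough sketch of what that looks like in TypeScript (only the open `type` and the per-type payload idea come from the protocol description; field names like `timestamp` and `correlationId` are assumptions, the latter based on ADR 001):

```typescript
// Hedged sketch of a signal envelope; the authoritative definitions
// live in @sajou/schema.
interface Signal<T = Record<string, unknown>> {
  type: string;            // open protocol: any string is a valid type
  payload: T;              // typed for the 14 well-known types
  timestamp?: number;      // assumed metadata field
  correlationId?: string;  // ADR 001 mentions correlationId grouping
}

// A well-known type can carry a typed payload...
interface ToolCallPayload {
  tool: string;
  args: Record<string, unknown>;
}
const known: Signal<ToolCallPayload> = {
  type: "tool_call",
  payload: { tool: "search", args: { query: "weather" } },
};

// ...while an arbitrary backend-specific type still passes the bus.
const custom: Signal = { type: "mybackend.heartbeat", payload: { ok: true } };
```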

Signals arrive over WebSocket, SSE, OpenAI-compatible streaming, Anthropic API, or the OpenClaw protocol. The transport is pluggable.
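
The docs do not spell out the adapter contract, but "pluggable transport" reduces to something like this hypothetical interface (all names here are illustrative; the sketch reuses the Signal type from above):

```typescript
// Hypothetical transport adapter: anything that can deliver parsed
// signals to a handler can feed the bus.
interface SignalTransport {
  connect(): Promise<void>;
  onSignal(handler: (signal: Signal) => void): void;
  disconnect(): void;
}

// Minimal WebSocket-backed example using the browser WebSocket API.
class WebSocketTransport implements SignalTransport {
  private ws?: WebSocket;
  private handlers: Array<(signal: Signal) => void> = [];

  constructor(private readonly url: string) {}

  async connect(): Promise<void> {
    const ws = new WebSocket(this.url);
    ws.onmessage = (event) => {
      const signal = JSON.parse(event.data) as Signal;
      for (const h of this.handlers) h(signal);
    };
    await new Promise<void>((resolve) => { ws.onopen = () => resolve(); });
    this.ws = ws;
  }

  onSignal(handler: (signal: Signal) => void): void {
    this.handlers.push(handler);
  }

  disconnect(): void {
    this.ws?.close();
  }
}
```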

Choreographer

The choreographer receives signals and triggers performances -- declarative step sequences described in JSON, not imperative code. This is what makes sajou composable by both humans and AIs.

A choreography looks like this:

```json
{
  "on": "task_dispatch",
  "steps": [
    { "action": "move", "entity": "agent", "to": "forge", "duration": 800 },
    { "action": "spawn", "entity": "pigeon", "at": "barracks" },
    { "action": "fly", "entity": "pigeon", "to": "oracle", "duration": 1200, "easing": "arc" },
    {
      "action": "onArrive",
      "steps": [
        { "action": "destroy", "entity": "pigeon" },
        { "action": "flash", "target": "oracle", "color": "gold" }
      ]
    }
  ]
}
```

The runtime supports concurrent performances with tween-based timing, step chaining (onArrive), interruption handling (onInterrupt), parallel execution, and a typed CommandSink interface that decouples choreography logic from any renderer.
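
The CommandSink is the seam between the choreographer and any renderer. A minimal sketch of the idea (the real interface in @sajou/core may differ; the signatures are assumptions derived from the step vocabulary):

```typescript
// Hedged sketch of the CommandSink seam: the choreographer emits
// commands, a renderer consumes them.
interface Point { x: number; y: number; }

interface CommandSink {
  spawn(entity: string, at: Point): void;
  move(entity: string, to: Point): void;
  destroy(entity: string): void;
  flash(target: string, color: string): void;
}
```

Because the choreographer only ever talks to this interface, the same performance can drive Three.js, a Canvas2D overlay, or a test recorder.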

The choreographer lives in @sajou/core. It has zero external dependencies and is framework-agnostic -- it runs in the browser and in Node.js. All choreographer logic is unit-testable without any rendering.
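
That decoupling is what makes headless testing practical: swap the renderer for a recorder. A sketch under the same assumptions as the CommandSink above (the factory and TestClock usage are hypothetical, loosely based on ADR 002):

```typescript
// Illustrative headless test double: record commands, render nothing.
class RecordingSink implements CommandSink {
  commands: string[] = [];
  spawn(entity: string) { this.commands.push(`spawn:${entity}`); }
  move(entity: string) { this.commands.push(`move:${entity}`); }
  destroy(entity: string) { this.commands.push(`destroy:${entity}`); }
  flash(target: string) { this.commands.push(`flash:${target}`); }
}

// Hypothetical usage, not the real API:
//   const sink = new RecordingSink();
//   const choreographer = createChoreographer({ sink, clock: new TestClock() });
//   choreographer.handleSignal({ type: "task_dispatch", payload: {} });
//   clock.advance(800); // deterministic time, no requestAnimationFrame
//   expect(sink.commands).toContain("move:agent");
```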

Stage

The stage is the renderer. It takes commands from the choreographer and draws them on screen using Three.js.

Entities live on a 2D top-down board (orthographic camera, scene(x, y) mapped to world(x, 0, z)). The stage handles spritesheet animation (UV frame cycling), lights (ambient, directional, point lights with flicker), CPU-simulated particles, and shader effects.
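
The board-to-world mapping is small enough to show directly (illustrative values; the helper name is ours, not @sajou/stage's):

```typescript
import * as THREE from "three";

// scene(x, y) -> world(x, 0, z): the board's y axis becomes the
// world's z axis; everything sits flat on the XZ plane.
function sceneToWorld(x: number, y: number): THREE.Vector3 {
  return new THREE.Vector3(x, 0, y);
}

// Illustrative top-down orthographic camera over a 100x100 board.
const half = 50;
const camera = new THREE.OrthographicCamera(
  -half, half,   // left, right
  half, -half,   // top, bottom
  0.1, 1000,     // near, far
);
camera.position.set(0, 100, 0);  // straight above the board
camera.up.set(0, 0, -1);         // avoid a degenerate up vector when looking down -Y
camera.lookAt(0, 0, 0);
```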

The renderer library lives in @sajou/stage. The visual editor is tools/scene-builder.

Package map

| Package | Role | Dependencies |
| --- | --- | --- |
| @sajou/schema | Signal protocol types, scene format JSON Schema, TypeScript types generated from schemas | None |
| @sajou/core | Choreographer runtime -- zero deps, framework-agnostic, browser + Node.js | @sajou/schema |
| @sajou/stage | Three.js renderer library (EntityManager, LightManager, TextureLoader, cameras, CommandSink) | Three.js |
| @sajou/emitter | Test signal emitter with predefined scenarios (WebSocket, speed control, replay loop) | @sajou/schema |

Schema is the shared contract

@sajou/schema is the single source of truth for all declarative formats. Any change to schemas must be discussed and validated before implementation. If you need a schema change, propose it as a separate commit with justification.

Tools

| Tool | Description |
| --- | --- |
| scene-builder | Main authoring tool. Visual scene editor with entity placement, wiring (patch bay), node canvas, step chain editing, shader editor, run mode with live preview, and ZIP export/import. Built with Vite + Three.js + Canvas2D overlay. |
| player | Plays exported scene files produced by the scene-builder. |

Key concepts

Entities

Visual actors on the stage. Sprites, animated spritesheets, or GIF sequences placed at (x, y) positions on the board. Each entity has properties like position, scale, rotation, opacity, animation state, and billboard mode. Entities are defined in the scene and referenced by choreographies.
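
As an illustration, an entity definition might look like this (field names follow the properties listed above; the authoritative scene format lives in @sajou/schema):

```json
{
  "id": "pigeon",
  "sprite": "sprites/pigeon.png",
  "position": { "x": 480, "y": 210 },
  "scale": 1.0,
  "rotation": 0,
  "opacity": 1.0,
  "animationState": "idle",
  "billboard": true
}
```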

Choreographies

Declarative step sequences triggered by signal types. The available step actions form a finite vocabulary (an example combining several of them follows the table):

| Action | Purpose |
| --- | --- |
| move | Move an entity to a position over a duration with easing |
| spawn | Create a new entity at a location |
| destroy | Remove an entity from the stage |
| fly | Move with a trajectory (arc, line, bezier) |
| flash | One-shot visual effect on a target |
| pulse | Repeating visual effect |
| drawBeam | Draw a visible connection between two points |
| typeText | Progressively reveal text at a location |
| playSound | Trigger an audio sample |
| wait | Pause in the sequence |
| parallel | Run multiple steps concurrently |
| onArrive | Chain steps after an animation completes |
| onInterrupt | Handle cancellation or error mid-flight |
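
An illustrative error choreography combining several actions (step field names like "sample" and "text" are assumptions about the format, not confirmed by these docs):

```json
{
  "on": "error",
  "steps": [
    {
      "action": "parallel",
      "steps": [
        { "action": "flash", "target": "oracle", "color": "red" },
        { "action": "playSound", "sample": "alarm" }
      ]
    },
    { "action": "wait", "duration": 500 },
    { "action": "typeText", "at": "oracle", "text": "task failed" }
  ]
}
```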

Wiring (Patch Bay)

A TouchDesigner-style connection graph that links signals to choreographies to renderers. The wiring system has three layers (a sketch of a wire configuration follows the list):

  1. Signal to signal-type -- routes incoming signals to typed channels
  2. Signal-type to choreographer -- connects signal types to choreography triggers
  3. Choreographer to theme/shader -- connects choreography outputs to visual effects
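
The on-disk wire format is defined by the scene-builder; purely as an illustration of the three layers, it might resemble:

```json
{
  "wires": [
    { "from": "source:websocket", "to": "type:tool_call" },
    { "from": "type:tool_call", "to": "choreography:pigeon-flight" },
    { "from": "choreography:pigeon-flight", "to": "shader:glow" }
  ]
}
```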

Positions and routes

Named waypoints and navigable paths on the stage. Entities move along routes between positions. Positions are semantic labels (e.g., "forge", "barracks", "oracle") that map to (x, y) coordinates.
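
Illustratively (the coordinates and the route shape are made up; the authoritative format lives in @sajou/schema):

```json
{
  "positions": {
    "forge": { "x": 120, "y": 340 },
    "barracks": { "x": 480, "y": 210 },
    "oracle": { "x": 300, "y": 60 }
  },
  "routes": [
    { "from": "barracks", "to": "oracle", "via": [{ "x": 390, "y": 140 }] }
  ]
}
```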

Bindings

Direct property assignments triggered by signals -- a peer of the choreographer in the dispatch path. While choreographies describe multi-step sequences, bindings handle immediate property changes: set opacity to 0.5, change animation state to "idle", rotate by 45 degrees. Bindings are simpler and faster than choreographies for single-property reactions.
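
A binding might be declared like this (a hedged sketch; the exact binding format is not shown in these docs, but the example mirrors the "change animation state" case above):

```json
{
  "on": "agent_state_change",
  "entity": "agent",
  "set": { "animationState": "idle" }
}
```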

Signal filters

Wire-level processing pipelines that transform signals before they reach the choreographer. Filters are chained on individual wires (see the sketch after the list):

  • throttle -- limit signal rate
  • sample -- take the latest value at intervals
  • delta -- only pass when value changes
  • when -- conditional gate (pass only if predicate is true)
  • map -- transform payload fields
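
An illustrative wire with a filter chain (field names are assumptions; only the five filter names come from the list above):

```json
{
  "from": "type:token_usage",
  "to": "binding:usage-meter",
  "filters": [
    { "filter": "throttle", "ms": 250 },
    { "filter": "delta", "field": "payload.totalTokens" },
    { "filter": "map", "expr": "payload.totalTokens / 100000" }
  ]
}
```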

Run mode

Live execution of the scene. When entering run mode, the scene-builder snapshots the current entity state, instantiates the choreographer, subscribes to the active signal sources, and dispatches incoming signals in real time through the wiring graph. The choreographer triggers performances, the command sink forwards commands to the Three.js renderer, and the stage animates.
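
In pseudocode terms, entering run mode looks roughly like this (every name here is an assumption about the scene-builder's internals, reusing the sketched types from earlier sections):

```typescript
// Hedged sketch of run-mode startup; names are illustrative.
function enterRunMode(
  entities: Record<string, unknown>,
  sources: SignalTransport[],
  dispatch: (signal: Signal) => void,  // entry point into the wiring graph
) {
  // 1. Snapshot entity state so leaving run mode can restore it.
  const snapshot = structuredClone(entities);

  // 2. Subscribe every active signal source; each incoming signal
  //    flows through filters, choreographer, and bindings.
  for (const source of sources) {
    source.onSignal(dispatch);
  }

  // 3. Return the snapshot for the exit path.
  return snapshot;
}
```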

Data flow

Signal Source (WebSocket / SSE / OpenClaw / Simulator)
    |
    v
Signal Bus (onSignal listeners)
    |
    |---> Signal Log (raw display, 10k retention)
    |
    +---> Run Mode Controller
              |
              |---> Wire Filter Chains (throttle, sample, delta, when, map)
              |
              |---> Choreographer (handleSignal -> trigger performances)
              |         |
              |         +---> CommandSink -> RenderAdapter -> Three.js
              |
              +---> Binding Executor (direct property assignments)

Signals enter through one of the supported transports, hit the signal bus, and fan out. The signal log captures everything for inspection (10,000 entries in memory, 500 rendered in the DOM with virtual scrolling). The run mode controller routes signals through the wiring graph: filter chains process them first, then the choreographer matches signal types to choreographies and triggers performances. The command sink is the bridge between the framework-agnostic choreographer and the Three.js renderer. In parallel, the binding executor applies direct property assignments that bypass the choreography system.

The choreography layer is not optional

Even for simple "set property X when signal Y arrives" cases, the data flows through the dispatch system (bindings), not directly from signal to renderer. The 3-layer separation is an invariant, not a guideline.

Architecture Decision Records

The foundational design decisions are documented in ADRs:

| ADR | Topic |
| --- | --- |
| 001 -- Signal Protocol | Envelope + typed payload, open protocol, correlationId grouping |
| 002 -- Choreographer Runtime | Concurrent performances, tween-based timing, typed CommandSink, TestClock |
| 002 -- Entity Format | Extensible entity format (sprites to spritesheets to 3D models) |
| 003 -- Renderer Stack | Three.js as the rendering foundation |