Neander in action

The Overview describes how Neander works; this page shows it. Below are annotated transcripts of an AI agent — a mid-tier, Sonnet-class model — working against a live Neander runtime, captured from real sessions against the reference runtime implementation. The agent starts cold — it knows neither the host’s APIs nor the Neander language itself, and learns both in-band from the runtime: the language from the empty-program bootstrap, then the APIs by discovering them.

There is one thing the agent cannot bootstrap on its own, though: the convention itself. A blank agent has no way to guess that an empty program returns the Reference, or even how to reach the runtime. The embedding application supplies that as a minimal launch prompt, in effect: “submit programs on this channel; send an empty one first to get the language reference.” That single instruction is the whole of the bootstrap contract; everything after it, the agent learns from the runtime’s responses. The transcripts below pick up right after that launch prompt.

Each example is a sequence of round-trips. The agent’s only capability is to submit a program as text and read the JSON response; everything it learns, it learns from those responses. Watch the same loop recur: discover, call, combine, return.
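The round-trip can be pictured from the embedding side as a single function: program text in, decoded JSON envelope out. The following is a minimal sketch under stated assumptions. The `Submit` transport type and the `fake_submit` stand-in are hypothetical names introduced here for illustration, and the page does not specify how a real channel is wired.

```python
import json
from typing import Callable

# Hypothetical transport: takes program text, returns the runtime's reply as JSON text.
Submit = Callable[[str], str]


def round_trip(submit: Submit, program: str) -> dict:
    """Send one program and decode the JSON envelope that comes back."""
    envelope = json.loads(submit(program))
    # Every envelope shown on this page carries these three keys.
    for key in ("success", "result", "meta"):
        if key not in envelope:
            raise ValueError(f"malformed envelope: missing {key!r}")
    return envelope


def fake_submit(program: str) -> str:
    """Offline stand-in for a runtime (an assumption, not the real thing)."""
    if program == "":
        # The empty program is the bootstrap request; the Reference body is elided.
        return json.dumps({
            "success": True,
            "result": "# The Neander Reference (elided)",
            "meta": {"thalerBudget": 5000},
        })
    # Placeholder reply for any other program in this sketch.
    return json.dumps({"success": True, "result": None, "meta": {}})


bootstrap = round_trip(fake_submit, "")
print(bootstrap["success"], bootstrap["meta"]["thalerBudget"])  # True 5000
```

Keeping the transport behind one callable means the same loop can be exercised against a recorded session or a live runtime without changing the caller.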

These transcripts are for humans, to show how the language is used in practice. They are not an input for agents — do not point a running agent at this page. In particular, the runtime’s Reference Response (returned to the empty program) is elided below to a short placeholder: an agent must receive the Reference in-band from the runtime it is actually connected to, at that runtime’s exact version. The Reference page explains why.

Contents

  1. How to read a transcript
  2. Example 1 — A single lookup
    1. Round 1 — Bootstrap
    2. Round 2 — Discover the namespaces
    3. Round 3 — Discover the function
    4. Round 4 — Make the call
  3. Example 2 — Discovering a format
    1. Round 3 — Discover the function, and notice the document
    2. Round 4 — Fetch and read the format document
    3. Round 5 — Encode the request and call
  4. Example 3 — Composing a view
    1. Round 5 — Compose the booking-and-guest view
  5. Example 4 — Learning from errors
    1. Round 4 — First attempt: scan everything at once
    2. Round 5 — Legal cap, but over budget
    3. Round 6 — Restructuring trips a language rule
    4. Round 8 — The breakthrough: one chunk per program
    5. Round 10 — The other chunk, and the total

How to read a transcript

A little orientation before the first example:

  • The empty program bootstraps the agent. An agent that knows nothing submits an empty program — zero bytes — and the runtime replies with the full language Reference plus its active budgets and limits. We elide the Reference body below; it is the same document linked from the Reference page.
  • Discovery handles are opaque. discover returns handles the program cannot inspect directly. To read what they describe, the agent returns them from main and reads the JSON the runtime serializes — so a “discovery program” is usually just return discover ….
  • Notation. A block labelled Agent submits is the program text sent to the runtime; Runtime responds is the JSON envelope returned. Long or repetitive payloads are trimmed with an ellipsis (…), and the trim is noted where it happens.

Example 1 — A single lookup

Task given to the agent: “The remote system manages reservations. Find the booking with id 8821 and report its fare amount.”

The simplest useful exchange, end to end. The agent starts cold: it does not know the system manages “bookings,” that a bookings.get function exists, or what a booking record looks like. It discovers all of that, then does the work.

Round 1 — Bootstrap

The agent submits an empty program to learn the language and the runtime’s ceilings.

Agent submits: an empty program — nothing at all.

Runtime responds:

{
  "success": true,
  "result": "# The Neander Reference v1_draft_20260618T104500Z\n\n…  (the full language reference — elided; see the Reference page)",
  "meta": {
    "neanderVersion": 1,
    "thalerBudget": 5000,
    "memoryBudgetKb": 2048,
    "maxDurationMs": 10000,
    "perCallTimeoutMs": 3000,
    "maxProgramSizeBytes": 65536,
    "maxNestingDepth": 64,
    "maxRepeatLimit": 1000
  }
}

The agent now knows the language and the ceilings it must stay under. From here on it can write real programs.

These budgets and limits are set by the embedding application, not fixed by the language — the values above are this runtime’s configuration for this session. A different deployment can assign different ceilings, and the agent always learns the active values from this bootstrap response before it writes a line.
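As an illustration of how a client might hold on to those ceilings, the sketch below parses the `meta` object from the bootstrap response above into a small record and uses it for two pre-flight checks. The `Ceilings` class and both helper functions are hypothetical names, and measuring program size in UTF-8 bytes is an assumption; the runtime itself remains the authority on what it accepts.

```python
import json
from dataclasses import dataclass

# The "meta" object copied from the bootstrap response in Round 1.
BOOTSTRAP_META = json.loads("""
{
  "neanderVersion": 1,
  "thalerBudget": 5000,
  "memoryBudgetKb": 2048,
  "maxDurationMs": 10000,
  "perCallTimeoutMs": 3000,
  "maxProgramSizeBytes": 65536,
  "maxNestingDepth": 64,
  "maxRepeatLimit": 1000
}
""")


@dataclass(frozen=True)
class Ceilings:
    thaler_budget: int
    memory_budget_kb: int
    max_duration_ms: int
    per_call_timeout_ms: int
    max_program_size_bytes: int
    max_nesting_depth: int
    max_repeat_limit: int


def ceilings_from_meta(meta: dict) -> Ceilings:
    """Map the camelCase keys of the bootstrap meta onto a typed record."""
    return Ceilings(
        thaler_budget=meta["thalerBudget"],
        memory_budget_kb=meta["memoryBudgetKb"],
        max_duration_ms=meta["maxDurationMs"],
        per_call_timeout_ms=meta["perCallTimeoutMs"],
        max_program_size_bytes=meta["maxProgramSizeBytes"],
        max_nesting_depth=meta["maxNestingDepth"],
        max_repeat_limit=meta["maxRepeatLimit"],
    )


def fits_size(program: str, ceilings: Ceilings) -> bool:
    # Assumption: size is counted in UTF-8 bytes.
    return len(program.encode("utf-8")) <= ceilings.max_program_size_bytes


def repeat_cap_ok(cap: int, ceilings: Ceilings) -> bool:
    return cap <= ceilings.max_repeat_limit


ceilings = ceilings_from_meta(BOOTSTRAP_META)
print(repeat_cap_ok(10000, ceilings), repeat_cap_ok(1000, ceilings))  # False True
```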

Round 2 — Discover the namespaces

The agent has no idea what the host exposes, so it lists the namespaces.

Agent submits:

neander 1 {
  types {}
  main -> [Namespace] {
    return discover namespaces []
  }
}

Runtime responds:

{
  "success": true,
  "result": [
    {
      "name": "bookings",
      "description": "Booking management — retrieve, list, confirm, and escalate reservations for the hospitality backend"
    },
    {
      "name": "concierge",
      "description": "Concierge planning — assemble multi-stay room blocks for the hospitality backend from an encoded request"
    },
    {
      "name": "guests",
      "description": "Guest profile lookup — retrieve and list guests for the hospitality backend"
    },
    {
      "name": "metrics",
      "description": "Chunked access to historical reading measurements"
    },
    {
      "name": "payments",
      "description": "Payment processing — charge bookings, issue refunds, and look up prior charges"
    },
    {
      "name": "reports",
      "description": "Generate audit reports for reservations and guests"
    }
  ],
  "meta": {
    "thalersConsumed": 1,
    "memoryConsumedKb": 1,
    "durationMs": 2,
    "apiCalls": []
  }
}

The agent reads the namespace names and descriptions and picks the one that matches the task — bookings.

Round 3 — Discover the function

Now it lists the functions in bookings to find the right one and learn its shape.

Agent submits:

neander 1 {
  types {}
  main -> [Function] {
    let ns: Namespace =? discover namespace "bookings"
    return discover functions ns []
  }
}

Runtime responds:

{
  "success": true,
  "result": [
    … (bookings.confirm and bookings.escalate omitted)
    {
      "qualifiedName": "bookings.get",
      "description": "Get a booking by its numeric id",
      "params": {
        "id": "int"
      },
      "returnType": "bookings.Booking",
      "errors": {
        "404": "Booking not found"
      },
      "types": {
        "bookings.Booking": {
          "description": "A reservation record with id, fare amount, state, and the guest who placed it",
          "fields": {
            "id": "int",
            "fare": "decimal(2, half_away)",
            "state": "string",
            "guestId": "int"
          }
        }
      }
    },
    … (bookings.list and bookings.page omitted)
  ],
  "meta": {
    "thalersConsumed": 2,
    "memoryConsumedKb": 1,
    "durationMs": 3,
    "apiCalls": []
  }
}

The namespace also offers confirm, escalate, list, and page; the agent picks get. This is the moment that replaces an integration document: the serialized Function carries bookings.get, its params, its returnType, and a complete types side-table describing bookings.Booking. Every name in that JSON is the exact form the agent writes next.
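To make concrete how much a reader can recover from one serialized Function, the sketch below rebuilds a call signature and the return record's fields from the `bookings.get` entry shown above. The `describe` helper is a hypothetical name, and the code only reads keys that appear in that transcript.

```python
import json

# The bookings.get entry from the Round 3 response.
FUNCTION_JSON = """
{
  "qualifiedName": "bookings.get",
  "description": "Get a booking by its numeric id",
  "params": {"id": "int"},
  "returnType": "bookings.Booking",
  "errors": {"404": "Booking not found"},
  "types": {
    "bookings.Booking": {
      "description": "A reservation record with id, fare amount, state, and the guest who placed it",
      "fields": {
        "id": "int",
        "fare": "decimal(2, half_away)",
        "state": "string",
        "guestId": "int"
      }
    }
  }
}
"""


def describe(fn: dict) -> str:
    """Render a one-line signature from a serialized Function."""
    params = ", ".join(f"{name}: {typ}" for name, typ in fn["params"].items())
    return f"{fn['qualifiedName']}({params}) -> {fn['returnType']}"


fn = json.loads(FUNCTION_JSON)
signature = describe(fn)
# The types side-table is keyed by the same qualified name as returnType.
fields = fn["types"][fn["returnType"]]["fields"]

print(signature)       # bookings.get(id: int) -> bookings.Booking
print(fields["fare"])  # decimal(2, half_away)
```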

Round 4 — Make the call

The agent writes the program that does the work — look up the booking, return its fare.

Agent submits:

neander 1 {
  types {}
  main -> decimal(2, half_away) {
    let booking: bookings.Booking =? call bookings.get(id: 8821)
    return booking.fare
  }
}

Runtime responds:

{
  "success": true,
  "result": "1850.00",
  "meta": {
    "thalersConsumed": 2,
    "memoryConsumedKb": 1,
    "durationMs": 3,
    "apiCalls": [
      {
        "function": "bookings.get",
        "durationMs": 0
      }
    ]
  }
}

Notice the agent never declared a Booking type of its own — it used the qualified name bookings.Booking straight from the discovery output, so the types block stays empty. That is the whole loop: bootstrap, discover, call, return. The fare comes back as the string "1850.00", so its precision is never in doubt.
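The value of a string-encoded decimal shows up on the consuming side. The sketch below uses Python's standard `decimal` module to show that the two fractional digits survive parsing, whereas binary floats do not compare exactly on simple sums. It illustrates the general point only and says nothing about how the runtime represents decimals internally.

```python
from decimal import Decimal

fare_text = "1850.00"       # the "result" string from the response above
fare = Decimal(fare_text)

print(fare)                      # 1850.00
print(fare.as_tuple().exponent)  # -2, i.e. two fractional digits are kept

# Exact decimal arithmetic versus binary floating point.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
print(0.10 + 0.20 == 0.30)                                   # False
```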

Example 2 — Discovering a format

Task given to the agent: plan a multi-stay room block and report the total fare the concierge service computes.

Beyond namespaces and functions, discover reaches a third kind of thing — a document — and this example puts it to use. The concierge service can plan a block, but its planBlock function takes the whole request as a single string in a house-specific format the agent cannot possibly know in advance, and a malformed request is rejected with a hintless error. That format is not on the web or in any external doc: the concierge namespace publishes it itself, as a document the runtime serves in-band — so the agent discovers and reads it the same way it discovered the function, without ever leaving the runtime.

The bootstrap and namespace-discovery rounds (rounds 1 and 2) look just like Example 1, so we pick up at round 3, where it gets interesting.

Round 3 — Discover the function, and notice the document

Listing the concierge functions surfaces planBlock, whose own description points at a format document by name (there is no machine link — the connection lives in prose and shared search terms).

Agent submits:

neander 1 {
  types {}
  main -> [Function] {
    let ns: Namespace =? discover namespace "concierge"
    return discover functions ns []
  }
}

Runtime responds:

{
  "success": true,
  "result": [
    {
      "qualifiedName": "concierge.planBlock",
      "description": "Plan a multi-stay room block and compute its total fare. The 'request' parameter MUST be a Helix Block Request — a house-specific encoded string. See the 'blockRequestFormat' document in this namespace for the exact encoding. Malformed requests are rejected.",
      "params": {
        "request": "string"
      },
      "returnType": "concierge.BlockPlan",
      "errors": {
        "400": "Malformed request — does not conform to the Helix Block Request format"
      },
      "types": {
        "concierge.BlockPlan": {
          "description": "The result of planning a room block: the number of stays and the summed total fare",
          "fields": {
            "stays": "int",
            "totalFare": "decimal(2, half_away)"
          }
        }
      }
    }
  ],
  "meta": {
    "thalersConsumed": 2,
    "memoryConsumedKb": 1,
    "durationMs": 3,
    "apiCalls": []
  }
}

The description spells out the catch: the request parameter must be a Helix Block Request, and it names the blockRequestFormat document for the exact encoding. The agent cannot guess that format — it has to go and read it.

Round 4 — Fetch and read the format document

The agent fetches the named document and returns it, so the runtime serializes its prose content into the response for the agent to read.

Agent submits:

neander 1 {
  types {}
  main -> [Document] {
    let ns: Namespace =? discover namespace "concierge"
    return discover documents ns ["blockRequestFormat"]
  }
}

Runtime responds:

{
  "success": true,
  "result": [
    {
      "name": "blockRequestFormat",
      "description": "Helix Block Request format — the line-oriented encoding the concierge.planBlock 'request' parameter expects, with field order, separators, and the header count rule",
      "content": "# Helix Block Request — Format v1\n\n…  (the full format spec — shown below)"
    }
  ],
  "meta": {
    "thalersConsumed": 2,
    "memoryConsumedKb": 4,
    "durationMs": 2,
    "apiCalls": []
  }
}

The field that matters is content — the format spec itself, opaque prose the runtime never parses. The encoding is deliberately unlike any standard — not JSON, CSV, or anything an agent could have picked up in training — so prior knowledge is no shortcut: the only path to a valid request is to discover, fetch, read, and apply this document. This is exactly what the agent read:

# Helix Block Request — Format v1

The concierge.planBlock "request" parameter is a single text string encoding a
room block of one or more stays. It is NOT JSON, CSV, or any standard format.
Encode it exactly as described below; any deviation is rejected.

## Shape

Line 1 — header:

    HBR1*<n>

  the literal token "HBR1", then a single asterisk "*", then <n>: the number of
  stay lines that follow. <n> MUST equal the count of stay lines exactly.

Lines 2..n+1 — one stay per line:

    <ordinal>) <guestId> ~ <nights> ~ <nightlyRate>

  - <ordinal>: the 1-based position of the line, written as the number followed
    by a close-paren ")", e.g. "1)", "2)", "3)". Ordinals must run 1,2,3,... with
    no gaps and in order.
  - then a single space, then the three fields separated by " ~ " (space, tilde,
    space), in THIS EXACT ORDER:
      <guestId>      an integer guest id
      <nights>       a positive integer number of nights
      <nightlyRate>  a decimal with exactly two fractional digits (e.g. 80.00)

Lines are joined by a single newline. There is NO trailing newline.

## What the endpoint computes

  - stays     = the number of stay lines (= <n>)
  - totalFare = the sum over all stay lines of (nights * nightlyRate)

## Worked example

A block of three stays — guest 7 for 2 nights at 150.00, guest 7 for 1 night at
99.50, and guest 12 for 3 nights at 80.00 — is encoded as:

    HBR1*3
    1) 7 ~ 2 ~ 150.00
    2) 7 ~ 1 ~ 99.50
    3) 12 ~ 3 ~ 80.00

For that request the endpoint returns stays = 3 and totalFare = 639.50
(2*150.00 + 1*99.50 + 3*80.00 = 639.50).

The runtime hands this over verbatim and does nothing further with it; reading and applying it is entirely the agent’s job. Notice the worked example uses different data (guests 7/7/12) than the task (guests 5/5/31) — so the agent has to understand the format and apply it to new values, not copy the sample.
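The rules in the format document are compact enough to express as a small encoder. The sketch below is one possible reading of that spec, checked against its own worked example. `Stay` and `encode_block_request` are hypothetical names, and this is not code from the concierge service.

```python
from decimal import Decimal
from typing import NamedTuple


class Stay(NamedTuple):
    guest_id: int
    nights: int
    nightly_rate: Decimal


def encode_block_request(stays: list[Stay]) -> str:
    """Encode stays as a Helix Block Request, following the format document."""
    if not stays:
        raise ValueError("a block needs at least one stay")
    # Header: literal HBR1, an asterisk, then the exact number of stay lines.
    lines = [f"HBR1*{len(stays)}"]
    for ordinal, stay in enumerate(stays, start=1):
        if stay.nights <= 0:
            raise ValueError("nights must be a positive integer")
        # Normalize the rate to exactly two fractional digits.
        rate = stay.nightly_rate.quantize(Decimal("0.01"))
        lines.append(f"{ordinal}) {stay.guest_id} ~ {stay.nights} ~ {rate}")
    # Single newlines between lines, and no trailing newline.
    return "\n".join(lines)


# The worked example from the format document.
worked = encode_block_request([
    Stay(7, 2, Decimal("150.00")),
    Stay(7, 1, Decimal("99.50")),
    Stay(12, 3, Decimal("80.00")),
])
print(worked)
```

Running it prints the four lines of the worked example, header first, which is a quick way to confirm the reading of the spec before applying it to new data.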

Round 5 — Encode the request and call

Having read the format, the agent encodes the three stays and calls planBlock.

Agent submits:

neander 1 {
  types {}
  main -> concierge.BlockPlan {
    let plan: concierge.BlockPlan =? call concierge.planBlock(request: "HBR1*3\n1) 5 ~ 4 ~ 120.00\n2) 5 ~ 1 ~ 89.99\n3) 31 ~ 2 ~ 200.00")
    return plan
  }
}

Runtime responds:

{
  "success": true,
  "result": {
    "stays": 3,
    "totalFare": "969.99"
  },
  "meta": {
    "thalersConsumed": 1,
    "memoryConsumedKb": 1,
    "durationMs": 4,
    "apiCalls": [
      {
        "function": "concierge.planBlock",
        "durationMs": 0
      }
    ]
  }
}

The encoded request follows the document exactly — header HBR1*3, then one ordinal-numbered line per stay with ~-separated fields — but carries the task’s data. The agent chose to return the whole concierge.BlockPlan record, so both stays and the totalFare of "969.99" come back. It integrated with an API whose contract it could not have known up front — entirely from what the runtime handed it in-band.
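The returned total can be cross-checked by hand: 4 × 120.00 + 1 × 89.99 + 2 × 200.00 = 969.99. The sketch below does the same check in code with a local parser that applies the rules from the format document. `check_block_request` is a hypothetical helper written for this page, not the endpoint's implementation, and it validates only the rules it needs.

```python
from decimal import Decimal

# The request string the agent sent in Round 5.
REQUEST = "HBR1*3\n1) 5 ~ 4 ~ 120.00\n2) 5 ~ 1 ~ 89.99\n3) 31 ~ 2 ~ 200.00"


def check_block_request(request: str) -> dict:
    """Parse a Helix Block Request and recompute stays and totalFare locally."""
    header, *stay_lines = request.split("\n")
    tag, _, count = header.partition("*")
    if tag != "HBR1" or not count.isdigit():
        raise ValueError("malformed header")
    if int(count) < 1 or int(count) != len(stay_lines):
        raise ValueError("header count does not match the stay lines")
    total = Decimal("0.00")
    for position, line in enumerate(stay_lines, start=1):
        ordinal, _, rest = line.partition(") ")
        if ordinal != str(position):
            raise ValueError(f"bad ordinal on stay line {position}")
        fields = rest.split(" ~ ")
        if len(fields) != 3:
            raise ValueError(f"expected three fields on stay line {position}")
        _guest_id, nights, rate = fields
        total += int(nights) * Decimal(rate)
    return {"stays": len(stay_lines), "totalFare": str(total)}


plan = check_block_request(REQUEST)
print(plan)  # {'stays': 3, 'totalFare': '969.99'}
```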

Example 3 — Composing a view

Task given to the agent: find booking 8821, look up the guest who placed it, and return one record holding the booking id, the guest’s name, the fare, and the guest’s loyalty tier.

The first example that builds something — a single record stitched together from two different namespaces — and the first to put a program’s own types block to work.

The agent discovers the bookings and guests namespaces and their get functions exactly as in the earlier examples — it learns guests.get and the guests.Guest type the same way it learned bookings.get. We skip those four rounds (1–4) and pick up at round 5, the program that does the composition.

Round 5 — Compose the booking-and-guest view

Two sequential calls, where the second feeds on the first (guests.get takes booking.guestId), then a record literal at return. Watch what lands in the types block and what doesn’t: the inputs use the qualified bookings.Booking and guests.Guest types straight from discovery, exactly as in the earlier examples — but the output shape exists in no namespace, so the agent declares it as a program-local type (here named Result), and the runtime types the returned record literal against it.

Agent submits:

neander 1 {
  types {
    Result {
      bookingId: int
      guestName: string
      fare: decimal(2, half_away)
      tier: string
    }
  }
  main -> Result {
    let booking: bookings.Booking =? call bookings.get(id: 8821)
    let guest: guests.Guest =? call guests.get(id: booking.guestId)
    return {
      bookingId: booking.id,
      guestName: guest.name,
      fare: booking.fare,
      tier: guest.tier
    }
  }
}

Runtime responds:

{
  "success": true,
  "result": {
    "bookingId": 8821,
    "guestName": "Dana Patel",
    "fare": "1850.00",
    "tier": "gold"
  },
  "meta": {
    "thalersConsumed": 7,
    "memoryConsumedKb": 2,
    "durationMs": 3,
    "apiCalls": [
      {
        "function": "bookings.get",
        "durationMs": 0
      },
      {
        "function": "guests.get",
        "durationMs": 0
      }
    ]
  }
}

This is orchestration, the language’s actual purpose: two API calls and a shaped result, with the program’s own types block defining the output record. The two bound values feed each other — booking.guestId becomes the argument to guests.get — which is exactly the combine step the loop has been building toward.

Example 4 — Learning from errors

Task given to the agent: the metrics namespace serves historical readings in chunks (metrics.chunkSize(), metrics.totalChunks(), metrics.readings(chunk)). Count how many readings across all chunks have status "alert", and return the total.

The first example where things go wrong — repeatedly — and that is the point. Every Neander error is a structured JSON object, not a stack trace, so the agent can read exactly what failed and why, adjust, and resubmit. This is a real, unscripted session: it takes ten rounds and three different kinds of error to reach the answer. Watch the agent converge.

Setup (rounds 1–3). The task names the metrics namespace, so the agent goes straight to it: it bootstraps, discovers the three metrics functions (chunkSize, totalChunks, and readings(chunk)), and calls metrics.totalChunks() — learning there are 2 chunks. From the bootstrap response it also knows the ceilings it must respect: a 5000-thaler budget and a maxRepeatLimit of 1000 (the same configuration as Example 1). Its first instinct is to scan everything in a single program.

Round 4 — First attempt: scan everything at once

The agent loops over both chunks, concatenates them, filters for "alert", and counts:

neander 1 {
  types {
    Reading {
      index: int
      value: int
      status: string
    }
  }
  main -> int {
    let totalChunks: int =? call metrics.totalChunks()
    let allReadings: [Reading] = flatten(repeat totalChunks limit 10000 as i -> [Reading] {
      let chunk: [Reading] =? call metrics.readings(chunk: i)
      yield chunk
    })
    let alertReadings: [Reading] = each allReadings as r where r.status == "alert" -> Reading {
      yield r
    }
    return len(alertReadings)
  }
}

Runtime responds:

{
  "success": false,
  "result": {
    "failure": "flaw",
    "kind": "repeat_limit_too_high",
    "line": 11,
    "column": 42,
    "hint": "repeat limit 10000 exceeds maximum of 1000",
    "limit": 1000
  },
  "meta": {}
}

This never runs. The response is a Flaw — a defect caught before execution — and it is exact: the repeat cap of 10000 exceeds the runtime’s maximum of 1000, down to the line and column, with meta empty because nothing executed. The agent lowers the cap and resubmits.

Round 5 — Legal cap, but over budget

Agent submits: the same program with the repeat cap lowered to the runtime’s maximum: repeat totalChunks limit 1000.

Runtime responds:

{
  "success": false,
  "result": {
    "failure": "abort",
    "kind": "budget_exceeded",
    "budget": "thalers",
    "limit": 5000,
    "consumed": 5001
  },
  "meta": {
    "thalersConsumed": 5001,
    "memoryConsumedKb": 865,
    "durationMs": 27,
    "apiCalls": [
      {
        "function": "metrics.totalChunks",
        "durationMs": 0
      },
      {
        "function": "metrics.readings",
        "durationMs": 1
      },
      {
        "function": "metrics.readings",
        "durationMs": 0
      }
    ]
  }
}

Now it parses and runs — and fails a different way. A budget Abort: scanning all 2800 readings costs more than the 5000-thaler budget (consumed: 5001). This time meta is populated — the program really executed, and even made its API calls, before being stopped mid-flight. The repeat cap was a red herring; the real wall is the budget. A single program cannot hold this much work.
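The two failures seen so far differ in a way a program can act on: a Flaw names a source position to fix, and an Abort names a budget that ran out. The sketch below dispatches on the `failure` field of the two envelopes above. `next_step` is a hypothetical helper, and it assumes only the fields visible in these two responses; other failure kinds may carry different fields.

```python
# Trimmed copies of the two failure envelopes shown above.
FLAW = {
    "success": False,
    "result": {
        "failure": "flaw",
        "kind": "repeat_limit_too_high",
        "line": 11,
        "column": 42,
        "hint": "repeat limit 10000 exceeds maximum of 1000",
        "limit": 1000,
    },
    "meta": {},
}

ABORT = {
    "success": False,
    "result": {
        "failure": "abort",
        "kind": "budget_exceeded",
        "budget": "thalers",
        "limit": 5000,
        "consumed": 5001,
    },
    "meta": {"thalersConsumed": 5001, "memoryConsumedKb": 865, "durationMs": 27},
}


def next_step(envelope: dict) -> str:
    """Suggest what to do next, based on the kind of failure."""
    if envelope["success"]:
        return "done"
    result = envelope["result"]
    if result["failure"] == "flaw":
        # Rejected before execution: nothing ran, so the fix is in the source text.
        return f"fix line {result['line']}, column {result['column']}: {result['hint']}"
    if result["failure"] == "abort":
        # Stopped mid-run: the program was valid but asked for too much work.
        return f"reduce work: {result['budget']} limit {result['limit']} exceeded"
    return "unrecognized failure"


print(next_step(FLAW))   # fix line 11, column 42: repeat limit 10000 exceeds maximum of 1000
print(next_step(ABORT))  # reduce work: thalers limit 5000 exceeded
```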

Round 6 — Restructuring trips a language rule

The agent’s next idea is to stay in one program but split the work into two loops — and it reuses the loop variable r in both:

neander 1 {
  types {
    Reading {
      value: int
      status: string
    }
  }
  main -> int {
    let chunk0: [Reading] =? call metrics.readings(chunk: 0)
    let chunk1: [Reading] =? call metrics.readings(chunk: 1)
    let alerts0: [Reading] = each chunk0 as r where r.status == "alert" -> Reading {
      yield r
    }
    let alerts1: [Reading] = each chunk1 as r where r.status == "alert" -> Reading {
      yield r
    }
    return len(alerts0) + len(alerts1)
  }
}

Runtime responds:

{
  "success": false,
  "result": {
    "failure": "flaw",
    "kind": "type_error",
    "line": 14,
    "column": 5,
    "hint": "name \"r\" is already bound (shadowing is not permitted)"
  },
  "meta": {}
}

Another Flaw, and the third kind of error in this session: a language rule. Neander forbids binding a name twice (no shadowing), and the hint says so directly. The agent renames to r0/r1, but that version (round 7) simply hits the budget Abort yet again. The structure was never the problem: two chunks in one run is too much, however it is written.

Round 8 — The breakthrough: one chunk per program

The agent abandons the single-program approach and processes just one chunk:

neander 1 {
  types {
    Reading {
      value: int
      status: string
    }
  }
  main -> int {
    let chunk0: [Reading] =? call metrics.readings(chunk: 0)
    let count0: int = len(each chunk0 as r0 where r0.status == "alert" -> Reading { yield r0 })
    return count0
  }
}

Runtime responds:

{
  "success": true,
  "result": 200,
  "meta": {
    "thalersConsumed": 4202,
    "memoryConsumedKb": 460,
    "durationMs": 18,
    "apiCalls": [
      {
        "function": "metrics.readings",
        "durationMs": 1
      }
    ]
  }
}

Success at last. One chunk’s work costs 4202 thalers — comfortably under 5000 — and chunk 0 has 200 alerts. (A metrics.chunkSize() check in between, round 9, confirmed each chunk holds 1400 readings; at ~3 thalers per filtered iteration that is ≈ 4200, which matches what was just spent — and why two chunks never fit.)
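The budget arithmetic in that parenthetical can be written out directly. The sketch below uses the figures quoted above; the cost of roughly 3 thalers per filtered iteration is the page's own estimate, not a documented pricing rule.

```python
THALER_BUDGET = 5000          # from the bootstrap response
READINGS_PER_CHUNK = 1400     # from metrics.chunkSize() in round 9
THALERS_PER_ITERATION = 3     # rough estimate quoted above

per_chunk = READINGS_PER_CHUNK * THALERS_PER_ITERATION
chunks_per_program = THALER_BUDGET // per_chunk
two_chunks = 2 * per_chunk

print(per_chunk)                    # 4200, close to the 4202 actually consumed
print(chunks_per_program)           # 1
print(two_chunks, two_chunks <= THALER_BUDGET)  # 8400 False
```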

Round 10 — The other chunk, and the total

A second program handles chunk 1:

neander 1 {
  types {
    Reading {
      value: int
      status: string
    }
  }
  main -> int {
    let chunk1: [Reading] =? call metrics.readings(chunk: 1)
    let count1: int = len(each chunk1 as r1 where r1.status == "alert" -> Reading { yield r1 })
    return count1
  }
}

Runtime responds:

{
  "success": true,
  "result": 200,
  "meta": {
    "thalersConsumed": 4202,
    "memoryConsumedKb": 460,
    "durationMs": 19,
    "apiCalls": [
      {
        "function": "metrics.readings",
        "durationMs": 1
      }
    ]
  }
}

Chunk 1 also has 200 alerts. Neander programs are stateless — nothing carries from one run to the next — so the agent adds the two partial counts in its own context: 200 + 200 = 400, the answer it reports.
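The split-and-combine pattern generalizes to any number of chunks: one program per chunk, with the sum kept outside the runtime. The sketch below generates the per-chunk program text in the shape the agent used in rounds 8 and 10 and adds up the results. `program_for_chunk`, `count_alerts`, and `replay_submit` are hypothetical names, and `replay_submit` simply replays the result recorded in the transcripts rather than contacting a runtime.

```python
# Program text in the shape of rounds 8 and 10; doubled braces are literal braces.
PROGRAM_TEMPLATE = """neander 1 {{
  types {{
    Reading {{
      value: int
      status: string
    }}
  }}
  main -> int {{
    let chunk{i}: [Reading] =? call metrics.readings(chunk: {i})
    let count{i}: int = len(each chunk{i} as r{i} where r{i}.status == "alert" -> Reading {{ yield r{i} }})
    return count{i}
  }}
}}"""


def program_for_chunk(i: int) -> str:
    return PROGRAM_TEMPLATE.format(i=i)


def count_alerts(submit, total_chunks: int) -> int:
    """Run one program per chunk and add the partial counts on the caller's side."""
    total = 0
    for i in range(total_chunks):
        envelope = submit(program_for_chunk(i))
        if not envelope["success"]:
            raise RuntimeError(f"chunk {i} failed: {envelope['result']}")
        total += envelope["result"]
    return total


def replay_submit(program: str) -> dict:
    """Stand-in that replays the recorded per-chunk result (200 alerts)."""
    return {"success": True, "result": 200, "meta": {"thalersConsumed": 4202}}


print(count_alerts(replay_submit, 2))  # 400
```

Because the programs are stateless, the only state in this loop is the running total on the caller's side, which matches what the agent did in its own context.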

This is the loop that makes Neander workable for an agent with no human beside it. Errors are data, not stack traces: a Flaw rejects a program before it runs and points at the offending construct (an illegal repeat cap, a shadowed name), while an Abort stops one that overruns its budget mid-execution. The agent reads each, adapts, and, when a workload genuinely will not fit in a single run, splits it across several and combines the results itself. Ten programs, three kinds of error, zero human intervention, one correct answer.


© 2026 New Adventures in IT. Licensed under CC BY 4.0.