A context window is not infinitely large. Every token you spend on formatting overhead is a token you cannot spend on actual information. Every format you choose trades something: structure for compactness, readability for density, semantic signals for raw throughput.
CAL's FORMAT clause lets you make that trade explicitly. Six formats, same data, same budget — different trade-offs. This post puts them side by side.
## One Scenario, Four Formats
A customer success agent is helping Elena with her account renewal. The memory store has three beliefs, one goal, and one tool result. Here is the same data rendered four ways.
### SML — Semantic Tags for LLM Reasoning
```
<context intent="helping elena with account renewal negotiation">
  <belief subject="elena" confidence="0.93">prefers multi-year terms when pricing is fixed</belief>
  <belief subject="elena" confidence="0.88">has budget authority up to $500k annually</belief>
  <belief subject="elena" confidence="0.81">makes renewal decisions in Q1</belief>
  <goal subject="elena" state="active" deadline="2026-03-31">negotiate renewal before fiscal year close</goal>
  <action tool="get_renewal_quote" phase="completed">450 seats at $290/seat = $1.44M/yr; 3-year at $265/seat; 18% discount for 3-year commit</action>
</context>
```

### Markdown — Human-Readable Structure
```markdown
## Context: helping elena with account renewal negotiation

### Beliefs about elena

- **prefers multi-year terms when pricing is fixed** (confidence: 0.93)
- **has budget authority up to $500k annually** (confidence: 0.88)
- **makes renewal decisions in Q1** (confidence: 0.81)

### Active Goals

- Negotiate renewal before fiscal year close — deadline: 2026-03-31

### Recent Actions

- `get_renewal_quote` (completed): 450 seats at $290/seat = $1.44M/yr; 3-year at $265/seat; 18% discount for 3-year commit
```

### TOON — Tabular, Token-Efficient
```
beliefs[3]{subject,confidence,content}:
  elena,0.93,prefers multi-year terms when pricing is fixed
  elena,0.88,has budget authority up to $500k annually
  elena,0.81,makes renewal decisions in Q1
goals[1]{subject,state,deadline,content}:
  elena,active,2026-03-31,negotiate renewal before fiscal year close
actions[1]{tool,phase,content}:
  get_renewal_quote,completed,450 seats at $290/seat = $1.44M/yr; 3-year at $265/seat; 18% discount for 3-year commit
```
### JSON — Structured Machine-Readable
```json
{
  "assembly": {
    "intent": "helping elena with account renewal negotiation",
    "grain_count": 5,
    "budget_used": 380
  },
  "grains": [
    {"type": "belief", "subject": "elena", "confidence": 0.93,
     "content": "prefers multi-year terms when pricing is fixed"},
    {"type": "belief", "subject": "elena", "confidence": 0.88,
     "content": "has budget authority up to $500k annually"},
    {"type": "belief", "subject": "elena", "confidence": 0.81,
     "content": "makes renewal decisions in Q1"},
    {"type": "goal", "subject": "elena", "state": "active",
     "deadline": "2026-03-31",
     "content": "negotiate renewal before fiscal year close"},
    {"type": "action", "tool": "get_renewal_quote",
     "phase": "completed",
     "content": "450 seats at $290/seat..."}
  ]
}
```

## The Token Cost
Same five grains. Radically different token budgets:
| Format | Approx. Tokens | vs JSON | Best Use Case |
|---|---|---|---|
| TOON | ~95 | -65% | Large result sets, tight budgets |
| text | ~110 | -60% | Ultra-compact summaries |
| SML | ~155 | -44% | LLM system prompts, epistemic reasoning |
| markdown | ~190 | -31% | Human-facing dashboards, audit logs |
| YAML | ~230 | -17% | Config-style consumption |
| JSON | ~275 | baseline | Downstream pipelines, APIs |
At 5 grains the difference is ~120 tokens. At 50 grains it is ~1,200 tokens. At scale, format choice is a budget decision.
TOON achieves its savings through tabular layout: the grain type and column names appear once per section, not once per grain. For uniform arrays (50 beliefs about the same user), the savings compound. For mixed-type assemblies, TOON groups by type and each group gets its own header — still compact, but the per-section overhead adds up.
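The amortisation is easy to see with a rough sketch. The `to_toon` helper below and the sample grains are illustrative (not CAL's implementation), and character count is only a crude proxy for tokens, but the structural difference is visible: JSON repeats every key for every grain, while the TOON-style header pays for type and column names exactly once.

```python
import json

# Hypothetical grains; field names mirror the article's examples.
beliefs = [
    {"subject": "elena", "confidence": 0.93,
     "content": "prefers multi-year terms when pricing is fixed"},
    {"subject": "elena", "confidence": 0.88,
     "content": "has budget authority up to $500k annually"},
    {"subject": "elena", "confidence": 0.81,
     "content": "makes renewal decisions in Q1"},
]

def to_toon(rows, name="beliefs"):
    # Header carries the section name, row count, and column names once;
    # each row then carries only values.
    cols = list(rows[0])
    lines = [f"{name}[{len(rows)}]{{{','.join(cols)}}}:"]
    lines += [",".join(str(r[c]) for c in cols) for r in rows]
    return "\n".join(lines)

toon = to_toon(beliefs)
as_json = json.dumps(beliefs)

# JSON spells out "subject", "confidence", "content" per grain;
# TOON does not, so the gap widens as the row count grows.
print(len(toon), len(as_json))
```

With three grains the gap is modest; rerun the comparison with fifty rows and the per-grain key overhead dominates the JSON size.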
## When to Use Each
### SML — Default for LLM System Prompts
Use SML when the model needs to distinguish between types of information. The tag name is the signal: `<belief confidence="0.65">` tells the model to hedge. `<consent action="denied">` tells it what it cannot do. `<reasoning type="abductive">` tells it an inference has already been drawn.
```
ASSEMBLE user_context
FOR "resolving elena's renewal"
FROM beliefs: (RECALL beliefs ABOUT "elena" LIMIT 10),
     goals: (RECALL goals ABOUT "elena" WHERE goal_state = "active")
BUDGET 3000 tokens
FORMAT sml
```

SML is the right default when your agent reasons about user preferences, active objectives, prior inferences, or permission boundaries. See the SML deep dive for all 10 grain types with examples.
**Use when:** LLM system prompts, epistemic-sensitive applications (support, legal, medical, financial).
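To make the tag-name-as-signal idea concrete, here is a minimal sketch of rendering a grain as an SML tag. The function name and dict shape are hypothetical, not the CAL runtime's API; the point is only that the grain type becomes the tag name and the epistemic attributes ride along.

```python
from html import escape

def grain_to_sml(grain: dict) -> str:
    """Render one grain as an SML tag (illustrative sketch, not CAL itself).

    The tag name is the grain type, so the model sees the epistemic
    signal (belief vs goal vs consent) before it reads the content.
    """
    gtype = grain["type"]
    attrs = " ".join(
        f'{k}="{escape(str(v), quote=True)}"'
        for k, v in grain.items() if k not in ("type", "content")
    )
    return f"<{gtype} {attrs}>{escape(grain['content'])}</{gtype}>"

print(grain_to_sml({
    "type": "belief", "subject": "elena", "confidence": 0.93,
    "content": "prefers multi-year terms when pricing is fixed",
}))
# <belief subject="elena" confidence="0.93">prefers multi-year terms when pricing is fixed</belief>
```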
### TOON — Tight Budgets, Large Result Sets
Use TOON when you have 20+ grains of the same type and every token matters. TOON's CSV-tabular layout writes column headers once, then emits rows:
```
RECALL beliefs ABOUT "elena"
WHERE relation IS PREFERENCE
| ORDER BY confidence DESC | LIMIT 50
AS toon
```

Fifty beliefs in TOON cost roughly the same number of tokens as 30 beliefs in JSON. If you are building a knowledge-retrieval step that feeds a second LLM call, TOON preserves more information within the same budget.
TOON follows the TOON specification v3.0. It combines YAML-like indentation for nested structures with CSV-style rows for uniform arrays — the common case for CAL output.
**Use when:** Large RECALL results (20+ grains), tight ASSEMBLE budgets, homogeneous sources.
### Markdown — Human-Readable Displays
Use Markdown when the output will be read by a human — support dashboards, escalation reviews, operator audit trails, debugging interfaces.
```
ASSEMBLE escalation_summary
FOR "escalation review for elena's account"
FROM history: (RECALL events ABOUT "elena" RECENT 20),
     actions: (RECALL actions ABOUT "elena" RECENT 10)
BUDGET 4000 tokens
FORMAT markdown
```

Markdown also works well when the model is summarising or transforming the context rather than acting on it directly.
**Use when:** Human-facing UIs, audit logs, escalation summaries, summarisation tasks.
### JSON — Downstream Code Processing
Use JSON when the output goes to code, not an LLM — analytics pipelines, reporting dashboards, rules engines, integration APIs.
```
ASSEMBLE account_snapshot
FOR "account health report"
FROM beliefs: (RECALL beliefs ABOUT "elena"),
     consent: (RECALL consents WHERE subject = "elena")
BUDGET 200 grains
FORMAT json
```

JSON includes the full machine envelope — assembly metadata, grain counts, budget utilisation, source labels — that the other formats omit.
**Use when:** APIs, databases, analytics, any downstream code that needs structured data.
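Because every grain in the envelope carries an explicit `type`, downstream code can route on structure instead of parsing prose. A minimal consumer, using a payload shaped like the JSON example above (values abridged and hypothetical), might look like this:

```python
import json

# A response shaped like the machine envelope shown earlier (abridged, hypothetical).
payload = json.loads("""
{
  "assembly": {"intent": "account health report", "grain_count": 3, "budget_used": 120},
  "grains": [
    {"type": "belief", "subject": "elena", "confidence": 0.93,
     "content": "prefers multi-year terms when pricing is fixed"},
    {"type": "belief", "subject": "elena", "confidence": 0.81,
     "content": "makes renewal decisions in Q1"},
    {"type": "consent", "subject": "elena", "action": "denied",
     "content": "no marketing emails"}
  ]
}
""")

# Route on the explicit type field: keep only high-confidence beliefs.
strong_beliefs = [
    g["content"] for g in payload["grains"]
    if g["type"] == "belief" and g.get("confidence", 0) >= 0.9
]
print(strong_beliefs)  # ['prefers multi-year terms when pricing is fixed']
```

The same envelope feeds a rules engine or a reporting job without any change to the query that produced it.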
### Text — Absolute Minimum
Use text when you need the smallest prose footprint and the LLM does not need to distinguish grain types:
```
RECALL beliefs ABOUT "elena"
WHERE relation IS PREFERENCE | LIMIT 5 AS text
```

Output:

```
elena prefers multi-year contract terms when pricing is fixed.
elena has budget authority up to $500k annually without board approval.
elena makes renewal decisions in Q1 to align with fiscal year.
```
**Use when:** Ultra-compact contexts, sub-summaries within larger prompts, no structure needed.
## Mixing Formats in One Assembly
CAL's AS clause controls format per-source. The FORMAT clause controls the outer envelope:
```
CAL/1 ASSEMBLE mixed_context
FOR "support review for elena"
FROM
  profile: (RECALL beliefs ABOUT "elena"
            WHERE relation IS PREFERENCE | LIMIT 10 AS toon),
  history: (RECALL events ABOUT "elena" RECENT 10 AS sml),
  actions: (RECALL actions ABOUT "elena" RECENT 5 AS markdown)
BUDGET 4000 tokens
PRIORITY history > profile > actions
FORMAT sml
```

Profile data comes as compact TOON — saves tokens on a uniform array of beliefs. Conversation history comes as SML — preserves epistemic signals for the model. Actions come as Markdown — readable in an audit trail. Each source is optimised independently.
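Mechanically, per-source formatting is a dispatch: each source's grains go through the renderer for its `AS` format, and the outer `FORMAT` stitches the sections together. The sketch below is illustrative only; all function names and the envelope shape are assumptions, not the CAL runtime's API.

```python
# Illustrative per-source dispatch; not the CAL runtime's implementation.
def render_toon(grains):
    # Tabular: column names once, then one CSV-style row per grain.
    cols = list(grains[0])
    head = f"grains[{len(grains)}]{{{','.join(cols)}}}:"
    return "\n".join([head] + [",".join(str(g[c]) for c in cols) for g in grains])

def render_sml(grains):
    # Semantic tags: one tag per grain (simplified to a single tag name here).
    return "\n".join(f'<grain>{g["content"]}</grain>' for g in grains)

RENDERERS = {"toon": render_toon, "sml": render_sml}

def assemble(sources):
    # sources: [(label, format_name, grains), ...] already in PRIORITY order.
    # The outer envelope wraps each independently formatted section.
    return "\n\n".join(
        f'<source name="{label}">\n{RENDERERS[fmt](grains)}\n</source>'
        for label, fmt, grains in sources
    )

print(assemble([
    ("profile", "toon", [{"subject": "elena", "content": "prefers multi-year terms"}]),
    ("history", "sml", [{"content": "asked about 3-year pricing"}]),
]))
```

Each section is rendered without knowledge of the others, which is what makes per-source optimisation possible: swapping one source's format never disturbs its neighbours.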
## Quick Reference: Which Format?
| Scenario | Format | Why |
|---|---|---|
| LLM needs to reason about trust, confidence, or permissions | sml | Tag names carry epistemic signals the model can act on |
| Tight budget, 20+ grains of the same type | toon | Up to ~65% fewer tokens than JSON; tabular layout eliminates per-grain overhead |
| LLM is summarising or transforming (not acting) | markdown | Readable structure without per-element semantic weight |
| Minimal context injection, no structure needed | text | Absolute smallest token footprint |
| Output goes to code, APIs, or databases | json | Structured data with full machine envelope |
| Human-facing dashboard or audit log | markdown | Renders natively in any UI that supports Markdown |
The right format is the one that gets the most information into the context window at the lowest token cost while preserving the signals the consumer actually needs. Everything else is overhead.
The FORMAT system is defined in the CAL specification. For how CAL queries work end-to-end, see CAL: The Query Language Your Agent Orchestrator Has Been Missing. For a deep dive into SML's 10 grain types, see SML: The Context Format That Tells LLMs What to Trust. TOON is defined in the TOON specification v3.0.