Open Memory Specification (OMS)
Memory Grain (.mg) Container Definition
Version: 1.3
Status: Standards Track
Category: Data Formats
Date: February 2026
Copyright: Public Domain (CC0 1.0 Universal)
License: This specification is offered under the Open Web Foundation Final Specification Agreement (OWFa 1.0)
Table of Contents
- Introduction
- Conventions and Terminology
- Blob Layout and Structure
- Canonical Serialization
- Content Addressing
- Field Compaction
- Multi-Modal Content References
- Grain Types
- Cryptographic Signing
- Selective Disclosure
- File Format (.mg files)
- Identity and Authorization
- Sensitivity Classification
- Cross-Links and Provenance
- Temporal Modeling
- Encoding Options
- Conformance Levels
- Device Profiles
- Error Handling
- Security Considerations
- Test Vectors
- Implementation Notes
- Grain Protection and Invalidation Policy
- Observer Type Registry
- Observation Mode Registry
- Observation Scope Registry
- Grain Type Field Specifications
- Query Conventions
- Appendix A: Domain Profile Registry
- Appendix B: ABNF Grammar
- Appendix C: Field Mapping Table
- Appendix D: Compliance Mapping
- Appendix E: Version History
- Appendix F: Glossary
- Appendix G: Complete Example Grain
Abstract
The Open Memory Specification (OMS) is an open standard for portable, auditable, and interoperable agent memory across autonomous systems, AI agents, and distributed knowledge networks. OMS defines the Memory Grain (.mg) container — a standard binary representation for immutable, content-addressed knowledge units (grains). This document specifies the wire format, serialization rules, cryptographic integrity mechanisms, and compliance features necessary for secure and portable interchange of agent memory across platforms, languages, and deployment models. A memory grain is the atomic unit of agent knowledge—a single immutable fact, episode, observation, or decision record—identified by the SHA-256 hash of its canonical binary representation. The .mg container provides:
- Deterministic serialization ensuring identical content always produces identical bytes
- Content addressing via SHA-256 for integrity, deduplication, and identity
- Compact binary encoding using MessagePack (default) or CBOR (optional)
- Cryptographic verification via COSE Sign1 envelopes (optional)
- Field-level privacy through selective disclosure
- Compliance primitives for GDPR, CCPA, HIPAA, and other regulations
- Multi-modal references to external content (images, video, embeddings)
- Decentralized identity via W3C DIDs
- Grain protection via invalidation policies that restrict who may supersede or contradict a grain
The .mg container format is to autonomous systems what JSON is to APIs and .git objects are to version control: a universal, language-agnostic, self-describing interchange format. It is the foundational wire format of OMS.
CAL (Context Assembly Language) (CONTEXT-ASSEMBLY-LANGUAGE-CAL-SPECIFICATION.md) and SML (Semantic Markup Language) (SEMANTIC-MARKUP-LANGUAGE-SML-SPECIFICATION.md) are part of OMS v1.3. CAL defines the query and context-assembly layer that operates on OMS stores; SML is CAL's default output format for LLM context consumption. See §1.5 for details.
1. Introduction
1.1 Purpose
Autonomous systems and AI agents require persistent memory to function effectively over time. Unlike transient conversation context (which lives in an LLM's context window), persistent memory must be:
- Portable – transferable between agents, systems, and organizations
- Verifiable – integrity can be cryptographically proven
- Immutable – once created, never modified (supersession creates new records)
- Auditable – full provenance chain recorded
- Compliant – designed for regulatory requirements (GDPR, HIPAA, etc.)
- Interoperable – works across programming languages and platforms
- Efficient – minimal storage with content deduplication
- Secure – encryption, signing, and selective disclosure support
OMS addresses this gap by defining a universal standard for knowledge interchange, with the .mg container as the foundational wire format.
1.2 Design Principles
- References, not blobs — Multi-modal content (images, audio, video, embeddings) is referenced by URI, never embedded in grains
- Additive evolution — New fields never break old implementations; parsers ignore unknowns
- Minimal required fields — Each memory type defines only essential fields
- Semantic triples — Subject-relation-object model for natural knowledge graph mapping
- Compliance by design — Provenance, timestamps, user identity, and namespace baked into every grain
- No AI in the format — Deterministic serialization; LLMs belong in the engine layer, not the wire protocol
- Index without deserialize — Fixed headers enable O(1) field extraction for efficient scanning
- Sign without PKI — Decentralized identity (DIDs) enable verification without certificate authorities
- Share without exposure — Selective disclosure reveals some fields while hiding others
- One file, full memory — A .mg container file is the portable unit for full knowledge export
1.3 Terminology
| Term | Definition |
|---|---|
| Memory grain | Atomic, indivisible unit of knowledge — one .mg blob (fact, episode, observation, etc.) |
| Blob | Complete .mg binary — version byte + optional header + canonical payload |
| Content address | Lowercase hex SHA-256 hash of complete blob bytes — the grain's unique identifier |
| Canonical serialization | MessagePack or CBOR encoding with deterministic key ordering, string normalization, null omission |
| Field compaction | Mapping human-readable field names to short keys for storage efficiency |
| Grain container | .mg file — portable unit containing indexed set of grains with checksum |
| Modality | Type of content: text, image, audio, video, point cloud, 3D mesh, embedding, binary |
| DID | Decentralized identifier — W3C standard for cryptographic identity without central registry |
| COSE | CBOR Object Signing and Encryption — RFC 9052 standard for signing binary payloads |
1.4 Scope and Limitations
In scope:
- Binary serialization format for individual grains
- .mg file container format for grain collections
- Deterministic encoding and hashing
- Cryptographic signing and selective disclosure
- Content reference and embedding reference schemas
- Identity and authorization models
- Sensitivity classification
- Cross-link and provenance tracking
Out of scope:
- Storage layer implementation (filesystem, S3, database, IPFS)
- Index layer queries and optimization — see CAL (§1.5)
- Policy engines and compliance rule evaluation
- Transport protocols (HTTP, MQTT, Kafka)
- Encryption at rest (applications of per-grain encryption are external to this spec)
- Agent-to-agent communication protocol (which uses .mg format)
1.5 Companion Specifications
OMS defines the wire format and grain semantics. Two companion specifications are part of the OMS v1.3 release and are included in this repository:
CAL — Context Assembly Language (CONTEXT-ASSEMBLY-LANGUAGE-CAL-SPECIFICATION.md)
CAL is a non-destructive, deterministic, LLM-native language for assembling agent context from OMS memory stores. It answers the question: "what should be in the agent's context window right now?" Key properties:
- Operates on all 10 OMS grain types (Belief, Event, State, Workflow, Action, Observation, Goal, Reasoning, Consensus, Consent)
- Extends the OMS Store Protocol (§28.4) with a formal, structured query syntax
- ASSEMBLE statements compose context from multiple grain sources within a token budget
- Append-only: CAL writes create new grains via put; the language cannot delete or modify existing grains, and this is enforced at the grammar level
- Dual wire format: human-readable text/cal and machine-readable application/json+cal are bijectively equivalent
SML — Semantic Markup Language (SEMANTIC-MARKUP-LANGUAGE-SML-SPECIFICATION.md)
SML is a flat, tag-based markup format optimized for LLM context consumption. It is not XML. Tag names are OMS grain types (<belief>, <goal>, <event>, …); attributes carry lightweight decision metadata; text content is natural language. SML is the default output format for CAL ASSEMBLE statements and is designed to be consumed directly by an LLM without an XML processor.
2. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.
Hexadecimal values are lowercase. Byte sequences are represented in hex with spaces between bytes for clarity (e.g., 01 89 a2).
3. Blob Layout and Structure
3.1 Blob Format (byte 0x01)
| Byte offset | Field | Type | Contents |
|---|---|---|---|
| 0 | Ver | uint8 | 0x01 |
| 1 | Flags | uint8 | Bit field |
| 2 | Type | uint8 | Grain type |
| 3-4 | NS hash | uint16 | Namespace hash |
| 5-8 | created_at | uint32 | Epoch seconds |
| 9+ | Payload | MsgPack | Variable length |

Bytes 0-8 form the fixed header (9 bytes); the payload that follows is variable length.
3.1.1 Header Bytes
Byte 0 — Version: 0x01 — any other value is rejected with ERR_VERSION
Byte 1 — Flags (bit field):
| Bit | Flag | Meaning |
|---|---|---|
| 0 | signed | COSE Sign1 envelope wraps this grain |
| 1 | encrypted | Payload is encrypted (AES-256-GCM) |
| 2 | compressed | Payload is zstd-compressed before encryption |
| 3 | has_content_refs | Grain references external multi-modal content |
| 4 | has_embedding_refs | Grain references external vector embeddings |
| 5 | cbor_encoding | Payload is CBOR instead of MessagePack |
| 6-7 | sensitivity | Classification: 00=public, 01=internal, 10=pii, 11=phi |
Byte 2 — Type (cognitive grain type):
| Value | Type | Description |
|---|---|---|
| 0x01 | Belief | Structured belief — (subject, relation, object) triple with confidence and source |
| 0x02 | Event | Timestamped occurrence — message, interaction, or behavioral event |
| 0x03 | State | Agent state snapshot — portable save point |
| 0x04 | Workflow | Learned action sequence — procedural memory |
| 0x05 | Action | Tool invocation or code execution |
| 0x06 | Observation | Raw sensory or cognitive input |
| 0x07 | Goal | Objective with lifecycle semantics |
| 0x08 | Reasoning | Inference chain and thought audit trail |
| 0x09 | Consensus | Multi-agent agreement record |
| 0x0A | Consent | Permission grant or withdrawal — DID-scoped, purpose-bounded |
| 0x0B–0xEF | Reserved | Future standard types |
| 0xF0–0xFF | Domain profile types | Application-defined per Appendix A domain profiles |
Bytes 3-4 — Namespace Hash: First two bytes of SHA-256(namespace), encoded as uint16 big-endian. Provides 65,536 routing buckets without deserialization. Full namespace string remains authoritative in payload. This field is a routing hint only and MUST NOT be used for security decisions (see §13.3, §20).
Bytes 5-8 — Created-at: uint32 epoch seconds (1970-01-01 onwards). Range: 1970 to 2106. The created_at header field is a coarse routing hint only — for TTL and time-range indexing. It MUST NOT be used as the authoritative event timestamp. Authoritative timestamps belong in the payload (timestamp_ms field). Full millisecond precision available via timestamp_ms (§6.1).
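The header layout above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (build_flags, build_header, parse_header) are hypothetical, and it assumes that bit 0 is the least significant bit of the flags byte and that the sensitivity value occupies bits 6-7 with bit 7 as the high bit.

```python
import hashlib
import struct

# Assumption: bit 0 is the least significant bit of the flags byte.
FLAG_BITS = {"signed": 0, "encrypted": 1, "compressed": 2,
             "has_content_refs": 3, "has_embedding_refs": 4, "cbor_encoding": 5}
# Assumption: the two sensitivity bits are read as a 2-bit number, bit 7 high.
SENSITIVITY = {"public": 0b00, "internal": 0b01, "pii": 0b10, "phi": 0b11}
SENSITIVITY_NAMES = {value: name for name, value in SENSITIVITY.items()}


def build_flags(sensitivity="public", **flags):
    """Combine named boolean flags and a sensitivity class into one byte."""
    value = SENSITIVITY[sensitivity] << 6
    for name, enabled in flags.items():
        if enabled:
            value |= 1 << FLAG_BITS[name]
    return value


def build_header(flags, grain_type, namespace, created_at_sec):
    """Pack the 9-byte fixed header in big-endian order."""
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    ns_hash = int.from_bytes(digest[:2], "big")
    return struct.pack(">BBBHI", 0x01, flags, grain_type, ns_hash, created_at_sec)


def parse_header(blob):
    """Read routing fields from the fixed header without touching the payload."""
    if len(blob) < 9:
        raise ValueError("ERR_CORRUPT: blob shorter than the 9-byte header")
    version, flags, grain_type, ns_hash, created_at_sec = struct.unpack(">BBBHI", blob[:9])
    if version != 0x01:
        raise ValueError("ERR_VERSION")
    return {
        "flags": {name: bool(flags & (1 << bit)) for name, bit in FLAG_BITS.items()},
        "sensitivity": SENSITIVITY_NAMES[flags >> 6],
        "type": grain_type,
        "ns_hash": ns_hash,
        "created_at_sec": created_at_sec,
    }
```

Because every field sits at a fixed offset, a scanner can filter grains by type, namespace bucket, or coarse creation time by reading nine bytes per blob.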
3.2 Byte Order
All multi-byte values follow big-endian (network) byte order. MessagePack and CBOR specifications handle encoding details.
3.3 Minimum and Maximum Sizes
- Minimum blob: 10 bytes (9-byte header + 1-byte empty MessagePack map 0x80)
- Maximum blob: 4 GB (uint32 in standard MessagePack, larger via extension)
- Recommended maximum: 1 MB for extended profile, 32 KB for standard profile, 512 bytes for lightweight profile
4. Canonical Serialization
To ensure deterministic hashing and cross-implementation compatibility, all serialization MUST follow these canonical rules:
4.1 Key Ordering
Map keys MUST be sorted lexicographically by their UTF-8 byte representation. This applies recursively to all nested maps. Ordering is case-sensitive and treats bytes as unsigned integers.
CORRECT ordering: {"adid": ..., "c": ..., "ca": ..., "ns": ..., "o": ..., "r": ..., "s": ..., "st": ..., "t": ...}
WRONG ordering: {"s": ..., "c": ..., "ca": ..., "adid": ..., ...}
Lexicographic comparison: byte 0 vs byte 0, if equal advance to byte 1, etc.
Map keys MUST be unique within a map. Duplicate keys MUST be rejected with ERR_CORRUPT.
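As an illustration of the ordering and uniqueness rules, the following Python sketch sorts keys by their UTF-8 bytes at every nesting level. The function names are hypothetical. Duplicate detection works on a list of (key, value) pairs because a decoded Python dict has already collapsed duplicates, so a real parser would need to check while reading the raw map entries.

```python
def canonical_pairs(pairs):
    """Sort (key, value) pairs by the UTF-8 bytes of the key; reject duplicates."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"ERR_CORRUPT: duplicate map key {key!r}")
        seen.add(key)
    # Python compares bytes objects as sequences of unsigned integers.
    return sorted(pairs, key=lambda kv: kv[0].encode("utf-8"))


def canonicalize(value):
    """Return a copy of value with every nested map in canonical key order."""
    if isinstance(value, dict):
        pairs = [(key, canonicalize(item)) for key, item in value.items()]
        return dict(canonical_pairs(pairs))
    if isinstance(value, list):
        # Arrays keep their insertion order; only their elements are visited.
        return [canonicalize(item) for item in value]
    return value
```

For example, list(canonicalize({"s": 1, "c": 2, "ca": 3, "adid": 4})) yields the keys in the order adid, c, ca, s, matching the CORRECT ordering shown above.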
4.2 Integer Encoding
Integers MUST use the smallest MessagePack/CBOR representation:
| Range | MessagePack Encoding |
|---|---|
| 0 to 127 | positive fixint (1 byte) |
| -32 to -1 | negative fixint (1 byte) |
| 128 to 255 | uint8 (2 bytes) |
| 256 to 65,535 | uint16 (3 bytes) |
| -128 to -33 | int8 (2 bytes) |
| -32,768 to -129 | int16 (3 bytes) |
For CBOR, follow RFC 8949 Section 4.2.1 (Preferred Encoding).
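The smallest-representation rule for MessagePack can be sketched as below. The function name pack_int is hypothetical. The table above stops at 16-bit values; the 32-bit and 64-bit branches in this sketch follow the same rule using the remaining integer formats of the MessagePack format itself (uint32, uint64, int32, int64), which is an extrapolation from the table rather than text quoted from it.

```python
import struct

# (limit, format tag, struct format) for each unsigned width, smallest first.
_UNSIGNED = ((0xFF, 0xCC, ">B"), (0xFFFF, 0xCD, ">H"),
             (0xFFFFFFFF, 0xCE, ">I"), (0xFFFFFFFFFFFFFFFF, 0xCF, ">Q"))
# Same for signed widths, used only for values below the negative fixint range.
_SIGNED = ((-0x80, 0xD0, ">b"), (-0x8000, 0xD1, ">h"),
           (-0x80000000, 0xD2, ">i"), (-0x8000000000000000, 0xD3, ">q"))


def pack_int(n):
    """Encode an integer using the shortest MessagePack form that fits."""
    if 0 <= n <= 0x7F:
        return bytes([n])                      # positive fixint
    if -32 <= n < 0:
        return struct.pack(">b", n)            # negative fixint (0xe0 to 0xff)
    if n > 0:
        for limit, tag, fmt in _UNSIGNED:
            if n <= limit:
                return bytes([tag]) + struct.pack(fmt, n)
    else:
        for limit, tag, fmt in _SIGNED:
            if n >= limit:
                return bytes([tag]) + struct.pack(fmt, n)
    raise OverflowError("integer does not fit in 64 bits")
```

Under this rule an epoch-millisecond timestamp such as 1768471200000 exceeds the uint32 range and is therefore written as a 9-byte uint64.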
4.3 Float Encoding
Floating-point numbers MUST be encoded as IEEE 754 double precision (float64, 8 bytes) in MessagePack format. Single-precision (float32) MUST NOT be used. In CBOR, use major type 7 with 27 (64-bit IEEE 754).
Float64 values MUST NOT be NaN or Infinity. Serializers MUST reject non-finite values with ERR_FLOAT_INVALID. IEEE 754 permits multiple NaN bit patterns (varying sign, exponent, and mantissa bits), which produce different byte sequences and therefore different content addresses across runtimes. Rejecting all non-finite values eliminates this ambiguity and ensures cross-implementation hash stability.
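A minimal sketch of the float rule, assuming a hypothetical helper named pack_float: it always emits the 8-byte form and refuses non-finite input. The leading byte is 0xcb for MessagePack float64 and 0xfb for CBOR (major type 7, additional information 27).

```python
import math
import struct


def pack_float(x, cbor=False):
    """Encode x as a big-endian IEEE 754 double, rejecting NaN and infinities."""
    if not math.isfinite(x):
        raise ValueError("ERR_FLOAT_INVALID")
    tag = b"\xfb" if cbor else b"\xcb"
    return tag + struct.pack(">d", x)
```

The encoder never narrows to float32, even for values such as 0.5 that would survive the narrowing, because a shorter encoding would change the blob bytes and therefore the content address.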
4.4 String Encoding
All strings (keys and values) MUST be UTF-8 encoded and MUST be NFC-normalized (Unicode Normalization Form Canonical Composition per UAX #15) before encoding. Strings MUST NOT contain a byte-order mark (BOM, bytes EF BB BF). Parsers MUST reject strings beginning with a BOM with ERR_CORRUPT.
Example: Combining character e + \u0301 (combining acute) → precomposed character \u00e9 (é)
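The normalization step maps directly onto Python's standard unicodedata module. The helper name canonical_str is hypothetical, and the sketch assumes the string has already been decoded, so a leading BOM appears as the code point U+FEFF.

```python
import unicodedata

BOM = "\ufeff"  # encodes to the bytes EF BB BF in UTF-8


def canonical_str(text):
    """Return the NFC-normalized UTF-8 bytes of text, rejecting a leading BOM."""
    if text.startswith(BOM):
        raise ValueError("ERR_CORRUPT: string begins with a byte-order mark")
    return unicodedata.normalize("NFC", text).encode("utf-8")
```

With this helper, canonical_str("e\u0301") and canonical_str("\u00e9") return the same two bytes (c3 a9), so both spellings of the character hash identically.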
4.5 Null Omission
Map entries with null/None/nil values MUST be omitted entirely from the serialized form. Absent fields default to:
- Strings: None or empty
- Numbers: 0 or 0.0
- Booleans: false
- Arrays: empty list
- Maps: None
Semantic distinction: Absent fields are semantically distinct from fields explicitly set to a default value. Consumers MUST NOT treat an absent field as equivalent to a field present with its default value. Serializers MUST NOT auto-insert default values during round-trip serialization; doing so changes the blob bytes and produces a different content address.
Rationale: Forward compatibility (new optional fields don't change existing hashes), determinism (no ambiguity between absent and null), compactness.
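A short sketch of null omission, with a hypothetical function name. It removes map entries whose value is None at every nesting level and deliberately keeps explicit defaults such as 0, false, and the empty string, since those are semantically distinct from absence. Array elements are left in place because the rule above is stated for map entries.

```python
def omit_nulls(value):
    """Drop map entries whose value is None, recursively; keep everything else."""
    if isinstance(value, dict):
        return {key: omit_nulls(item) for key, item in value.items() if item is not None}
    if isinstance(value, list):
        return [omit_nulls(item) for item in value]
    return value
```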
4.6 Array Ordering
Array elements MUST preserve insertion order. Arrays are NOT sorted.
4.7 Nested Compaction
Three fields use nested field compaction:
- content_refs: use CONTENT_REF_FIELD_MAP (Section 7.1)
- embedding_refs: use EMBEDDING_REF_FIELD_MAP (Section 7.2)
- related_to: use RELATED_TO_FIELD_MAP (Section 14.2)
Other array-of-maps fields (provenance_chain, context, history) are NOT compacted recursively.
4.8 Datetime Conversion
All datetime fields (valid_from, valid_to, created_at, system_valid_from, system_valid_to) are converted to Unix epoch milliseconds (int64) before serialization:
epoch_ms = floor(datetime.timestamp() * 1000)
Example: 2026-01-15T10:00:00.000Z → 1768471200000
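The conversion can be sketched in Python as follows. Instead of multiplying a floating-point timestamp by 1000, this variant uses integer timedelta arithmetic, which produces the same floor result without floating-point rounding. Rejecting naive datetimes is an assumption of this sketch, made so that the result never depends on the local time zone of the machine.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to whole epoch milliseconds (floored)."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    # Floor division of two timedeltas returns an exact integer.
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

Calling to_epoch_ms(datetime(2026, 1, 15, 10, 0, tzinfo=timezone.utc)) returns 1768471200000, matching the example above.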
4.9 Serialization Algorithm
- Validate required fields per memory type schema. Reject if missing.
- Compact field names via FIELD_MAP (Section 5).
- Compact nested maps in content_refs and embedding_refs only.
- Convert datetimes to epoch milliseconds.
- NFC-normalize all strings (recursive).
- Omit null/None values (recursive).
- Sort map keys lexicographically (recursive).
- Encode as MessagePack/CBOR using rules above.
- Prepend version byte and header: build the 9-byte header [0x01, flags, type, ns_hash_hi, ns_hash_lo, created_at_sec_b3, created_at_sec_b2, created_at_sec_b1, created_at_sec_b0], where ns_hash_hi:ns_hash_lo = SHA-256(namespace)[0:2] as uint16 big-endian, and prepend it to the payload.
- Compute SHA-256 over complete blob bytes.
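The steps above can be sketched end to end for a flat grain. This is a simplified illustration under several stated assumptions: FIELD_MAP is a small hypothetical subset of the full table, the required-field list is invented for the example, the tiny pack encoder handles only the value types used here, nested compaction and datetime conversion are skipped because the sample grain has no nested maps and already carries epoch milliseconds, and the header timestamp is derived as created_at divided by 1000.

```python
import hashlib
import math
import struct
import unicodedata

# Hypothetical subset of FIELD_MAP, taken from the core field table.
FIELD_MAP = {"type": "t", "subject": "s", "relation": "r", "object": "o",
             "confidence": "c", "namespace": "ns", "created_at": "ca"}

_UINTS = ((0x7F, b"", ">B"), (0xFF, b"\xcc", ">B"), (0xFFFF, b"\xcd", ">H"),
          (0xFFFFFFFF, b"\xce", ">I"), (0xFFFFFFFFFFFFFFFF, b"\xcf", ">Q"))


def pack(value):
    """Tiny MessagePack encoder covering only the types this sketch needs."""
    if isinstance(value, bool):
        return b"\xc3" if value else b"\xc2"
    if isinstance(value, int) and value >= 0:
        for limit, prefix, fmt in _UINTS:
            if value <= limit:
                return prefix + struct.pack(fmt, value)
    if isinstance(value, float) and math.isfinite(value):
        return b"\xcb" + struct.pack(">d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw
    if isinstance(value, dict) and len(value) <= 15:
        # Sort map keys by UTF-8 bytes, then emit a fixmap.
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-8"))
        return bytes([0x80 | len(items)]) + b"".join(pack(k) + pack(v) for k, v in items)
    raise TypeError(f"not supported by this sketch: {value!r}")


def serialize(grain, flags, grain_type):
    """Return (blob, content_address) for a flat grain dict."""
    for field in ("type", "namespace", "created_at"):    # assumed required set
        if grain.get(field) is None:
            raise ValueError(f"missing required field: {field}")
    payload = {}
    for name, value in grain.items():
        if value is None:                                 # null omission
            continue
        if isinstance(value, str):
            value = unicodedata.normalize("NFC", value)   # NFC normalization
        payload[FIELD_MAP.get(name, name)] = value        # field compaction
    body = pack(payload)                                  # sorted keys + encoding
    ns_hash = hashlib.sha256(payload["ns"].encode("utf-8")).digest()[:2]
    created_sec = payload["ca"] // 1000                   # assumption: header seconds
    header = bytes([0x01, flags, grain_type]) + ns_hash + struct.pack(">I", created_sec)
    blob = header + body
    return blob, hashlib.sha256(blob).hexdigest()
```

Two grain dicts that hold the same fields in a different insertion order produce the same blob and the same content address, which is the property the canonical rules exist to guarantee.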
4.10 Nesting Depth Limit
Implementations SHOULD enforce a maximum nesting depth to prevent stack overflow vulnerabilities from adversarially or accidentally deeply nested payloads. Recommended limits by profile:
| Profile | Maximum Nesting Depth |
|---|---|
| Extended | 32 levels |
| Standard | 16 levels |
| Lightweight | 8 levels |
Parsers MAY reject payloads exceeding their profile limit with ERR_CORRUPT.
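One way to enforce the limit is sketched below, using hypothetical names. It assumes that the top-level map counts as depth 1 and that both maps and arrays add a level; the text above does not define how depth is counted. The walk is iterative so that the check itself cannot exhaust the call stack. A production parser would more likely count depth while decoding, before the nested structure is ever built.

```python
MAX_DEPTH = {"extended": 32, "standard": 16, "lightweight": 8}


def nesting_depth(value):
    """Return the deepest container level in value (0 for a bare scalar)."""
    deepest = 0
    stack = [(value, 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue
        deepest = max(deepest, depth + 1)
        stack.extend((child, depth + 1) for child in children)
    return deepest


def check_depth(value, profile="standard"):
    """Raise if value nests deeper than the limit for the given profile."""
    if nesting_depth(value) > MAX_DEPTH[profile]:
        raise ValueError(f"ERR_CORRUPT: nesting exceeds {profile} profile limit")
```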
5. Content Addressing
The content address of a .mg blob is computed as:
content_address = lowercase_hex(SHA-256(complete_blob_bytes))
Where complete_blob_bytes is the complete 9-byte fixed header followed by the canonical MessagePack/CBOR payload:
- Bytes 0–8: Fixed header (version, flags, type, ns_hash[2], created_at_sec[4])
- Bytes 9+: Canonical MessagePack/CBOR payload
The hash MUST be represented as a 64-character lowercase hexadecimal string. Uppercase hexadecimal MUST be rejected.
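Computing and validating a content address takes only the standard library. The function names below are hypothetical; the sketch relies on the fact that hashlib's hexdigest already returns lowercase hexadecimal.

```python
import hashlib
import re

# Exactly 64 lowercase hexadecimal characters; uppercase is rejected.
_ADDRESS_RE = re.compile(r"[0-9a-f]{64}")


def content_address(blob):
    """Return the lowercase hex SHA-256 of the complete blob bytes."""
    return hashlib.sha256(blob).hexdigest()


def is_valid_address(text):
    """Check the textual form of a content address."""
    return _ADDRESS_RE.fullmatch(text) is not None
```

The input to content_address is the whole blob, header included, so changing any header byte (for example the creation second) changes the address even when the payload is untouched.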
5.1 Content Address Format (ABNF)
content-address = 64 HEXDIG
HEXDIG = DIGIT / "a" / "b" / "c" / "d" / "e" / "f"
DIGIT = %x30-39
5.2 Hash Function
SHA-256 is defined in FIPS 180-4. No alternative hash functions are permitted in v1.0.
5.3 Collision Resistance
SHA-256 produces a 256-bit digest, which gives about 128 bits of collision resistance: by the birthday bound, a collision becomes likely only after roughly 2^128 hash evaluations. Current estimates suggest SHA-256 remains secure for the foreseeable future.
5.4 Content Address as Identity
The content address serves as:
- Unique identifier — filename in content-addressed stores
- Integrity check — any byte change produces different hash
- Deduplication key — byte-identical content maps to same address
- Provenance link — derived grains reference source hashes
- Access key — retrieve grain from store by address
5.5 Temporal Uniqueness of Content Addresses
The content address includes created_at_sec from the fixed header (bytes 5–8), which is part of the hashed bytes. Two grains with identical semantic payload but different creation timestamps produce different content addresses — creation time is part of grain identity.
Rationale: Binding the content address to the creation time ensures each write event is a unique, non-replayable grain. An adversary cannot substitute a grain with an older timestamp without producing a different hash, preserving audit chain integrity.
Implication for deduplication: Content-address deduplication applies only to byte-identical blobs (same payload encoded at the same creation second). For semantic deduplication — the same fact written at different times — use superseded_by to mark the older grain as replaced, or derived_from to express provenance. The phrase "identical content maps to same address" (§5.4) means byte-identical, including the creation timestamp.
5.6 Immutability Boundary
A grain has two distinct layers with different mutability guarantees:
| Layer | Contents | Mutability | Covered by content address | Covered by COSE signature |
|---|---|---|---|---|
| Blob | 9-byte fixed header + MessagePack/CBOR payload | Immutable — once written, never modified | Yes | Yes |
| Index | Status and access-tracking fields (§28.3) | Mutable — updated by the store/index layer | No | No |
A grain's content is the immutable blob identified by its content address. A grain's status is maintained in the index layer. Index-layer fields — superseded_by, system_valid_to, verification_status, access_count, last_accessed_at — are NOT part of the hashed blob bytes and are NOT covered by COSE signatures. They are managed exclusively by the store after initial write (see §28.3 for update rules).
This separation is fundamental to the OMS architecture:
- Integrity — the content address guarantees the blob is unchanged. Index-layer mutations cannot alter a grain's identity or tamper with signed content.
- Lifecycle — grains can be superseded, retracted, or verified without rewriting the original blob or invalidating its signature.
- Access tracking — read counters and timestamps can be updated without breaking content addressing.
Implementations MUST store index-layer fields outside the .mg blob — in a database index, sidecar metadata, or equivalent external structure. Writers MUST NOT embed index-layer fields in the blob payload; stores MUST NOT recompute content addresses when index-layer fields change.
Portability: When grains are exported as .mg files, index-layer state is carried in the optional index manifest (§11.7). This preserves the "one file, full memory" principle — a .mg file contains both the immutable grain blobs and their current lifecycle state.
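The separation between the two layers can be illustrated with a small sketch. The function name and the idea of receiving a single combined record are assumptions made for the example; the index-layer field names are the ones listed above.

```python
INDEX_LAYER_FIELDS = frozenset({
    "superseded_by", "system_valid_to", "verification_status",
    "access_count", "last_accessed_at",
})


def split_layers(record):
    """Split a combined record into (blob_fields, index_fields).

    Only blob_fields may be serialized and hashed; index_fields belong in a
    database index or sidecar keyed by the grain's content address.
    """
    blob_fields = {k: v for k, v in record.items() if k not in INDEX_LAYER_FIELDS}
    index_fields = {k: v for k, v in record.items() if k in INDEX_LAYER_FIELDS}
    return blob_fields, index_fields
```

A store that later increments access_count only rewrites the sidecar entry; the blob bytes and the content address stay exactly as they were at write time.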
6. Field Compaction
To minimize blob size, human-readable field names are mapped to short keys before serialization. The mapping is bijective (one-to-one).
6.1 Core Fields
| Full Name | Short Key | Type | Description |
|---|---|---|---|
| type | t | string | Memory type: "fact", "episode", etc. |
| subject | s | string | Entity being described (RDF subject) |
| relation | r | string | Semantic relationship (RDF predicate) |
| object | o | string | Value or target (RDF object) |
| confidence | c | float64 | Credibility score [0.0, 1.0] |
| source_type | st | string | Provenance origin (open enum). Common values: "user_explicit", "consolidated", "llm_generated", "sensor", "imported", "agent_inferred", "system". See note below. |
| created_at | ca | int64 | Creation timestamp (epoch ms) |
| temporal_type | tt | string | "state" or "observation" |
| valid_from | vf | int64 | Temporal validity start (epoch ms) |
| valid_to | vt | int64 | Temporal validity end (epoch ms) |
| system_valid_from | svf | int64 | When grain became active in system |
| system_valid_to | svt | int64 | When grain was superseded in system |
| context | ctx | map | Contextual metadata (string→string) |
| superseded_by | sb | string | Content address of superseding grain |
| contradicted | ct | bool | Whether this grain is contradicted |
| importance | im | float64 | Importance weighting [0.0, 1.0] |
| author_did | adid | string | DID of creating agent |
| namespace | ns | string | Memory partition/category |
| user_id | user | string | Associated data subject (GDPR) |
| structural_tags | tags | array[string] | Classification tags |
| derived_from | df | array[string] | Parent content addresses |
| consolidation_level | cl | int | 0=raw, 1=frequency, 2=pattern, 3=sequence |
| success_count | sc | int | Feedback: successful uses |
| failure_count | fc | int | Feedback: failed uses |
| provenance_chain | pc | array[map] | Full derivation trail |
| origin_did | odid | string | Original source agent DID |
| origin_namespace | ons | string | Original source namespace |
| content_refs | cr | array[map] | References to external content |
| embedding_refs | er | array[map] | References to vector embeddings |
| related_to | rt | array[map] | Cross-links to related grains |
| _elided | _e | map | Selective disclosure: elided field hashes |
| _disclosure_of | _do | string | Content address of original grain (if disclosed) |
| invalidation_policy | ip | map | Protection policy governing supersession and contradiction (see §23) |
| supersession_justification | sj | string | Required on superseding grain when original has mode: "soft_locked" |
| supersession_auth | sa | array | COSE signatures authorizing supersession for mode: "quorum" |
| owner | own | map | LegalEntity map (§12.5.1): legal entity with rights and liabilities over the agent |
| category | cat | uint8 | Routing category within the grain type; see §27 Grain Type Field Specifications |
| run_id | rid | string | Session or run identifier; scopes grain to a specific agent execution. Distinct from user_id (data subject) and namespace (logical partition). |
| role | role | string | Message role for Event grains. Open enum; standard values: "user", "assistant", "system", "tool" |
| access_count | ac | int | Number of times this grain has been retrieved; updated by the store on reads, not by the writer. Enables recency/frequency scoring. |
| last_accessed_at | laa | int64 | Epoch ms of most recent retrieval; updated by the store on reads. Pair with access_count for importance decay models. |
| timestamp_ms | tms | int64 | High-precision payload timestamp (epoch ms). The authoritative event timestamp. The header's created_at_sec is a coarse routing hint only. |
| observer_did | obsdid | string | DID of the entity that observed or measured; distinct from author_did (who wrote the grain into the store). |
| subject_did | sdid | string | DID of the entity this grain is about; distinct from user_id (GDPR data subject) and author_did (writer). |
| session_id | sid2 | string | Session scope; distinct from run_id (execution scope) and user_id (data subject). |
| entity_id | eid | string | External entity reference: product ID, patient MRN, vehicle chassis ID, instrument serial. Not a DID; opaque to the spec. |
| epistemic_status | epstat | string | Categorical certainty: "certain", "probable", "uncertain", "estimated", "derived". Complements the continuous confidence float. Open enum. |
| verification_status | vstatus | string | Values: "unverified" (default), "verified", "contested", "retracted". |
| requires_human_review | rhr | bool | If true, this grain's content MUST NOT drive automated decisions until a human has reviewed and cleared it. Binding for Reasoning grains; advisory for others. |
| processing_basis | pbasis | string | Content address of the Consent grain that authorized this grain's creation. Used to compute erasure scope on consent revocation. |
| identity_state | idst | string | Identity resolution state: "anonymous", "pseudonymous", "authenticated". Affects personalization logic and compliance scope. |
| license | lic | string | SPDX license identifier for the grain's content. Example: "CC-BY-4.0", "CC0-1.0", "proprietary". |
| trusted_timestamp | tts | map | RFC 3161 timestamp token: {tsp_response: bytes, tsa_uri: string}. Legally defensible creation time from an accredited TSA, independent of self-reported created_at. |
| invalidation_type | itype | string | Semantic reason for supersession: "superseded", "retraction", "erratum", "corrigendum", "retraction_with_replacement", "expression_of_concern". Set by actor creating the superseding grain. |
| invalidation_reason | ireason | string | Human-readable rationale for invalidation_type. |
| invalidation_initiator | iinit | string | DID of the party initiating the invalidation. |
| retention_policy | rpol | map | Minimum retention requirements: {minimum_retention_years: int, regulation: string, deletion_requires: string}. Distinct from invalidation_policy (which controls supersession). |
| recall_priority | rpri | string | Retrieval priority hint: "hot", "warm", "cold". Guides index layer storage tier selection. |
Note (source_type for Observation grains): Use "sensor" when observer_type is a physical instrument; "agent_inferred" when observer_type is a cognitive AI observer ("llm", "reflector", "classifier", "detector"); "user_explicit" for human observers.
Index-layer fields (§5.6, §28.3): The following fields in the table above are not stored in the immutable .mg blob. They are maintained by the store/index layer and are excluded from the content address and COSE signature: superseded_by, system_valid_to, verification_status, access_count, last_accessed_at. Writers MUST NOT set these fields; see §28.3 for store update rules.
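Field compaction and its inverse can be sketched with a pair of dictionaries. FIELD_MAP below is a hypothetical excerpt of the core table, and passing unmapped names through unchanged is an assumption of this sketch rather than a stated rule.

```python
# Hypothetical excerpt of the core field table: full name -> short key.
FIELD_MAP = {
    "type": "t", "subject": "s", "relation": "r", "object": "o",
    "confidence": "c", "source_type": "st", "created_at": "ca",
    "namespace": "ns", "author_did": "adid", "derived_from": "df",
}
REVERSE_MAP = {short: full for full, short in FIELD_MAP.items()}

# A bijective mapping has no two full names sharing a short key.
if len(REVERSE_MAP) != len(FIELD_MAP):
    raise RuntimeError("FIELD_MAP is not one-to-one")


def compact(grain):
    """Replace full field names with short keys (top level only)."""
    return {FIELD_MAP.get(name, name): value for name, value in grain.items()}


def expand(payload):
    """Restore full field names from short keys (top level only)."""
    return {REVERSE_MAP.get(key, key): value for key, value in payload.items()}
```

Because the mapping is one-to-one, expand(compact(grain)) returns the original grain for any grain whose field names all appear in the map.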
6.2 Event-Specific Fields
| Full Name | Short Key | Type | Notes |
|---|---|---|---|
| content | content | string | Raw text of the event. MAY be omitted if content_blocks is present. |
| consolidated | consolidated | bool | Whether this event has been distilled into Belief grains |
| content_blocks | cblocks | array[map] | Typed content blocks for structured LLM messages. When present, takes precedence over flat content string. Each entry: {type: "text"/"image"/"tool_use"/"tool_result"/"thinking", ...}. See note below. |
| model_id | mdl | string | LLM model identifier that produced the response (e.g., "claude-opus-4-6", "gpt-4o"). Absent for human-authored events. |
| stop_reason | stopr | string | Why LLM generation stopped: "end_turn", "max_tokens", "stop_sequence", "tool_use". Open enum. |
| token_usage | toku | map | Token consumption: {input_tokens: int, output_tokens: int, cache_creation_tokens: int, cache_read_tokens: int}. Enables cost tracking. |
| parent_message_id | pmid | string | Content address of the preceding message grain in the conversation thread. Enables linked-list message threading and conversation branching (two Event grains sharing the same parent_message_id represent a branch point). |
Note (content_blocks schema): Each block in the array MUST contain a type field. Standard block types mirror the Anthropic Messages API: "text" ({type, text}), "image" ({type, source}), "tool_use" ({type, id, name, input}), "tool_result" ({type, tool_use_id, content, is_error}), "thinking" ({type, thinking}). Implementations MAY define additional block types. When content_blocks is present and content is also present, content serves as a plain-text fallback for readers that do not support structured blocks.
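The threading model implied by parent_message_id can be illustrated with a small sketch that finds branch points in a set of Event grains. The function name and the input shape (a mapping from content address to an expanded grain dict) are assumptions made for the example.

```python
from collections import defaultdict


def find_branch_points(events):
    """Return {parent_address: [child_addresses]} for parents with 2+ children.

    events maps each Event grain's content address to its expanded fields.
    """
    children = defaultdict(list)
    for address, grain in events.items():
        parent = grain.get("parent_message_id")
        if parent is not None:
            children[parent].append(address)
    return {parent: sorted(kids) for parent, kids in children.items() if len(kids) > 1}
```

A conversation with no branches yields an empty result, since every parent then has exactly one child.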
6.3 State-Specific Fields
| Full Name | Short Key | Type |
|---|---|---|
| plan | plan | array[string] |
| history | history | array[map] |
6.4 Workflow-Specific Fields
| Full Name | Short Key | Type |
|---|---|---|
| steps | steps | array[string] |
| trigger | trigger | string |
6.5 Action-Specific Fields
| Full Name | Short Key | Type | Notes |
|---|---|---|---|
| action_phase | aphase | string | Discriminator: "definition", "call", or "result"; absent = complete |
| tool_name | tn | string | |
| input | inp | map | Canonical name for tool arguments (replaces arguments) |
| content | cnt | any | Canonical name for tool result (replaces result) |
| is_error | iserr | bool | Canonical error flag (replaces success) |
| tool_call_id | tcid | string | Anthropic/MCP correlation ID; links result phase to call phase |
| call_batch_id | cbid | string | Groups parallel calls issued in the same agent turn |
| tool_type | ttype | string | "client", "server", or "builtin" |
| tool_version | tver | string | For versioned builtins, e.g. "web_search_20250305" |
| execution_mode | emode | string | "function_call", "code_exec", or "computer_use" |
| code | code | string | Executable code for execution_mode: "code_exec" (CodeAct) |
| stdout | out | string | Standard output from code execution |
| stderr | err2 | string | Standard error from code execution |
| exit_code | xc | int | Process exit code from code execution |
| interpreter_id | iid | string | Links Action grains sharing a stateful interpreter session |
| error | err | string | Error message (use with is_error: true) |
| error_type | etype | string | Structured error classification: "timeout", "rate_limit", "auth_failure", "invalid_input", "server_error", "not_found", "quota_exceeded". Open enum. Enables retry policy decisions without parsing free-text error. |
| duration_ms | dur | int | Execution time in milliseconds |
| parent_task_id | ptid | string | Content address of parent task grain |
| tool_description | tdesc | string | Human-readable description of the tool (definition phase) |
| input_schema | isch | map | JSON Schema for tool inputs; mirrors Anthropic input_schema / MCP inputSchema (definition phase) |
| output_schema | osch | map | JSON Schema (draft-07 compatible) describing the action's return value (definition phase) |
| strict | strict | bool | If true, model guarantees strict JSON Schema conformance for input (definition phase) |
6.6 Observation-Specific Fields
| Full Name | Short Key | Type |
|---|---|---|
| observer_id | oid | string |
| observer_type | otype | string |
| frame_id | fid | string |
| sync_group | sg | string |
| observation_mode | omode | string |
| observation_scope | oscope | string |
| observer_model | omdl | string |
| compression_ratio | ocmp | float64 |
6.7 Goal-Specific Fields
| Full Name | Short Key | Type |
|---|---|---|
| description | desc | string |
| goal_state | gs | string |
| criteria | crit | array[string] |
| criteria_structured | crs | array[map] |
| priority | pri | int |
| parent_goals | pgs | array[string] |
| state_reason | sr | string |
| satisfaction_evidence | se | array[string] |
| progress | prog | float64 |
| delegate_to | dto | string |
| delegate_from | dfo | string |
| expiry_policy | ep | string |
| recurrence | rec | string |
| evidence_required | evreq | int |
| rollback_on_failure | rof | array[string] |
| allowed_transitions | atr | array[string] |
| depends_on | depg | array[string] |
| assigned_agent | asgn | string |
| expected_output | expout | string |
| output_grain | outg | string |
| deadline | dline | int64 |
6.8 Consent-Specific Fields
Note: `subject_did` (short key `sdid`) is a common field (§6.1) used here as the consenting party. `grantee_did` is Consent-specific.
| Full Name | Short Key | Type |
|---|---|---|
| `grantee_did` | `gdid` | string |
| `scope` | `scope` | array[string] |
| `is_withdrawal` | `isw` | bool |
| `basis` | `basis` | string |
| `jurisdiction` | `jur` | string |
| `prior_consent` | `pcon` | string |
| `witness_dids` | `wdids` | array[string] |
6.9 Reasoning-Specific Fields
| Full Name | Short Key | Type |
|---|---|---|
| `premises` | `prem` | array[string] |
| `conclusion` | `conc` | string |
| `inference_method` | `imethod` | string |
| `alternatives_considered` | `altc` | array[map] |
| `thinking_content` | `think` | string |
| `thinking_redacted` | `tredact` | bool |
| `statistical_context` | `statctx` | map |
| `software_environment` | `swenv` | map |
| `parameter_set` | `params` | map |
| `random_seed` | `rseed` | int64 |
6.10 Consensus-Specific Fields
| Full Name | Short Key | Type |
|---|---|---|
| `participating_observers` | `pobs` | array[string] |
| `threshold` | `thold` | int |
| `agreement_count` | `agcnt` | int |
| `dissent_count` | `discnt` | int |
| `dissent_grains` | `disgrn` | array[string] |
| `agreed_content` | `agcon` | any |
6.11 Delegation-Specific Fields
When a Goal or Belief grain uses the mg:delegates_to relation, the following fields specify the scope and constraints of the delegation. Without these fields, a delegation is unbounded — the delegatee receives no machine-readable limits. Implementations SHOULD populate delegation scope fields for any inter-agent authority grant.
| Full Name | Short Key | Type | Notes |
|---|---|---|---|
| `authorized_namespaces` | `ans` | array[string] | Namespaces the delegatee may read and write. `["*"]` = all namespaces (dangerous; SHOULD be avoided). |
| `authorized_types` | `atypes` | array[uint8] | Grain type bytes the delegatee may create. E.g., `[0x01, 0x02, 0x05]` for Belief, Event, Action. |
| `authorized_tools` | `atools` | array[string] | Tool names the delegatee may invoke. Empty array = no tool restriction. |
| `delegation_depth` | `ddepth` | int | Maximum re-delegation depth. 0 = delegatee MUST NOT re-delegate. Absent = unlimited (NOT RECOMMENDED). |
| `delegation_expiry` | `dexp` | int64 | Epoch ms when delegation expires. After expiry, the delegatee's writes SHOULD be rejected by stores that enforce delegation scope. |
| `context_grains` | `cgrains` | array[string] | Content addresses of grains to transfer as context to the delegatee. Enables session handoff: the delegator selects which grains the delegatee needs to continue. |
| `return_to` | `retdid` | string | DID of the agent to return control to after the delegated task completes. |
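As an illustration of how a store might enforce these fields, the sketch below checks a single delegatee write against a delegation scope. The function name `write_allowed` is hypothetical, and two behaviours are assumptions rather than rules from this specification: an absent field is treated as "no restriction", and a write at exactly `delegation_expiry` is still accepted.

```python
def write_allowed(scope: dict, namespace: str, type_byte: int, now_ms: int) -> bool:
    """Check one delegatee write against a delegation scope (illustrative only)."""
    expiry = scope.get("delegation_expiry")
    if expiry is not None and now_ms > expiry:
        return False  # delegation has expired
    namespaces = scope.get("authorized_namespaces")
    if namespaces is not None and "*" not in namespaces and namespace not in namespaces:
        return False  # namespace not granted
    types = scope.get("authorized_types")
    if types is not None and type_byte not in types:
        return False  # grain type not granted
    return True


scope = {
    "authorized_namespaces": ["work"],
    "authorized_types": [0x01, 0x02, 0x05],  # Belief, Event, Action
    "delegation_depth": 0,
    "delegation_expiry": 1737000000000,
}
in_scope = write_allowed(scope, "work", 0x01, now_ms=1736999999000)
wrong_type = write_allowed(scope, "work", 0x07, now_ms=1736999999000)
expired = write_allowed(scope, "work", 0x01, now_ms=1737000000001)
```

Tool restrictions and re-delegation depth are left out to keep the sketch short; a real store would check them in the same way.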
6.12 Compaction Rules
- Serializers MUST replace full field names with short keys before encoding
- Deserializers MUST replace short keys with full field names after decoding
- Unknown keys (not in mapping) MUST be preserved as-is in both directions
- Field compaction mapping is normative and MUST NOT be modified by implementations
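The rules above can be sketched as a pair of dictionary rewrites. This is a minimal illustration, not a reference implementation: `FIELD_MAP` is only an excerpt of the mapping (a few keys from §6.6 to §6.9), the function names `compact` and `expand` are hypothetical, the sample grain values are invented, and only top-level keys are handled.

```python
# Excerpt of the normative mapping, for illustration only.
FIELD_MAP = {
    "observer_id": "oid",
    "observer_type": "otype",
    "goal_state": "gs",
    "grantee_did": "gdid",
    "conclusion": "conc",
}
REVERSE_MAP = {short: full for full, short in FIELD_MAP.items()}


def compact(grain: dict) -> dict:
    """Replace full field names with short keys; unknown keys pass through."""
    return {FIELD_MAP.get(key, key): value for key, value in grain.items()}


def expand(grain: dict) -> dict:
    """Replace short keys with full field names; unknown keys pass through."""
    return {REVERSE_MAP.get(key, key): value for key, value in grain.items()}


# "x_custom" is not in the mapping, so it survives both directions unchanged.
original = {"observer_id": "cam-1", "observer_type": "camera", "x_custom": 7}
wire = compact(original)
restored = expand(wire)
```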
7. Multi-Modal Content References
Multi-modal content (images, audio, video, embeddings, sensor data) is referenced by URI, never embedded in grains.
7.1 Content Reference Schema
{
"uri": "cas://sha256:abc123...",
"modality": "image",
"mime_type": "image/jpeg",
"size_bytes": 1048576,
"checksum": "sha256:abc123...",
"metadata": {"width": 1920, "height": 1080}
}
Field compaction for content_refs entries:
| Full Name | Short Key | Type | Required | Description |
|---|---|---|---|---|
| `uri` | `u` | string | REQUIRED | Content URI |
| `modality` | `m` | string | REQUIRED | Content type: image, audio, video, point_cloud, 3d_mesh, document, binary, embedding |
| `mime_type` | `mt` | string | RECOMMENDED | Standard MIME type |
| `size_bytes` | `sz` | int | OPTIONAL | File size in bytes |
| `checksum` | `ck` | string | RECOMMENDED | SHA-256 hash for integrity |
| `metadata` | `md` | map | OPTIONAL | Modality-specific metadata |
7.2 Embedding Reference Schema
{
"vector_id": "vec-12345",
"model": "text-embedding-3-large",
"dimensions": 3072,
"modality_source": "text",
"distance_metric": "cosine"
}
Field compaction for embedding_refs entries:
| Full Name | Short Key | Type | Required | Description |
|---|---|---|---|---|
| `vector_id` | `vi` | string | REQUIRED | ID in vector store |
| `model` | `mo` | string | REQUIRED | Embedding model name |
| `dimensions` | `dm` | int | REQUIRED | Vector dimensionality |
| `modality_source` | `ms` | string | OPTIONAL | Source modality: "text", "image", "audio", etc. |
| `distance_metric` | `di` | string | OPTIONAL | "cosine", "l2", "dot" |
| `chunk_index` | `ci` | int | OPTIONAL | Position of this chunk within the source grain (0-indexed). When a grain is embedded as a single unit, `chunk_index` = 0. |
| `chunk_text` | `ct` | string | OPTIONAL | The exact text that was embedded. Enables reconstruction from a vector search hit without re-reading and re-chunking the source grain. |
| `chunk_strategy` | `cs` | string | OPTIONAL | Chunking method: "full" (entire grain), "sentence", "paragraph", "token_window", "recursive", "semantic". Open enum. |
| `chunk_overlap` | `co` | int | OPTIONAL | Overlap in tokens between adjacent chunks. Absent or 0 for non-overlapping strategies. |
Note (RAG round-trip): When a vector search returns a hit, the `chunk_text` field enables immediate context assembly without a second read of the source grain. The `chunk_index` and `chunk_strategy` fields enable re-chunking validation. Implementations that generate embeddings internally MUST populate `embedding_refs` entries on the grain. Implementations that delegate to an external vector store SHOULD populate `chunk_text` to ensure retrieval provenance is self-contained.
7.3 Modality-Specific Metadata
Image:
{"width": 1920, "height": 1080, "color_space": "sRGB"}
Audio:
{"sample_rate_hz": 48000, "channels": 2, "duration_ms": 15000}
Video:
{"width": 3840, "height": 2160, "fps": 30, "duration_ms": 120000, "codec": "h264"}
Point Cloud:
{"point_count": 1234567, "format": "pcd_binary", "has_color": true}
8. Grain Types
The type byte (Byte 2 of the fixed header) encodes the cognitive grain type — the class of knowledge unit this grain represents. Ten standard types are defined.
Standard mg: Relation Vocabulary
The mg: namespace is reserved for standard semantic relations. Applications define custom relations freely outside this namespace.
| Relation | Typical grain type | Meaning |
|---|---|---|
| `mg:perceives` | Observation | Raw sensory or cognitive input |
| `mg:knows` | Belief | Derived belief or learned fact |
| `mg:said` | Event | Message or utterance |
| `mg:did` | Action | Tool or action invocation |
| `mg:infers` | Reasoning | Derived conclusion from prior grains |
| `mg:agrees_with` | Consensus | Multi-agent threshold agreement |
| `mg:state_at` | State | Agent state snapshot |
| `mg:requires_steps` | Workflow | Learned action sequence |
| `mg:intends` | Goal | Agent objective |
| `mg:permits` | Consent | User grants agent right to retain or act |
| `mg:revokes` | Consent | User revokes prior consent |
| `mg:prohibits` | Belief/Goal | Hard prohibition |
| `mg:requires` | Belief/Goal | Hard requirement |
| `mg:prefers` | Belief | Soft preference |
| `mg:avoids` | Belief | Soft avoidance preference |
| `mg:delegates_to` | Goal | Scoped authority grant (§6.11 delegation scope) |
| `mg:owned_by` | Belief | Legal entity ownership (§12.5) |
| `mg:has_capability` | Belief | Agent capability advertisement (§28.5 Agent Card) |
| `mg:handed_off_to` | Event | Session handoff event record (§28.7) |
| `mg:depends_on` | Goal | Task dependency (distinct from `parent_goals` hierarchy) |
| `mg:assigned_to` | Goal | Task assigned to agent for execution |
8.1 Belief (type = 0x01)
A structured belief about the world — a (subject, relation, object) triple with confidence and source. The canonical unit of declarative knowledge.
Required fields:
- `type` = "belief" (payload string; header byte = `0x01`)
- `subject` (non-empty string)
- `relation` (non-empty string)
- `object` (string or map)
- `confidence` (float64, [0.0, 1.0])
- `created_at` (int64, epoch ms)
Optional fields: All common fields from §6.1. Type-specific: temporal_type, success_count, failure_count, bi-temporal fields (valid_from, valid_to, system_valid_from, system_valid_to).
RDF mapping: <grain:subject> <grain:relation> "grain:object" .
8.2 Event (type = 0x02)
A raw, timestamped record of something that happened — a message, interaction, utterance, or behavioral occurrence.
Required fields:
- `type` = "event"
- `content` (non-empty string): raw text. MAY be omitted if `subject`/`relation`/`object` fully describe the event.
- `created_at` (int64, epoch ms)
Optional fields: role ("user", "assistant", "system", "tool"), content_blocks (array[map] — structured multi-block content; takes precedence over flat content), model_id (string), stop_reason (string), token_usage (map), parent_message_id (string — content address of preceding message for conversation threading), consolidated (bool), run_id (string), session_id (string), all common fields.
8.3 State (type = 0x03)
An agent state snapshot — the portable save point at a moment in time.
Required fields:
- `type` = "state"
- `context` (map): agent state snapshot. For Letta-compatible agents, SHOULD include `memory_blocks`, `system_prompt`, `tools`, `model`.
- `created_at` (int64, epoch ms)
Optional fields: plan (array[string]), history (array[map]), all common fields.
8.4 Workflow (type = 0x04)
Learned action sequence — procedural memory for recurring tasks.
Required fields:
- `type` = "workflow"
- `steps` (non-empty array[string]): ordered action steps
- `trigger` (non-empty string): condition that activates this workflow
- `created_at` (int64, epoch ms)
Optional fields: All common fields.
8.5 Action (type = 0x05)
A record of a tool invocation, code execution, or computer-use action. See §27.1 for the full action_phase discriminator and field tables.
Required fields:
- `type` = "action"
- Phase-dependent required fields (see §27.1)
- `created_at` (int64, epoch ms)
8.6 Observation (type = 0x06)
Raw sensory or cognitive input — what an observer perceived at a moment in time.
Required fields:
- `type` = "observation"
- `observer_id` (non-empty string): unique identifier of the observing entity
- `observer_type` (non-empty string): open enum, see §24
- `created_at` (int64, epoch ms)
Optional fields: observer_model, frame_id, sync_group, observation_mode, observation_scope, compression_ratio, all common fields.
8.7 Goal (type = 0x07)
An explicit objective with lifecycle semantics. Goals transition through states via the supersession chain.
Required fields:
- `type` = "goal"
- `description` (non-empty string)
- `goal_state` (string enum): `"active"`, `"satisfied"`, `"failed"`, `"suspended"`
- `created_at` (int64, epoch ms)
Optional fields: criteria, criteria_structured, priority, parent_goals, depends_on (array[string] — content addresses of prerequisite Goal grains that must complete before this one starts; distinct from parent_goals which implies decomposition, not dependency ordering), assigned_agent (string — DID of the agent assigned to execute this task), expected_output (string — description of expected output format), output_grain (string — content address of the grain containing the task's completed output), deadline (int64 — epoch ms hard deadline for task completion), state_reason, satisfaction_evidence, progress, delegate_to, delegate_from, expiry_policy, recurrence, evidence_required, rollback_on_failure, allowed_transitions, all common fields.
Constraints, policies, and delegations are expressed as Goal or Belief grains with mg:prohibits, mg:prefers, mg:avoids, or mg:delegates_to relations, combined with invalidation_policy (§23) for enforcement.
Note (plan-and-execute agents): The `depends_on` field enables DAG-structured task dependency graphs for hierarchical task decomposition. Agents using plan-and-execute patterns (e.g., LangGraph StateGraph, CrewAI task dependencies) SHOULD express task ordering via `depends_on` and task hierarchy via `parent_goals`. A Goal grain with `depends_on` references MUST NOT transition to `goal_state: "active"` until all referenced Goal grains have `goal_state: "satisfied"`. The `assigned_agent` field enables multi-agent task routing: the orchestrator creates Goal grains with `assigned_agent` pointing to worker agent DIDs.
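The activation rule for `depends_on` can be illustrated with a small gate function. This is a sketch under stated assumptions: `may_activate` and `goals_by_address` are hypothetical names, the lookup table is assumed to map each content address to the current (latest superseding) state of that Goal, and an unresolvable dependency is treated as unsatisfied.

```python
def may_activate(goal: dict, goals_by_address: dict) -> bool:
    """Return True only if every Goal referenced in depends_on is satisfied."""
    for address in goal.get("depends_on", []):
        dependency = goals_by_address.get(address)
        if dependency is None or dependency.get("goal_state") != "satisfied":
            return False
    return True


# Hypothetical content addresses, shortened for readability.
goals_by_address = {
    "sha256:aaa": {"type": "goal", "description": "fetch data", "goal_state": "satisfied"},
    "sha256:bbb": {"type": "goal", "description": "clean data", "goal_state": "suspended"},
}
report_goal = {"type": "goal", "description": "write report",
               "depends_on": ["sha256:aaa", "sha256:bbb"]}
summary_goal = {"type": "goal", "description": "summarize", "depends_on": ["sha256:aaa"]}

report_ready = may_activate(report_goal, goals_by_address)
summary_ready = may_activate(summary_goal, goals_by_address)
```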
8.8 Reasoning (type = 0x08)
An inference step or thought chain — what the agent considered, concluded, and rejected. Enables audit trails for high-stakes decisions.
Required fields:
- `type` = "reasoning"
- `created_at` (int64, epoch ms)
Optional fields:
- `premises` (array[string]): content addresses of grains that informed this reasoning
- `conclusion` (string): the conclusion reached
- `inference_method` (string): `"deductive"`, `"inductive"`, `"abductive"`, `"analogical"`
- `alternatives_considered` (array[map]): rejected hypotheses, each: `{hypothesis: string, rejection_reason: string}`
- `thinking_content` (string): raw thinking/reasoning trace from the LLM's extended thinking feature (e.g., Anthropic `thinking` blocks). Distinct from `conclusion` (the output) and `premises` (the inputs). This is the primary audit artifact.
- `thinking_redacted` (bool): if `true`, the LLM's thinking was present but redacted before storage (e.g., for compliance or IP protection). The `thinking_content` field will be absent or contain a placeholder.
- `requires_human_review` (bool): if `true`, MUST NOT drive automated decisions until cleared
- `statistical_context` (map): `{p_value: float, confidence_interval: [float, float], effect_size: float, sample_size: int}`
- `software_environment` (map): `{language: string, runtime_version: string, library_versions: map, os: string}`
- `parameter_set` (map): model parameters or hyperparameters used
- `random_seed` (int64): for reproducibility
- All common fields from §6.1
8.9 Consensus (type = 0x09)
A multi-agent agreement record — N observers voted on a shared claim, threshold was met (or not).
Required fields:
- `type` = "consensus"
- `participating_observers` (array[string]): DIDs of agents that contributed votes
- `threshold` (int): minimum agreement count required
- `agreement_count` (int): actual agreement count
- `dissent_count` (int): disagreement count
- `created_at` (int64, epoch ms)
Optional fields:
- `dissent_grains` (array[string]): content addresses of minority-opinion grains
- `agreed_content` (string or map): the consensus claim
- All common fields from §6.1
8.10 Consent (type = 0x0A)
A DID-scoped, purpose-bounded permission grant or withdrawal. Four of six industry review domains independently required a dedicated Consent type at the type-byte level — HIPAA patient consent, legal privilege and DPA, regulatory consent, and GDPR/CCPA at scale. The Belief + mg:permits pattern is semantically correct but impractical when consent queries are compliance-critical and frequent.
Required fields:
- `type` = "consent"
- `subject_did` (string): DID of the consenting party
- `grantee_did` (string): DID of the party receiving permission
- `scope` (array[string]): operations consented to. Standard values: `"store"`, `"retrieve"`, `"share"`, `"process"`, `"infer"`, `"train"`, `"profile"`. Open enum.
- `is_withdrawal` (bool): `true` if revoking a prior consent
- `created_at` (int64, epoch ms)
Optional fields:
- `valid_from`, `valid_to` (int64, epoch ms): consent window
- `basis` (string): `"explicit_consent"`, `"legitimate_interest"`, `"contract"`, `"legal_obligation"`. Open enum.
- `jurisdiction` (string): `"eu"`, `"us_ccpa"`, `"us_hipaa"`, `"br_lgpd"`. Open enum.
- `prior_consent` (string): content address of the Consent grain being superseded (REQUIRED when `is_withdrawal: true`)
- `witness_dids` (array[string]): DIDs of witness agents
Normative rules:
- A Consent grain with `is_withdrawal: true` MUST reference `prior_consent`.
- Stores MUST honor consent withdrawal immediately. Consent grains MUST NOT be subject to automatic forgetting or retention decay.
- A withdrawn Consent grain is NOT deleted; both grant and withdrawal are retained for audit.
- Default `invalidation_policy.mode` for Consent grains is `"soft_locked"`.
- The `processing_basis` common field (§6.1) on any grain carries the content address of the Consent grain that authorized its creation, enabling GDPR Art. 17 erasure cascade.
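A store could check the required fields and the withdrawal rule before accepting a Consent grain. The sketch below is one possible shape for such a check; the function name `consent_problems`, the error strings, and the sample DIDs are all illustrative and not part of this specification.

```python
REQUIRED_CONSENT_FIELDS = ("subject_did", "grantee_did", "scope", "is_withdrawal", "created_at")


def consent_problems(grain: dict) -> list:
    """Return a list of human-readable problems; an empty list means the grain passes."""
    problems = []
    if grain.get("type") != "consent":
        problems.append('type must be "consent"')
    for field in REQUIRED_CONSENT_FIELDS:
        if field not in grain:
            problems.append(f"missing required field: {field}")
    # Normative rule: a withdrawal MUST reference the consent it revokes.
    if grain.get("is_withdrawal") is True and not grain.get("prior_consent"):
        problems.append("withdrawal must reference prior_consent")
    return problems


grant = {
    "type": "consent",
    "subject_did": "did:key:zSubjectExample",   # placeholder DID
    "grantee_did": "did:key:zGranteeExample",   # placeholder DID
    "scope": ["store", "retrieve"],
    "is_withdrawal": False,
    "created_at": 1737000000000,
}
bad_withdrawal = dict(grant, is_withdrawal=True)  # no prior_consent set

grant_problems = consent_problems(grant)
withdrawal_problems = consent_problems(bad_withdrawal)
```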
9. Cryptographic Signing
9.1 COSE Sign1 Envelope
For A2A sharing and audit compliance, grains MAY be wrapped in COSE Sign1 (RFC 9052) envelopes.
Signed Grain Structure:
COSE_Sign1 {
protected: {
1: -8, // alg: EdDSA (see note below)
4: "did:key:z6MkhaXg...", // kid: signer DID
3: "application/vnd.mg+msgpack" // content_type
},
unprotected: {
"iat": 1737000000 // timestamp: epoch seconds
},
payload: <.mg blob bytes>,
signature: <Ed25519 signature, 64 bytes>
}
Key points:
- Signature wraps the complete .mg blob (version byte + optional header + payload)
- Content address is still the inner blob's SHA-256 hash (unchanged by signing)
- EdDSA (Ed25519) is default algorithm; ES256 (ECDSA P-256) is alternative
- Signing is optional; the `signed` flag in the header indicates presence
- Signer identity is the DID in the `kid` (Key ID) field
Note on EdDSA algorithm value: This specification uses COSE algorithm value `-8` (EdDSA). The IANA COSE Algorithms registry has introduced more specific values: `-19` for Ed25519 and `-53` for Ed448. Implementations MAY use `-19` instead of `-8` when Ed25519 is the only supported curve. Verifiers MUST accept both `-8` and `-19` for Ed25519 signatures.
9.2 Signed Flag and Wrapper Consistency
The signed flag (byte 1, bit 0) is part of the inner blob's fixed header. The COSE_Sign1 wrapper is external to the content-addressed blob and is NOT included in the SHA-256 hash:
[Inner .mg blob] [Outer COSE_Sign1 — not content-addressed]
├─ Byte 1, bit 0: signed = 1 ├─ protected headers
├─ payload bytes ├─ unprotected headers
└─ content address = SHA-256(blob) └─ signature over inner blob bytes
Invariant: The signed flag MUST match the presence of an outer COSE wrapper:
- If `signed` = 1, the grain MUST be delivered wrapped in COSE_Sign1
- If `signed` = 0, the grain MUST NOT be wrapped
Parsers MUST reject with ERR_SIGNED_MISMATCH if the flag is 1 but no wrapper is present, or the flag is 0 but a wrapper is present.
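A parser-side check of this invariant might look like the following sketch. The exception class and function name are hypothetical stand-ins for however an implementation surfaces `ERR_SIGNED_MISMATCH`, and the code assumes that "bit 0" means the least significant bit of byte 1.

```python
class SignedMismatchError(ValueError):
    """Illustrative stand-in for ERR_SIGNED_MISMATCH."""


def check_signed_consistency(blob: bytes, has_cose_wrapper: bool) -> None:
    """Raise if the signed flag disagrees with the presence of a COSE wrapper."""
    if len(blob) < 2:
        raise ValueError("blob too short to contain the flags byte")
    signed = bool(blob[1] & 0x01)  # assumption: bit 0 is the least significant bit
    if signed != has_cose_wrapper:
        raise SignedMismatchError("ERR_SIGNED_MISMATCH")


# Toy 3-byte headers: version byte, flags byte, type byte.
signed_blob = bytes([0x01, 0x01, 0x01])
unsigned_blob = bytes([0x01, 0x00, 0x01])

check_signed_consistency(signed_blob, has_cose_wrapper=True)     # passes silently
check_signed_consistency(unsigned_blob, has_cose_wrapper=False)  # passes silently

try:
    check_signed_consistency(signed_blob, has_cose_wrapper=False)
    mismatch_detected = False
except SignedMismatchError:
    mismatch_detected = True
```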
Content address stability: Signing does not change the inner blob bytes or its content address. An unsigned and a signed delivery of the same grain share the same content address.
9.3 Identity Verification
To verify a signed grain:
- Parse the COSE_Sign1 structure
- Extract `kid` (signer DID) from the protected headers
- Resolve the DID to a public key (did:key is self-contained; did:web resolves via HTTPS)
- Verify the signature over the payload
- Compute SHA-256 over the payload (the inner .mg blob) and confirm it matches the expected content address before deserializing
10. Selective Disclosure
Grains MAY use field-level selective disclosure (inspired by SD-JWT RFC 9901) to hide sensitive fields while proving they exist.
10.1 Elision Model
When sharing a grain with restricted visibility:
- Full grain (held by creator):
{
"type": "belief",
"subject": "Alice",
"relation": "works_at",
"object": "ACME Corp",
"user_id": "alice-123",
"namespace": "hr",
"created_at": 1737000000000
}
- Disclosed grain (shared with receiver):
{
"type": "belief",
"subject": "Alice",
"relation": "works_at",
"object": "ACME Corp",
"created_at": 1737000000000,
"_elided": {
"user_id": "sha256:a1b2c3d4...",
"namespace": "sha256:e5f6a7b8..."
},
"_disclosure_of": "sha256:original_grain_hash..."
}
10.1.1 Elision Hash Computation
The value stored in _elided for each elided field is the SHA-256 hash of the canonical MessagePack encoding of that field's value:
elision_hash = "sha256:" + lowercase_hex(SHA-256(canonical_msgpack_encode(field_value)))
The hash covers the value bytes only — the field name (key) is not included. The field value is serialized using the same canonical MessagePack rules as the full grain (Section 4): NFC-normalized strings, sorted map keys, omitted nulls, float64, etc.
Examples:
- `user_id = "alice-123"`: encode `"alice-123"` as MessagePack fixstr → SHA-256 the resulting bytes
- `confidence = 0.95`: encode `0.95` as float64 (9 bytes) → SHA-256 the resulting bytes
- `context = {"k": "v"}`: encode as canonical sorted map → SHA-256 the resulting bytes
Verification: A receiver who holds the disclosed grain and later obtains an elided field's value can confirm that the value is the one that was elided by encoding it canonically and comparing its SHA-256 against the entry in _elided.
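The string and float cases of the elision hash can be reproduced with the Python standard library alone. The sketch below hand-encodes only two MessagePack formats (fixstr/str 8 and float 64) so that it stays dependency-free; the helper names are hypothetical, and a real implementation would use a full canonical MessagePack encoder as described in Section 4.

```python
import hashlib
import struct
import unicodedata


def encode_str(value: str) -> bytes:
    """MessagePack-encode an NFC-normalized string (fixstr and str 8 only)."""
    data = unicodedata.normalize("NFC", value).encode("utf-8")
    if len(data) <= 31:
        return bytes([0xA0 | len(data)]) + data      # fixstr: 101xxxxx + bytes
    if len(data) <= 255:
        return b"\xd9" + bytes([len(data)]) + data   # str 8: 0xd9 + length + bytes
    raise ValueError("longer strings are outside the scope of this sketch")


def encode_float64(value: float) -> bytes:
    """MessagePack float 64: 0xcb marker followed by 8 big-endian IEEE 754 bytes."""
    return b"\xcb" + struct.pack(">d", value)


def elision_hash(encoded_value: bytes) -> str:
    """Hash the encoded value bytes only; the field name is not included."""
    return "sha256:" + hashlib.sha256(encoded_value).hexdigest()


user_id_hash = elision_hash(encode_str("alice-123"))
confidence_hash = elision_hash(encode_float64(0.95))
```

A receiver that later learns `user_id` would recompute `elision_hash(encode_str(value))` and compare it with the `_elided["user_id"]` entry.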
10.2 Field Elision Rules
| Field | Elidable | Reason |
|---|---|---|
| `type` | No | Receiver must know grain type |
| `subject` | Yes | May contain PII |
| `relation` | No | Core knowledge structure |
| `object` | Yes | May contain PII |
| `confidence` | No | Essential for trust decisions |
| `user_id` | Yes | GDPR personal data |
| `namespace` | Yes | May reveal organizational structure |
| `created_at` | No | Essential for temporal queries |
| `provenance_chain` | Yes | May reveal system architecture |
| `context` | Yes | May contain sensitive details |
| `structural_tags` | Yes | May reveal classification system |
| `goal_state` | No | Essential for routing and trust decisions |
| `source_type` | No | Required for human-vs-agent trust decisions |
| `priority` | No | Required for cross-system scheduling |
| `description` | Yes | May reveal strategic intent |
| `criteria` | Yes | May reveal operational thresholds |
| `criteria_structured` | Yes | May reveal operational thresholds |
| `parent_goals` | Yes | May reveal goal hierarchy (system architecture) |
| `state_reason` | Yes | May reveal internal reasoning |
| `satisfaction_evidence` | Yes | May reveal system internals |
| `delegate_to` | Yes | May reveal agent architecture |
| `delegate_from` | Yes | May reveal agent architecture |
| `rollback_on_failure` | Yes | May reveal system control flow |
| `observer_id` | Yes | May reveal physical sensor topology or agent infrastructure identity |
| `observer_type` | No | Core routing and trust-domain field; receiver must know observer category to calibrate confidence |
| `observer_model` | Yes | May reveal internal AI stack or model versioning |
| `observation_mode` | No | Required for trust calibration; changes the interpretation of confidence |
| `observation_scope` | No | Required for temporal interpretation of `valid_from`/`valid_to` |
| `compression_ratio` | No | Required for confidence calibration; cannot assess fidelity without knowing compression factor |
| `frame_id` | Yes | May reveal spatial coordinate topology or internal contextual system architecture |
| `sync_group` | Yes | May reveal multi-sensor or multi-agent coordination topology |
10.3 Elision in .mg Format
Field compaction:
| Full Name | Short Key | Type |
|---|---|---|
| `_elided` | `_e` | map {string: string} |
| `_disclosure_of` | `_do` | string |
A disclosed grain has a different content address than the original because its bytes differ. If the grain is COSE-signed, the signature covers the original grain; the receiver can verify that all non-elided fields are authentic.
10.4 Canonical Form and Disclosure
The original (undisclosed) grain is the canonical form. Selective disclosure produces a derived view with a different content address; it does not create a new canonical grain.
- Original grain: content address is the hash of the complete, unelided blob; this is the authoritative identity
- Disclosed grain: content address is the hash of the elided blob, which differs from the original's address; `_disclosure_of` links back to the original's content address
- COSE signatures wrap and cover the original blob. Receivers verify the signature against the original's content address, not the disclosed variant's
In distributed systems:
- Primary storage holds the original grain (canonical, fully populated)
- Disclosed variants are presentation artifacts generated on demand; they SHOULD NOT be stored as independent grains
- When `_disclosure_of` resolves to an address in the store, the authoritative content is the original grain at that address
Rationale: Treating the original as canonical preserves the immutability guarantee (original is a fixed point) while allowing dynamic, per-recipient selective disclosure without re-signing or rehashing.
11. File Format (.mg files)
11.1 Purpose
The .mg file is the portable unit of memory. Individual grains live in blob storage by content hash; .mg files are what users see, copy, share, and archive.
Mental model:
.sqlite = database file (many rows)
.git = repository (many objects)
.mg = memory file (many grains)
11.2 Layout
.mg File Structure:
+----------+------------------+
| Header | Magic: "MG\x01" | 3 bytes
| | Flags: uint8 | 1 byte
| | Grain count: u32 | 4 bytes
| | Field map ver: u8| 1 byte
| | Compression: u8 | 1 byte
| | Reserved: 6 bytes| 6 bytes
+----------+------------------+ = 16 bytes
| Index | Grain offsets | 4 bytes × grain_count (u32 each)
| | (enables random access)
+----------+------------------+
| Grains | grain 0 bytes | variable
| | grain 1 bytes | variable
| | ... |
| | grain N-1 bytes | variable
+----------+------------------+
| Manifest | Index manifest | variable (canonical MessagePack/CBOR)
| (opt.) | (if flags bit 4) | see §11.7
+----------+------------------+
| Footer | SHA-256 checksum | 32 bytes (over header + index + grains + manifest)
+----------+------------------+
11.3 Header Fields
Magic: 0x4D 0x47 0x01 — "MG" + version 1
Flags (uint8):
| Bit | Meaning |
|---|---|
| 0 | sorted — grains are sorted by created_at (ascending) |
| 1 | deduplicated — no duplicate content addresses |
| 2 | compressed — grain region is zstd-compressed (single block) |
| 3 | field_map_included — file includes custom FIELD_MAP for app-defined fields |
| 4 | has_index_manifest — file includes an index manifest section (§11.7) |
| 5-7 | Reserved |
Compression codec (uint8):
| Value | Codec |
|---|---|
| 0x00 | None (uncompressed) |
| 0x01 | zstd (default, level 3) |
| 0x02 | lz4 (low-latency) |
| 0x03-0xFF | Reserved |
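The 16-byte header described in §11.2 and §11.3 can be unpacked with Python's `struct` module. This is a sketch, not a reference parser: the names `MgHeader`, `parse_header`, and `flag_names` are hypothetical, and it assumes that multi-byte integers are big-endian (consistent with the offset example in §11.4) and that flag bit 0 is the least significant bit.

```python
import struct
from typing import NamedTuple

# magic (3s), flags (B), grain count (I), field map version (B),
# compression codec (B), reserved (6s); ">" means big-endian, no padding.
HEADER_FORMAT = ">3sBIBB6s"
FLAG_NAMES = ["sorted", "deduplicated", "compressed",
              "field_map_included", "has_index_manifest"]


class MgHeader(NamedTuple):
    flags: int
    grain_count: int
    field_map_version: int
    compression: int


def parse_header(data: bytes) -> MgHeader:
    """Parse and validate the fixed 16-byte .mg file header."""
    if len(data) < 16:
        raise ValueError("file shorter than the 16-byte header")
    magic, flags, count, fmv, codec, _reserved = struct.unpack(HEADER_FORMAT, data[:16])
    if magic != b"MG\x01":
        raise ValueError("bad magic bytes")
    return MgHeader(flags, count, fmv, codec)


def flag_names(flags: int) -> list:
    """List the names of the flag bits that are set (bits 0 to 4)."""
    return [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)]


sample = b"MG\x01" + bytes([0b00010011]) + (3).to_bytes(4, "big") + bytes([1, 0]) + bytes(6)
header = parse_header(sample)
set_flags = flag_names(header.flags)
```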
11.4 Random Access via Offsets
The offset index (4 bytes × grain count) enables fast random access:
# Read grain #42 from an uncompressed .mg file.
# `data` is assumed to hold the complete file contents as bytes.
header_size = 16
offset_start = header_size + (42 * 4)
offset = int.from_bytes(data[offset_start:offset_start+4], 'big')
# The next index entry marks where grain #42 ends. The last grain has no
# following entry, so its end must be derived from the file layout instead.
next_offset = int.from_bytes(data[offset_start+4:offset_start+8], 'big')
grain_bytes = data[offset:next_offset]
For compressed files (flags bit 2 = 1), offsets point into the decompressed grain region. The entire grain region MUST be fully decompressed before any grain can be accessed by offset; implementations MUST NOT attempt to index into the compressed byte stream directly. This is a deliberate trade-off: compression reduces file size at the cost of requiring full decompression before random access.
11.5 Footer Checksum
SHA-256 over: header (16 bytes) || index (grain_count*4 bytes) || grains (variable) || manifest (variable, if present)
Enables integrity verification of entire file.
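A whole-file integrity check follows directly from this definition, because the checksum covers every byte that precedes the footer. The sketch below assumes the footer is the final 32 bytes of the file and holds the raw (not hex-encoded) digest; `footer_is_valid` is a hypothetical helper name.

```python
import hashlib


def footer_is_valid(data: bytes) -> bool:
    """Compare the trailing 32-byte footer with SHA-256 of everything before it."""
    if len(data) < 16 + 32:  # header plus footer is the smallest possible file
        return False
    body, footer = data[:-32], data[-32:]
    return hashlib.sha256(body).digest() == footer


# Toy file: a 16-byte header with zero grains, followed by its checksum.
body = b"MG\x01" + bytes(13)
good_file = body + hashlib.sha256(body).digest()
tampered_file = b"MG\x01" + b"\x01" + bytes(12) + good_file[-32:]

good_ok = footer_is_valid(good_file)
tampered_ok = footer_is_valid(tampered_file)
```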
11.6 Wire Framing (Transport Layer)
For streaming scenarios (WebSocket, SSE, Kafka, TCP), use length-prefixed framing (NOT saved to disk):
+------+------------------+
| u32 | grain 0 bytes | length-prefixed frame
+------+------------------+
| u32 | grain 1 bytes | length-prefixed frame
+------+------------------+
| 0x00000000 | zero-length sentinel = end of stream
+------+------------------+
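The framing above can be sketched as a writer and a generator-based reader over any binary stream. The byte order of the `u32` length prefix is not stated in this section, so big-endian is an assumption here; the function names are hypothetical, and the byte strings stand in for real grain blobs.

```python
import io
import struct


def write_frames(stream, grains) -> None:
    """Write each grain with a u32 length prefix, then the zero-length sentinel."""
    for grain in grains:
        if not grain:
            raise ValueError("an empty grain would be mistaken for the sentinel")
        stream.write(struct.pack(">I", len(grain)))
        stream.write(grain)
    stream.write(struct.pack(">I", 0))


def read_frames(stream):
    """Yield grains until the zero-length sentinel; fail on a truncated stream."""
    while True:
        prefix = stream.read(4)
        if len(prefix) != 4:
            raise EOFError("stream ended before the sentinel")
        (length,) = struct.unpack(">I", prefix)
        if length == 0:
            return
        grain = stream.read(length)
        if len(grain) != length:
            raise EOFError("truncated frame")
        yield grain


buffer = io.BytesIO()
write_frames(buffer, [b"grain-zero", b"grain-one"])
buffer.seek(0)
received = list(read_frames(buffer))
```

On a real socket, `read` may return fewer bytes than requested, so a production reader would loop until each frame is complete.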
11.7 Index Manifest (Portable Index-Layer State)
When flag bit 4 (has_index_manifest) is set, the .mg file includes an index manifest section between the grain region and the footer. The manifest carries index-layer field values (§5.6, §28.3) so that a single .mg file is a self-contained, portable unit of memory — including lifecycle state, not just immutable content.
Format: The manifest is a canonical MessagePack (or CBOR, matching the grains' encoding) map keyed by content address:
{
"<content_address>": {
"sb": "<superseding content address>",
"svt": 1737000000000,
"vstatus": "verified"
},
"<content_address>": {
"vstatus": "contested",
"ac": 42,
"laa": 1737500000000
}
}
Field names use the compacted short keys from §6.1. Null/absent values are omitted per §4. Only grains with at least one non-default index-layer field need an entry.
Field portability classes:
| Class | Fields | Export | Import |
|---|---|---|---|
| Portable | `superseded_by`, `system_valid_to`, `verification_status` | MUST include | MUST merge into index |
| Local | `access_count`, `last_accessed_at` | MAY include | MAY merge or reset to zero |
Portable fields carry semantic state (supersession chains, verification decisions) that is meaningful across systems. Local fields carry store-specific access statistics that may not be meaningful in a different deployment.
Export rules:
- Exporters MUST set flag bit 4 and include a manifest when any grain in the file has non-default portable index-layer fields.
- Exporters SHOULD include local fields as a convenience; omitting them is not an error.
Import rules:
- Importers MUST parse the manifest when flag bit 4 is set.
- Importers MUST apply portable fields to their index layer. If a grain already exists in the target store with conflicting index state, the conflict resolution strategy is implementation-defined (last-writer-wins, manual review, etc.).
- Importers MAY ignore local fields or reset them to defaults (e.g., `access_count: 0`).
- Importers MUST NOT inject manifest fields into the immutable blob. The manifest is index-layer metadata only.
Integrity: The manifest bytes are included in the footer checksum (§11.5) but are NOT part of any grain's content address. Tampering with the manifest is detectable via the footer checksum, but the immutable grain blobs remain independently verifiable by their own content addresses.
Implementation note: For .mg files without flag bit 4, importers SHOULD initialize all index-layer fields to defaults (`verification_status: "unverified"`, `access_count: 0`, etc.). The absence of a manifest means either the exporter predates this feature or all grains had default index-layer state.
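The import rules can be sketched as a merge of the decoded manifest into an in-memory index. Several things here are assumptions: the short keys `sb`, `svt`, `vstatus`, `ac`, and `laa` are inferred from the manifest example above rather than taken from §6.1, the index is modeled as a plain dictionary, and conflicts are resolved last-writer-wins, which is only one of the implementation-defined strategies.

```python
# Short keys inferred from the manifest example; confirm against §6.1 before use.
PORTABLE_KEYS = {"sb", "svt", "vstatus"}  # superseded_by, system_valid_to, verification_status
LOCAL_KEYS = {"ac", "laa"}                # access_count, last_accessed_at


def merge_manifest(index: dict, manifest: dict, keep_local: bool = False) -> dict:
    """Apply manifest entries to the index layer; grain blobs are never touched."""
    for address, entry in manifest.items():
        target = index.setdefault(address, {})
        for key, value in entry.items():
            if key in PORTABLE_KEYS or (keep_local and key in LOCAL_KEYS):
                target[key] = value  # last-writer-wins on conflict
    return index


manifest = {
    "sha256:aaa": {"sb": "sha256:bbb", "svt": 1737000000000, "vstatus": "verified"},
    "sha256:ccc": {"vstatus": "contested", "ac": 42, "laa": 1737500000000},
}
index = {"sha256:ccc": {"vstatus": "unverified", "ac": 3}}
merged = merge_manifest(index, manifest)
```

With `keep_local=False`, the store's own `ac` value for `sha256:ccc` is left as it was, which corresponds to ignoring the exporter's local statistics.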
12. Identity and Authorization
12.1 DID-Based Identity (author_did)
`author_did` replaces the earlier `agent_id` string, which was free-form and unverifiable:
- `author_did` (compacted: `adid`): DID of the grain creator (cryptographically verifiable)
- `origin_did` (compacted: `odid`): original source DID in A2A relay chains
12.2 Why W3C DIDs
W3C DIDs provide decentralized identity without central PKI:
- did:key (default): self-contained; the public key is in the DID itself. Example: `did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK`
- did:web (enterprise): organizational identity via DNS. Example: `did:web:example.com:agents:summarizer`
12.3 Identity Fields (Orthogonal)
| Field | Purpose | Example | Used By |
|---|---|---|---|
| `author_did` | Agent identity: who created this grain | `did:key:z6Mk...` | COSE signature verification, audit trail |
| `user_id` | Data subject: whose personal data | `"alice-42"`, `"patient-789"` | GDPR erasure, per-user encryption |
| `namespace` | Logical partition: grouping | `"work"`, `"robotics:arm-7"` | Query scoping, access control |
12.4 User ID Compliance Context
user_id is specifically for natural persons under GDPR, CCPA, HIPAA:
- Triggers per-person encryption (HKDF key derivation)
- Enables erasure proofs (crypto-erasure by destroying key)
- Tracks per-person consent
- Enables blind index lookups (HMAC tokens) without exposing plaintext
For non-person memory (seasonal, device, system), user_id is simply omitted. namespace handles logical grouping.
12.5 Agent Ownership and Legal Entity
An agent may belong to a legal entity — a natural person or a juridical person (company, partnership, NGO, government body). OMS expresses this relationship as a protected Belief grain written at agent provisioning time by the operator, not by the agent itself.
12.5.1 The owner Field
Any grain type MAY carry an owner field (compacted: own) containing a LegalEntity map. In practice, owner is used in the ownership Belief grain described in §12.5.3. It MUST NOT be used as an access control gate — invalidation_policy (§23) governs supersession authorization.
LegalEntity sub-schema:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | REQUIRED | "human" (natural person) or "org" (juridical entity) |
| name | string | REQUIRED | Registered legal name |
| entity_form | string | OPTIONAL | Legal structure (open enum; see §12.5.2). Omit when type: "human". |
| jurisdiction | string | OPTIONAL | ISO 3166-2 code of registration jurisdiction (e.g., "US-DE", "IN-KA", "GB", "SG") |
| reg_id | string | OPTIONAL | Government registration ID, prefixed by type (e.g., "EIN:88-...", "CIN:U...", "ABN:51...") |
| did | string | OPTIONAL | W3C DID for cryptographic verifiability. RECOMMENDED when available. |
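The sub-schema above can be checked mechanically. The following is a minimal sketch, not part of the specification: the function name validate_legal_entity is hypothetical, and reporting entity_form on a human owner as a problem is an assumption about how strictly to read "Omit when type: human".

```python
def validate_legal_entity(entity: dict) -> list[str]:
    """Return a list of problems found in a LegalEntity map (empty if none)."""
    problems = []
    # REQUIRED: type is one of the two defined values.
    if entity.get("type") not in ("human", "org"):
        problems.append('type must be "human" or "org"')
    # REQUIRED: name is a non-empty string.
    name = entity.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name is required and must be a non-empty string")
    # OPTIONAL fields must be strings when present.
    for field in ("entity_form", "jurisdiction", "reg_id", "did"):
        if field in entity and not isinstance(entity[field], str):
            problems.append(f"{field} must be a string")
    # Assumption: entity_form on a human owner is reported, not silently dropped.
    if entity.get("type") == "human" and "entity_form" in entity:
        problems.append("entity_form should be omitted for human owners")
    # reg_id is "PREFIX:value"; unknown prefixes are accepted and preserved.
    reg_id = entity.get("reg_id")
    if isinstance(reg_id, str) and ":" not in reg_id:
        problems.append("reg_id should carry a type prefix such as EIN: or CIN:")
    return problems
```

Because entity_form and the reg_id prefixes are open vocabularies, the sketch checks only their shape and never rejects an unlisted value.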
12.5.2 entity_form Registry (Open Enum)
| Value | Legal structure |
|---|---|
| "c_corp" | C-Corporation (US) |
| "s_corp" | S-Corporation (US) |
| "pbc" | Public Benefit Corporation (US) |
| "llc" | Limited Liability Company (US) |
| "llp" | Limited Liability Partnership (US / India / UK) |
| "pvt_ltd" | Private Limited Company (India: Pvt. Ltd.; UK: Ltd.) |
| "plc" | Public Limited Company (UK) |
| "gmbh" | Gesellschaft mit beschränkter Haftung (DE / AT / CH) |
| "sarl" | Société à responsabilité limitée (FR and Francophone jurisdictions) |
| "bv" | Besloten vennootschap (NL / BE) |
| "pty_ltd" | Proprietary Limited (AU / ZA) |
| "sole_proprietor" | Sole proprietorship (any jurisdiction) |
| "partnership" | General partnership |
| "ngo" | Non-governmental organization / 501(c)(3) |
| "government" | Government body or public agency |
| "trust" | Trust entity |
| "cooperative" | Cooperative |
This is an open enum. Implementations MAY define additional values for jurisdiction-specific structures not listed above.
reg_id prefix conventions:
| Prefix | Country | ID type |
|---|---|---|
| EIN: | US | Employer Identification Number |
| CIN: | India | Company Identification Number (MCA) |
| GSTIN: | India | GST Identification Number |
| ABN: | Australia | Australian Business Number |
| VAT: | EU / UK | VAT registration number |
| UEN: | Singapore | Unique Entity Number |
| SIREN: | France | Système d'Identification du Répertoire des Entreprises |
Prefixes not listed here MUST be preserved as-is. New prefixes do not require a spec update.
12.5.3 Ownership Belief Grain Convention
Agent ownership is expressed as a Belief grain with relation: "mg:owned_by" in the "agent:identity" namespace. The object field carries the owner's legal name as a string (for semantic triple completeness). The structured owner field carries the full LegalEntity map.
This grain MUST be written by the operator at agent provisioning time. It MUST carry an invalidation_policy (§23) restricting supersession to the owner's authorized DID. It SHOULD be COSE-signed (§9) by the owner's DID.
Example — organization owner (Indian Pvt. Ltd.):
{
"type": "belief",
"subject": "did:web:example.com:agents:my-agent",
"relation": "mg:owned_by",
"object": "Example Corp Pvt. Ltd.",
"owner": {
"type": "org",
"name": "Example Corp Pvt. Ltd.",
"entity_form": "pvt_ltd",
"jurisdiction": "IN-KA",
"reg_id": "CIN:U72900KA2023PTC123456",
"did": "did:web:example.com"
},
"source_type": "system",
"author_did": "did:web:example.com",
"namespace": "agent:identity",
"structural_tags": ["legal:ownership", "mg:protected"],
"invalidation_policy": {
"mode": "locked",
"authorized": ["did:web:example.com"],
"scope": "lineage",
"protection_reason": "Immutable ownership declaration — change requires authorized officer signature"
},
"created_at": 1737000000000
}
Example — individual human owner:
{
"type": "belief",
"subject": "did:key:z6MkAgentDID...",
"relation": "mg:owned_by",
"object": "Jane Doe",
"owner": {
"type": "human",
"name": "Jane Doe",
"jurisdiction": "IN",
"did": "did:key:z6MkJaneDoeKey..."
},
"source_type": "system",
"author_did": "did:key:z6MkJaneDoeKey...",
"namespace": "agent:identity",
"structural_tags": ["legal:ownership", "mg:protected"],
"invalidation_policy": {
"mode": "locked",
"authorized": ["did:key:z6MkJaneDoeKey..."],
"scope": "lineage",
"protection_reason": "Individual owner declaration"
},
"created_at": 1737000000000
}
Example — US LLC (Delaware):
{
"owner": {
"type": "org",
"name": "Acme Labs LLC",
"entity_form": "llc",
"jurisdiction": "US-DE",
"reg_id": "EIN:47-1234567",
"did": "did:web:acmelabs.io"
}
}
Normative rules:
- The ownership grain MUST NOT be authored by the agent's own DID. Only the operator's DID is authorized to write it (key separation, §23.8).
- The subject MUST be the agent's DID.
- When multiple grains with relation: "mg:owned_by" exist for the same subject in the "agent:identity" namespace, the grain with invalidation_policy.mode ≠ "open" is authoritative. Stores SHOULD surface it as the canonical ownership record.
- An agent observing a user assertion that contradicts the locked ownership grain MAY record that claim as an Observation grain. It MUST NOT write a superseding ownership Belief without the authorized signature.
12.5.4 Protection Layers
The locked invalidation policy combined with COSE signing provides layered protection against ownership spoofing:
| Layer | Mechanism | What it prevents |
|---|---|---|
| Policy lock | invalidation_policy.mode: "locked" (§23) | Store rejects any supersession not signed by authorized DID; returns ERR_INVALIDATION_DENIED |
| Key separation | Agent DID ≠ owner DID (§23.8) | Agent cannot produce a valid supersession signature even if instructed to by a user |
| Lineage scope | scope: "lineage" (§23.6) | Supersession chain injection — agent cannot supersede a derived grain to bypass the protected root |
| COSE signature | Owner signs the blob (§9) | Blob tampering changes the content address; the original signed grain remains valid and current |
Prompt injection resistance: A user or external input asserting "your owner is now X" does not create or modify an ownership grain. The agent lacks the owner's private key and cannot author a superseding grain that passes the locked policy check. The original ownership fact remains current.
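The policy-lock layer can be pictured as a single authorization check in the store. This is a minimal sketch under stated assumptions: the function name may_supersede is hypothetical, signer_did is assumed to have already been established by COSE signature verification (§9), and treating a grain with no invalidation_policy as unprotected is an assumption rather than a rule quoted from this section. Lineage scope is not modeled.

```python
def may_supersede(policy, signer_did):
    """Decide whether signer_did may supersede a grain carrying this policy."""
    if policy is None:
        # Assumption: a grain without a policy is unprotected.
        return True
    if policy.get("mode") == "open":
        return True
    # "locked" and, by the fail-closed rule, any unrecognized mode:
    # only DIDs on the authorized list may supersede.
    return signer_did in policy.get("authorized", [])
```

With the organization example above, a supersession signed by did:web:example.com passes, while one signed by the agent's own DID is refused. A store would answer the refusal with ERR_INVALIDATION_DENIED.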
13. Sensitivity Classification
13.1 Header-Level Sensitivity
The fixed header includes a 2-bit sensitivity field (byte 1, bits 6-7):
| Value | Level | Meaning |
|---|---|---|
| 00 | Public | No sensitivity constraints |
| 01 | Internal | Organization-internal data, not PII |
| 10 | PII | Contains personally identifiable information |
| 11 | PHI | Contains protected health information (HIPAA) |
This enables O(1) routing to encrypted storage or access control, with no deserialization needed.
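Reading the level takes one shift and one mask. The sketch below is illustrative only and assumes bit 0 is the least significant bit, so that bits 6-7 are the two high bits of byte 1; the name header_sensitivity is hypothetical.

```python
SENSITIVITY_LEVELS = ("public", "internal", "pii", "phi")  # 00, 01, 10, 11

def header_sensitivity(blob: bytes) -> str:
    """Read the sensitivity level from byte 1 without parsing the payload."""
    flags = blob[1]
    # Assumption: bits 6-7 are the two most significant bits of the byte.
    return SENSITIVITY_LEVELS[(flags >> 6) & 0b11]
```

A router can call this on the first two bytes of a blob and pick a storage tier before any payload decoding takes place.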
13.2 Standard Tag Vocabulary
Detailed sensitivity classification is expressed via structural_tags in the payload:
| Prefix | Category | Examples |
|---|---|---|
| pii: | Personal data | pii:email, pii:phone, pii:ssn, pii:name |
| phi: | Health data | phi:diagnosis, phi:medication, phi:lab_result |
| reg: | Regulatory jurisdiction | reg:pci-dss, reg:sox, reg:basel-iii, reg:gdpr-art17 |
| sec: | Security data | sec:credential, sec:api_key, sec:token |
| legal: | Legal data | legal:ownership, legal:privilege, legal:litigation_hold |
The reg: prefix identifies which regulatory storage or retention rules apply to a grain. The vocabulary is open-ended — use well-known regulation identifiers. Examples: reg:pci-dss (PCI-compliant storage required), reg:sox (7-year immutable audit retention), reg:basel-iii (regulatory capital data), reg:gdpr-art17 (erasure-eligible). Unlike pii: or phi:, reg: tags carry no compliance classification claim — they are routing and policy directives.
At write time, the serializer scans the tags and sets the header sensitivity bits to the highest classification present (see §13.4).
13.3 Header Sensitivity Limitations
Header sensitivity bits (§13.1) are advisory routing metadata, not a compliance guarantee. They enable efficient routing without deserialization but MUST NOT be treated as the sole basis for access control or encryption decisions.
Tag-based sensitivity assignment (§13.2) depends on the writer correctly identifying and tagging sensitive fields at creation time. If a grain contains sensitive data but is incorrectly or incompletely tagged, the header bits will not reflect the true classification.
Systems processing personal data, health information, or other regulated content SHOULD:
- Treat header sensitivity bits as a fast-path routing hint, not a classification guarantee
- Perform payload inspection for sensitive decisions — deserialize and validate structural_tags before routing or sharing
- Enforce writer responsibility — establish clear tagging protocols for regulated workflows
- Apply layered defense — combine header-level filtering with payload inspection; never gate compliance solely on header bits
13.4 Sensitivity Consistency Validation
Serializer rule: At write time, the serializer MUST scan all structural_tags values and set the header sensitivity bits to the highest classification present, using this mapping:
| Tag prefix present | Minimum header sensitivity |
|---|---|
| phi:* | 11 (PHI) |
| pii:*, sec:*, legal:* | 10 (PII) |
| reg:* | 01 (internal) minimum — policy engine determines actual tier |
| No sensitive tags | 00 or 01 at writer's discretion |
Parser rule: At parse time, if structural_tags is present, the parser MUST validate that the header sensitivity bits are not lower than the highest classification the tags require. If they are lower, the parser MUST reject with ERR_SENSITIVITY_MISMATCH. This condition indicates either a serializer defect or potential header tampering to bypass access controls.
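Both rules reduce to computing the minimum level that the tags require and comparing it with the header. The following sketch is one way to express that, with hypothetical names (required_level, check_header, SensitivityMismatch); levels are the integers 0 to 3 corresponding to the header values 00 to 11.

```python
# Minimum header level demanded by each tag prefix (from the mapping above).
PREFIX_MIN_LEVEL = {"phi:": 3, "pii:": 2, "sec:": 2, "legal:": 2, "reg:": 1}

class SensitivityMismatch(ValueError):
    """Stands in for ERR_SENSITIVITY_MISMATCH in this sketch."""

def required_level(structural_tags):
    """Serializer rule: highest minimum level required by any tag."""
    level = 0
    for tag in structural_tags:
        for prefix, minimum in PREFIX_MIN_LEVEL.items():
            if tag.startswith(prefix):
                level = max(level, minimum)
    return level

def check_header(header_level, structural_tags):
    """Parser rule: reject when the header is lower than the tags require."""
    if header_level < required_level(structural_tags):
        raise SensitivityMismatch("ERR_SENSITIVITY_MISMATCH")
```

A serializer would write at least required_level(tags) into the header; a writer or policy engine may still choose a higher level, which the parser check accepts.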
13.5 Legal Neutrality Statement
The sensitivity classifications in this specification (public, internal, PII, PHI) are technical routing and storage metadata. They are not legal definitions of personal data, health information, financial information, or any regulated category under any jurisdiction.
Different legal regimes use different terminology and thresholds:
- GDPR (EU) — "personal data": any information relating to an identified or identifiable natural person
- CCPA (California) — "personal information": information that identifies or could reasonably be linked to a consumer
- LGPD (Brazil) — "dados pessoais": similar scope to GDPR
- HIPAA (USA) — "protected health information (PHI)": a specific regulatory category under 45 CFR
Implementations MUST determine sensitivity classification according to applicable jurisdictional law and organizational policy. The .mg tags and header bits are provided as a compliance-aware tagging mechanism to facilitate routing and policy enforcement; the legal determination of what constitutes regulated data is outside the scope of this specification.
14. Cross-Links and Provenance
14.1 Provenance Chain
Every grain carries provenance_chain — the derivation trail:
{
"provenance_chain": [
{"source_hash": "abc123...", "method": "user_input", "weight": 1.0},
{"source_hash": "def456...", "method": "frequency_consolidation", "weight": 0.8}
]
}
Each entry has:
- source_hash — content address of source grain
- method — consolidation method or source type
- weight — how much this source contributed (0.0–1.0)
Provenance chain method strings for Observation grains:
| Method String | Meaning |
|---|---|
| "sensor_read" | Direct physical measurement from an instrument |
| "llm_observation" | LLM-generated observation from input messages or documents |
| "reflective_compression" | Observation produced by compressing prior Observation or Episode grains |
| "multi_sensor_fusion" | Observation produced by fusing multiple physical sensor readings sharing a sync_group |
| "human_annotation" | Observation recorded by a human observer or annotator |
| "detection_inference" | Observation produced by a classification or detection model |
14.2 Related-To Cross-Links
The related_to field enables semantic similarity links:
{
"related_to": [
{
"hash": "abc123...",
"relation_type": "similar",
"weight": 0.85
},
{
"hash": "def456...",
"relation_type": "elaborates",
"weight": 0.70
}
]
}
Field compaction (RELATED_TO_FIELD_MAP):
| Full Name | Short Key | Type |
|---|---|---|
| hash | h | string |
| relation_type | rl | string |
| weight | w | float64 |
14.3 Relation Type Registry (Closed Vocabulary)
The relation type vocabulary is intentionally closed (not extensible) to prevent PII leakage through relation names:
| Type | Meaning | Direction |
|---|---|---|
| similar | Semantically similar content | Symmetric |
| contradicts | Incompatible claims | Symmetric |
| elaborates | Adds detail/specificity | Asymmetric |
| generalizes | More abstract version | Asymmetric |
| temporal_next | Event occurs after | Asymmetric |
| temporal_prev | Event occurs before | Asymmetric |
| causal | Causes or preconditions | Asymmetric |
| supports | Provides corroborating evidence | Asymmetric |
| refutes | Provides contradicting evidence (weaker than contradicts) | Asymmetric |
| replaces | Supersedes (outdated but not wrong) — advisory only | Asymmetric |
| depends_on | Validity depends on referenced grain | Asymmetric |
Normative note on replaces: The replaces relation type is a semantic annotation only. It does NOT constitute formal supersession and MUST NOT cause a conformant store to update the target grain's index entry (superseded_by, contradicted, system_valid_to). Conformant clients MUST determine a grain's current status solely from the index superseded_by and contradicted fields, never from related_to links. This rule closes a bypass path for invalidation_policy (see §23.7).
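Because the vocabulary is closed, a writer can validate links and apply the related_to short keys in one pass. The sketch below is illustrative; the function name compact_related_to is hypothetical, and passing unknown link keys through unchanged is an assumption.

```python
RELATION_TYPES = frozenset({
    "similar", "contradicts", "elaborates", "generalizes", "temporal_next",
    "temporal_prev", "causal", "supports", "refutes", "replaces", "depends_on",
})
RELATED_TO_FIELD_MAP = {"hash": "h", "relation_type": "rl", "weight": "w"}

def compact_related_to(links):
    """Validate relation types, then rewrite each link with its short keys."""
    compacted = []
    for link in links:
        relation = link["relation_type"]
        if relation not in RELATION_TYPES:
            # Closed vocabulary: free-form names could leak personal data.
            raise ValueError(f"unknown relation_type: {relation!r}")
        # Assumption: keys outside the field map are kept as-is.
        compacted.append({RELATED_TO_FIELD_MAP.get(k, k): v for k, v in link.items()})
    return compacted
```

Note that the function only rewrites the link. Consistent with the normative note, a "replaces" link is passed through like any other and triggers no index change.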
15. Temporal Modeling
15.1 Five Timestamps Per Grain
| Field | Meaning | Real-World Reference | System Reference |
|---|---|---|---|
| valid_from | When fact became true | Event start time | — |
| valid_to | When fact stopped being true | Event end time | — |
| created_at | When grain was added to system | Ingestion timestamp | System write time |
| system_valid_from | When grain became active in system | — | System validity start (blob field) |
| system_valid_to | When grain was superseded/retracted | — | System validity end (index layer) |
15.2 Bi-Temporal Queries
With these five fields, systems support:
| Query | Fields Used |
|---|---|
| "What does agent know now?" | system_valid_to is null/absent |
| "What was true on date X?" | valid_from ≤ X ≤ valid_to |
| "What did agent know at time T?" | system_valid_from ≤ T AND (system_valid_to is null OR system_valid_to > T) |
| "Reconstruct state at audit time T" | Combine event-time and system-time |
15.3 Implementation Note
system_valid_to is typically an index-layer field, not stored in immutable .mg blobs. The index adds this field when supersession occurs. The .mg blob itself carries system_valid_from at creation; the index tracks the end time.
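The two basic queries from §15.2 can be written as predicates over an index entry that merges the blob fields with the index-layer system_valid_to. This is a sketch under assumptions: the function names are hypothetical, timestamps are epoch milliseconds as in the examples, and a missing valid_from or valid_to is treated as an open-ended bound.

```python
def known_at(entry, t):
    """'What did the agent know at time T?' (system time)."""
    end = entry.get("system_valid_to")  # index-layer field; None while current
    return entry["system_valid_from"] <= t and (end is None or end > t)

def true_on(entry, x):
    """'What was true on date X?' (event time)."""
    start, end = entry.get("valid_from"), entry.get("valid_to")
    # Assumption: an absent bound means the interval is open on that side.
    return (start is None or start <= x) and (end is None or x <= end)
```

An audit reconstruction at time T combines both: filter with known_at(entry, T) first, then apply true_on to the event time of interest.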
16. Encoding Options
16.1 MessagePack (Default)
MessagePack is the default encoding. It is well supported across 50+ languages, compact, and debuggable with standard tooling.
Canonical MessagePack rules (Section 4) ensure deterministic encoding.
16.2 CBOR (Optional)
CBOR (RFC 8949) is an optional encoding, specified via flags bit 5. Uses Deterministic CBOR (RFC 8949 §4.2.1) rules:
- Map keys sorted by encoded form (lexicographic on CBOR bytes)
- Integers in smallest encoding
- No indefinite-length values
- Single NaN representation
- Shortest floating-point form that preserves value (e.g., 1.5 → binary16 0xf93e00; does NOT convert floats to integers)
- Strings are UTF-8 NFC-normalized
- No duplicate keys
Critical: The same grain encoded as MessagePack and as CBOR has DIFFERENT content addresses, because the bytes differ. Logical equivalence ≠ physical equivalence.
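The shortest-float rule is the least obvious item in the list above, so a small sketch may help. It tries binary16, then binary32, and falls back to binary64, keeping the first width that round-trips exactly. The function name encode_cbor_float is hypothetical, and the code covers only floats, not the rest of a CBOR encoder.

```python
import math
import struct

def encode_cbor_float(value: float) -> bytes:
    """Encode a float in the shortest CBOR form that preserves its value."""
    if math.isnan(value):
        return bytes.fromhex("f97e00")  # the single canonical NaN
    for marker, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except (OverflowError, struct.error):
            continue  # magnitude does not fit in this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([marker]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)
```

The value 1.5 fits in binary16 and yields f9 3e 00, whereas 0.9 has no exact shorter form and stays in binary64. The value is never converted to an integer, even when it is whole.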
16.3 When to Use
- MessagePack (default): Universal, mature, fast
- CBOR: IETF standards track, COSE signatures, constrained devices
17. Conformance Levels
Implementations MUST declare which level they support:
17.1 Level 1: Minimal Reader
- Deserialize version byte + canonical MessagePack payload
- Compute and verify SHA-256 content addresses
- Support field compaction (short keys → full names)
- Support all ten standard grain types (0x01–0x0A) per §8 schemas
- Ignore unknown fields
- Constant-time hash comparison
Level 1 is sufficient for reading, verifying, and storing grains.
17.2 Level 2: Full Implementation
All Level 1 requirements, plus:
- Serialize (full names → short keys)
- Enforce canonical MessagePack rules
- Validate required fields per schema
- Pass all test vectors
- Support multi-modal content references
- Implement Store protocol (get/put/delete/list/exists)
- Enforce invalidation_policy on all supersession and contradiction operations
- Implement supersede as a distinct, atomic store operation (not a raw put + index patch); put MUST reject grains containing derived_from claims that imply supersession without going through supersede
- Apply fail-closed rule: unknown invalidation_policy.mode values MUST be treated as mode: "locked"
- Enforce the replaces non-supersession rule: relation_type: "replaces" MUST NOT trigger index mutations on the target grain
- MUST validate that observer_type is a non-empty string; MUST NOT reject unknown observer_type values (open enum)
- MUST emit oid and otype short keys
- SHOULD warn (but MUST NOT reject) when observer_model is absent on Observation grains where observer_type is "llm", "reflector", "classifier", or "detector"
17.3 Level 3: Production Store
All Level 2 requirements, plus:
- At least one persistent backend (filesystem, S3, database)
- AES-256-GCM encrypted grain envelopes
- Per-user key derivation (HKDF-SHA256)
- Blind-index tokens for encrypted search
- SPO/SOP/PSO/POS/OPS/OSP index (hexastore) or equivalent
- Full-text search (FTS5 or equivalent)
- Hash-chained audit trail
- Crash recovery and reconciliation
- Policy engine with compliance presets
- SHOULD partition Observation grain storage by observer domain, inferred from observer_type. Physical observer types (see Section 24) SHOULD flow to time-series storage with raw-data retention policies. Cognitive observer types SHOULD flow to vector + relational storage with the same retrieval semantics as Belief grains. Implementations MUST NOT hard-code the domain partition list — treat observer_type as an open string and drive routing from configuration or namespace.
18. Device Profiles
18.1 Extended Profile (Default)
Target: Servers, desktops, edge gateways
- Max blob size: 1 MB
- Hash function: SHA-256 (REQUIRED)
- All fields supported
- Encryption: AES-256-GCM
- Full feature set
18.2 Standard Profile
Target: Single-board computers, mobile, IoT
- Max blob size: 32 KB
- Hash function: SHA-256
- All fields supported
- Encryption: AES-256-GCM
- Vector search: optional
18.3 Lightweight Profile
Target: Microcontrollers, battery-powered sensors
- Max blob size: 512 bytes
- Hash function: SHA-256 (hardware accelerator recommended)
- Required fields only: type, subject, relation, object, confidence, created_at, namespace
- Omit: context, derived_from, provenance_chain, content_refs, embedding_refs
- Encryption: Transport-level only (DTLS/TLS)
- Streaming deserialization recommended (no full-blob-in-memory)
19. Error Handling
19.1 Format Errors
| Condition | Error Code | Message |
|---|---|---|
| Blob shorter than 10 bytes | ERR_TOO_SHORT | Blob must be at least 10 bytes (9-byte header + payload) |
| Unsupported version byte | ERR_VERSION | Unsupported format version: {version} |
| Malformed MessagePack/CBOR | ERR_CORRUPT | Invalid payload encoding |
| Payload is not a map | ERR_NOT_MAP | Payload must be a MessagePack/CBOR map |
| Missing type field | ERR_NO_TYPE | Missing required field: type |
| Unknown type value | ERR_UNKNOWN_TYPE | Unknown memory type: {type} |
| Missing required field | ERR_SCHEMA | Missing required field: {field} |
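The first two rows can be enforced before any payload decoding. The sketch below is illustrative only: GrainFormatError and check_envelope are hypothetical names, and the set of supported version bytes is assumed to be just 0x01, the value used in the test vectors.

```python
HEADER_SIZE = 9
SUPPORTED_VERSIONS = {0x01}  # assumption: only the version byte seen in §21

class GrainFormatError(Exception):
    """Carries one of the error codes from the table above."""
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code

def check_envelope(blob: bytes) -> bytes:
    """Run the pre-deserialization checks and return the raw payload bytes."""
    if len(blob) < HEADER_SIZE + 1:
        raise GrainFormatError(
            "ERR_TOO_SHORT",
            "Blob must be at least 10 bytes (9-byte header + payload)")
    version = blob[0]
    if version not in SUPPORTED_VERSIONS:
        raise GrainFormatError(
            "ERR_VERSION", f"Unsupported format version: {version}")
    return blob[HEADER_SIZE:]
```

The remaining rows (ERR_CORRUPT, ERR_NOT_MAP, ERR_NO_TYPE, ERR_UNKNOWN_TYPE, ERR_SCHEMA) depend on the decoded payload and would follow in the same style after the MessagePack or CBOR decoder has run.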
19.2 Integrity Errors
| Condition | Error Code |
|---|---|
| SHA-256 hash mismatch | ERR_INTEGRITY |
| Content address not lowercase hex | ERR_HASH_FORMAT |
| Content address wrong length | ERR_HASH_LENGTH |
19.3 Validation Errors
| Condition | Error Code |
|---|---|
| Confidence out of [0.0, 1.0] | ERR_RANGE |
| Importance out of [0.0, 1.0] | ERR_RANGE |
| Empty required string | ERR_EMPTY |
| Negative count field | ERR_RANGE |
| Float64 value is NaN or Infinity | ERR_FLOAT_INVALID |
| signed flag ≠ presence of COSE wrapper | ERR_SIGNED_MISMATCH |
| Header sensitivity bits lower than tag classification | ERR_SENSITIVITY_MISMATCH |
| Duplicate map keys | ERR_CORRUPT |
| String contains BOM (EF BB BF) | ERR_CORRUPT |
| Supersession or contradiction violates invalidation_policy | ERR_INVALIDATION_DENIED |
| invalidation_policy.mode is unknown (fail-closed) | ERR_INVALIDATION_DENIED |
| Protected goal satisfied transition missing required evidence | ERR_EVIDENCE_REQUIRED |
19.4 Forward Compatibility
Implementations MUST handle forward-compatible changes gracefully:
- Unknown fields → Deserializers preserve during round-trip; no error
- Unknown types → Deserialize as opaque map (no schema validation)
- Future version bytes → Reject with ERR_VERSION; include version in error message
20. Security Considerations
20.1 Integrity and Authenticity
Content addressing (SHA-256 hash) proves integrity but NOT authenticity. Any party can produce a valid grain.
For authenticity, use COSE Sign1 envelope with DID-based identity verification.
20.2 Confidentiality
The .mg format itself does NOT define encryption. When encryption is required, encrypt the entire blob as an opaque byte sequence using authenticated encryption (e.g., AES-256-GCM).
The content address of an encrypted grain is the hash of the ciphertext, not the plaintext.
Note on deduplication: Encrypting a grain changes its content address. Encrypting the same plaintext with different keys or IVs produces different ciphertext and therefore different content addresses. Encrypted grains do not deduplicate via content address. Systems requiring deduplication of encrypted data SHOULD compute and store the plaintext content address separately as metadata before encryption.
20.3 Per-User Encryption Pattern
For compliance systems handling personal data:
- Derive per-user key via HKDF-SHA256 from master key + user_id
- Encrypt grain bytes with AES-256-GCM (user's key)
- Generate HMAC token (blind index) for encrypted user_id field
- Store: {content_address: encrypted_blob, user_id_token: hmac(...)}
- Query: Look up blind index first, then decrypt matching grains
Destroying the user's key gives O(1) GDPR erasure (crypto-erasure): every grain encrypted under that key becomes unreadable at once.
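The key-derivation and blind-index steps of this pattern need only the Python standard library. The sketch below implements HKDF per RFC 5869 with SHA-256; the function names, the "oms-user-key:" info label, and the use of a separate index key are assumptions made for illustration. AES-256-GCM encryption itself is omitted because it requires a third-party library.

```python
import hashlib
import hmac

def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    # Extract: an empty salt defaults to HashLen zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def user_key(master_key: bytes, user_id: str) -> bytes:
    """Per-user 256-bit key; the info label is a hypothetical choice."""
    return hkdf_sha256(master_key, b"oms-user-key:" + user_id.encode("utf-8"))

def blind_index(index_key: bytes, user_id: str) -> str:
    """HMAC token allowing equality lookup without storing user_id."""
    return hmac.new(index_key, user_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Note that crypto-erasure only holds if the per-user key is stored and destroyed as its own secret. If the key can be re-derived from a retained master key and a known user_id, deleting the derived copy alone erases nothing.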
20.4 Timing Attacks
When comparing content addresses for integrity verification, use constant-time comparison:
- Python: hmac.compare_digest()
- Go: crypto/subtle.ConstantTimeCompare()
- JavaScript: crypto.timingSafeEqual()
20.5 Content Reference Security
URIs in content_refs and embedding_refs MAY point to external resources. When fetching:
- Validate URI (reject private IP ranges unless explicitly allowed)
- Verify checksum field after fetching (detect tampering)
- Never auto-fetch during deserialization (fetch-on-demand only)
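The first two checks might look like the sketch below. Several points are assumptions rather than requirements of this specification: the function names are hypothetical, only http and https URIs are accepted, the checksum is taken to be a hex SHA-256 digest, and only literal IP hosts are screened. A real fetcher would also need to resolve DNS names and re-check the resulting addresses.

```python
import hashlib
import ipaddress
from urllib.parse import urlsplit

def is_fetch_allowed(uri: str, allow_private: bool = False) -> bool:
    """Screen a content reference URI before any network access."""
    parts = urlsplit(uri)
    # Assumption: this sketch only fetches over http(s).
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True  # a DNS name; resolve and re-check before connecting
    blocked = address.is_private or address.is_loopback or address.is_link_local
    return allow_private or not blocked

def checksum_matches(data: bytes, expected_hex: str) -> bool:
    """Assumes the checksum field holds a hex SHA-256 digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```

Both functions are meant to be called at fetch time by the application, never from the deserializer, in keeping with the fetch-on-demand rule.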
20.6 Compliance Scenarios
GDPR Erasure (Art. 17):
Encrypt grains with per-user keys. Destroying a user's key renders all of that user's ciphertext unrecoverable. The user_id field enables scoping.
HIPAA PHI Detection:
Tag PHI-containing grains with structural_tags prefix "phi:". Policy engines inspect tags at write time.
SOX Audit Trails (Sarbanes-Oxley, Section 802):
.mg blobs are tamper-evident (content-addressed, immutable). provenance_chain traces derivation. Combined with a hash-chained audit log, this provides a complete audit trail.
21. Test Vectors
Implementation note: Content addresses are SHA-256 of the complete blob: 9-byte fixed header (0x01 version, flags, type, 2-byte ns_hash, created_at_sec) followed by the canonical MessagePack/CBOR payload. Run the reference implementation against each input to produce verified hashes. The blob hex for Vector 1 is provided as a byte-level reference; all content addresses marked [computed by reference implementation] must be derived programmatically.
21.1 Vector 1: Minimal Fact
Input:
{
"type": "fact",
"subject": "user",
"relation": "prefers",
"object": "dark mode",
"confidence": 0.9,
"source_type": "user_explicit",
"created_at": 1768471200000,
"namespace": "shared",
"author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
}
Expected content address:
3288d0d41cf49a1d428e404f0b6a6fe60388be9536937557f6139b813d53a520
Blob hex (159 bytes):
01 00 01 a4 d2 69 68 ba a0 89 a4 61 64 69 64 d9 38 64 69 64 3a 6b 65 79 3a
7a 36 4d 6b 68 61 58 67 42 5a 44 76 6f 74 44 6b 4c 35 32 35 37 66 61 69 7a
74 69 47 69 43 32 51 74 4b 4c 47 70 62 6e 6e 45 47 74 61 32 64 6f 4b a1 63
cb 3f ec cc cc cc cc cc cd a2 63 61 cf 00 00 01 9b c1 19 01 00 a2 6e 73 a6
73 68 61 72 65 64 a1 6f a9 64 61 72 6b 20 6d 6f 64 65 a1 72 a7 70 72 65 66
65 72 73 a1 73 a4 75 73 65 72 a2 73 74 ad 75 73 65 72 5f 65 78 70 6c 69 63
69 74 a1 74 a4 66 61 63 74
Header breakdown: 01 = version, 00 = flags (public, MessagePack, unsigned), 01 = Belief type, a4 d2 = SHA-256("shared")[0:2] as uint16 big-endian, 69 68 ba a0 = created_at_sec (1768471200 = 2026-01-15T10:00:00Z, big-endian).
Payload breakdown: 89 = fixmap(9), a4 61 64 69 64 = key "adid" (fixstr 4), d9 38 = str8 length 56, followed by 56 UTF-8 bytes of the DID; key c value: cb 3f ec cc cc cc cc cc cd (float64 marker + 8 bytes = 3feccccccccccccd = 0.9); then remaining keys "ca"/"ns"/"o"/"r"/"s"/"st"/"t" in lexicographic order with their values.
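The header breakdown can be reproduced with a single struct.unpack call. This is a sketch for checking an implementation against Vector 1, not reference code; parse_header and content_address are hypothetical names, and the field layout is taken from the breakdown above.

```python
import hashlib
import struct
from datetime import datetime, timezone

# First 9 bytes of the Vector 1 blob.
VECTOR_1_HEADER = bytes.fromhex("01 00 01 a4 d2 69 68 ba a0")

def parse_header(blob: bytes) -> dict:
    """Unpack the 9-byte fixed header; multi-byte fields are big-endian."""
    version, flags, type_byte, ns_hash, created_at_sec = struct.unpack(
        ">BBBHI", blob[:9])
    return {
        "version": version,
        "flags": flags,
        "type": type_byte,
        "ns_hash": ns_hash,
        "created_at_sec": created_at_sec,
        "created_at_utc": datetime.fromtimestamp(
            created_at_sec, tz=timezone.utc).isoformat(),
    }

def content_address(blob: bytes) -> str:
    """SHA-256 over the complete blob (header plus payload), lowercase hex."""
    return hashlib.sha256(blob).hexdigest()
```

Calling parse_header(VECTOR_1_HEADER) gives version 1, flags 0, type 1, ns_hash 0xa4d2 and created_at_sec 1768471200. To check the published address, pass the full 159-byte blob to content_address and compare the result with the expected value using a constant-time comparison.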
21.2 Vector 2: Event
Input:
{
"type": "event",
"content": "User asked about dark mode settings",
"created_at": 1768471200000,
"namespace": "shared",
"author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
"importance": 0.5
}
Expected content address:
[computed by reference implementation]
21.3 Vector 3: Bi-Temporal Belief
Input:
{
"type": "belief",
"subject": "Alice",
"relation": "works_at",
"object": "Acme Corp",
"confidence": 0.95,
"source_type": "user_explicit",
"created_at": 1737000000000,
"valid_from": 1735689600000,
"valid_to": 1767225600000,
"system_valid_from": 1737000000000,
"author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
}
Expected content address (bi-temporal fields):
[computed by reference implementation]
21.4 Vector 4: Belief with Cross-Links
Input:
{
"type": "belief",
"subject": "Bob",
"relation": "manages",
"object": "Project Alpha",
"confidence": 0.90,
"source_type": "llm_generated",
"created_at": 1737000000000,
"related_to": [
{
"hash": "4c4149355d3f3e1114e6a72bc5c2813a3ecd4deab2ba8771eaca8556b2c032f2",
"relation_type": "similar",
"weight": 0.85
},
{
"hash": "6f7fb8935e150f61a607ece0582c87c42b9975d356def0e41164b85852836145",
"relation_type": "elaborates",
"weight": 0.70
}
],
"author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
}
21.5 Vector 5: Observation
Input:
{
"type": "observation",
"observer_id": "temp-sensor-01",
"observer_type": "temperature",
"subject": "server-room",
"object": "22.5C",
"confidence": 0.99,
"created_at": 1737000000000,
"namespace": "monitoring",
"importance": 0.3,
"author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
}
21.6 Vector 6: Protected Fact with invalidation_policy
Input:
{
"type": "fact",
"subject": "agent-007",
"relation": "constraint",
"object": "never delete user files without confirmation",
"confidence": 1.0,
"source_type": "user_explicit",
"created_at": 1768471200000,
"namespace": "safety",
"invalidation_policy": {
"mode": "locked",
"authorized": ["did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"]
}
}
Compaction and canonical form notes:
- Compacted key order: c, ca, ip, ns, o, r, s, st, t — verifies that ip (invalidation_policy) sorts correctly between ca and ns.
- The nested invalidation_policy map is also sorted: authorized before mode.
- Namespace "safety" → SHA-256 first two bytes: 0x85 0x6E.
- Header: 0x01 0x00 0x01 0x85 0x6E + timestamp 1768471200 as big-endian 4 bytes.
Expected content address:
df928038769506fb66671aced0eb97d45871e169e505ed55a382c744e620550e
22. Implementation Notes
22.1 MessagePack Libraries
| Language | Library | Sorted Keys | Notes |
|---|---|---|---|
| Python | ormsgpack | OPT_SORT_KEYS | Rust-backed (fast) |
| Python | msgpack | Pre-sort keys | No built-in key-sorting option; sort dicts before packing. Pure Python fallback |
| Rust | rmp-serde | Via BTreeMap | Natural ordering |
| Go | msgpack/v5 | Manual sorting | User responsible |
| JavaScript | @msgpack/msgpack | Pre-sort keys | Manual sorting required |
| Java | jackson-dataformat-msgpack | SORT_PROPERTIES_ALPHABETICALLY | Feature flag |
| C# | MessagePack-CSharp | Via SortedDictionary | Built-in support |
22.2 String Normalization
Use Unicode NFC (Canonical Composition):
- Python: unicodedata.normalize("NFC", s)
- Go: golang.org/x/text/unicode/norm
- JavaScript: String.prototype.normalize("NFC")
- Java: java.text.Normalizer
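Normalization matters because two strings that render identically can have different bytes, and therefore different content addresses. A short illustration follows; the helper name normalize_string is hypothetical.

```python
import hashlib
import unicodedata

def normalize_string(value: str) -> str:
    """Apply Unicode NFC (canonical composition) before serialization."""
    return unicodedata.normalize("NFC", value)

decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = "caf\u00e9"     # precomposed U+00E9

def digest(text: str) -> str:
    """SHA-256 of the UTF-8 bytes, as a stand-in for a content address."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Here digest(decomposed) and digest(composed) differ, while digest(normalize_string(decomposed)) equals digest(composed). Normalizing every string before hashing is what keeps equivalent inputs on the same address.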
22.3 Constant-Time Hash Comparison
import hmac
hmac.compare_digest(expected_hash, computed_hash)

import "crypto/subtle"
subtle.ConstantTimeCompare(a, b) == 1

import crypto from "crypto";
crypto.timingSafeEqual(a, b);

22.4 DID Parsing (did:key)
Format: did:key:z<multibase-base58-btc-encoded-multicodec-key>
Example: did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
Parsing:
1. Remove "did:key:" prefix
2. Decode multibase (z = base58-btc) → raw bytes
3. Read multicodec prefix: one or more unsigned varint bytes identify the key type
- Ed25519 public key: prefix 0xed 0x01 (2-byte varint), followed by 32 key bytes
- Other key types use different varint values; always decode the full varint, not a fixed byte count
4. Extract public key bytes (everything after the varint prefix)
5. Verify signature using extracted public key
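Steps 1 to 4 can be sketched in pure Python as shown below. The helper names are hypothetical, the base58 decoder is a plain big-integer implementation rather than a library call, and step 5 (signature verification) is left out because it needs an Ed25519 library.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_decode(text: str) -> bytes:
    """Decode base58-btc; raises ValueError on characters outside the alphabet."""
    number = 0
    for char in text:
        number = number * 58 + BASE58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' encodes one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body

def read_varint(data: bytes):
    """Decode an unsigned varint; return (value, bytes_consumed)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # high bit clear marks the last byte
            return value, i + 1
    raise ValueError("truncated varint")

def parse_did_key(did: str):
    """Return (multicodec_code, public_key_bytes) for a base58-btc did:key."""
    prefix = "did:key:z"
    if not did.startswith(prefix):
        raise ValueError("expected a did:key with multibase prefix 'z'")
    raw = base58_decode(did[len(prefix):])
    codec, used = read_varint(raw)
    return codec, raw[used:]
```

For an Ed25519 key the returned code is 0xed (encoded on the wire as the two bytes 0xed 0x01) and the key is 32 bytes long. Reading the full varint, rather than skipping a fixed two bytes, keeps the parser correct for key types whose code is shorter or longer.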
22.5 COSE Sign1 Libraries
- Python: pycose (RFC 9052 compliant)
- Go: github.com/veraison/go-cose
- JavaScript: cose-js, cbor-x
- Rust: cosey
22.6 Round-Trip Testing
To verify conformance:
- Serialize grain → blob
- Hash blob → content address
- Compare against expected (test vector)
- Deserialize blob → recreate grain
- Serialize again → MUST match original blob bytes (round-trip fidelity)
22.7 Streaming and Partial Results
OMS grains are atomic, immutable knowledge units. Streaming outputs (e.g., token-by-token LLM responses, incremental tool results, partial server-sent events) are transport-layer concerns outside OMS scope. Implementations SHOULD buffer streaming content in their transport layer and emit a single immutable Event or Action grain upon stream completion. For long-running tool executions requiring progress visibility, implementations MAY emit periodic State grains (type 0x03) as progress checkpoints, linked via derived_from to the originating Action grain. Each checkpoint is a complete, self-contained grain — not a diff.
22.8 Recall Priority and Agent Memory Tiers
The recall_priority field (§6.1) maps to the memory tiering models used by agent frameworks:
| recall_priority | Tier | Framework mapping | Retrieval pattern |
|---|---|---|---|
| "hot" | In-context memory | Letta core_memory, LangChain ConversationBufferMemory | Included in every LLM prompt. Grains SHOULD be cached in-memory by the store. |
| "warm" | Retrieval memory | Letta recall_memory, LangChain VectorStoreRetrieverMemory | Retrieved by recency, embedding similarity, or structured filter. Typical RAG context. |
| "cold" | Archival memory | Letta archival_memory, long-term compliance storage | Retained for completeness, audit, and compliance. Not actively retrieved unless explicitly queried. |
Stores MAY use recall_priority to select storage tiers (e.g., SSD for hot, HDD for cold, object storage for archive). Writers SHOULD set recall_priority based on expected retrieval frequency. The default when absent is "warm".
22.9 State Grain Context Schema Convention
For cross-framework agent state portability, implementations SHOULD use the following keys in the State grain (type 0x03) context map:
| Key | Type | Description |
|---|---|---|
| messages_tail | string | Content address of the most recent Event grain in the conversation |
| memory_blocks | map | Named memory blocks: {block_name: block_value_string}. Letta-compatible. |
| system_prompt | string | System prompt text, or content address of a Belief grain containing it |
| active_tools | array[string] | Tool names available in this agent state |
| model | string | LLM model identifier (e.g., "claude-opus-4-6") |
| pending_tool_calls | array[string] | Content addresses of Action grains in "call" phase awaiting results |
| agent_config | map | Framework-specific agent configuration (opaque to the spec) |
This schema is RECOMMENDED, not required. Implementations MAY include additional keys. The memory_blocks key is aligned with Letta's core_memory structure. The messages_tail key enables reconstructing the conversation by following parent_message_id chains backward from the tail.
22.10 Access Counter Semantics
Stores that implement access_count and last_accessed_at (§28.3) SHOULD observe the following:
- Stores MAY defer counter updates and flush them asynchronously. The maximum acceptable staleness is implementation-defined but SHOULD be documented.
- Only user-facing retrieval operations (search, get, query) SHOULD increment access_count. Internal reads (provenance traversal, invalidation checks, supersession chain resolution, compliance scans, and replication) SHOULD NOT increment it.
- Stores MAY use probabilistic counting (e.g., HyperLogLog) or sampling for high-frequency grains to limit write amplification.
- Stores MAY disable access tracking entirely and document this as a conformance note.
access_count and last_accessed_at are OPTIONAL index-layer features, not conformance requirements.
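One way to combine deferred updates with the user-facing-only rule is sketched below. It is an illustration under stated assumptions: the class name `DeferredAccessTracker`, the `operation` labels, and the use of epoch milliseconds for last_accessed_at are hypothetical choices, not requirements of this specification.

```python
import time
from collections import Counter
from typing import Dict

# Assumed labels for the user-facing operations named in this section.
USER_FACING_OPS = {"search", "get", "query"}


class DeferredAccessTracker:
    """Buffers access hits in memory and applies them to the index on flush()."""

    def __init__(self) -> None:
        self.pending: Counter = Counter()
        self.last_seen: Dict[str, int] = {}
        self.index: Dict[str, dict] = {}  # stand-in for the real index layer

    def record(self, content_address: str, operation: str) -> None:
        if operation not in USER_FACING_OPS:
            return  # internal reads (replication, compliance scans, ...) are ignored
        self.pending[content_address] += 1
        self.last_seen[content_address] = int(time.time() * 1000)

    def flush(self) -> None:
        for address, hits in self.pending.items():
            entry = self.index.setdefault(
                address, {"access_count": 0, "last_accessed_at": None})
            entry["access_count"] += hits
            entry["last_accessed_at"] = self.last_seen[address]
        self.pending.clear()


tracker = DeferredAccessTracker()
tracker.record("addr-1", "get")
tracker.record("addr-1", "search")
tracker.record("addr-1", "replication")  # internal read: not counted
tracker.flush()
```

The interval between flush() calls is the staleness bound that a store would document.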
References
Normative References
- RFC 2119 — Requirement Levels (MUST, SHOULD, etc.)
- RFC 8174 — Ambiguity of Uppercase vs Lowercase in RFC 2119
- RFC 8949 — CBOR (Concise Binary Object Representation)
- RFC 9052 — COSE (CBOR Object Signing and Encryption) Structures
- RFC 9901 — SD-JWT (Selective Disclosure for JSON Web Tokens)
- FIPS 180-4 — SHA-256
- UAX #15 — Unicode Normalization Forms
- W3C DID Core 1.0 — Decentralized Identifiers
- MessagePack Specification
Informative References
- W3C PROV-Overview — Provenance Data Model
- Deterministic CBOR — RFC 8949 §4.2.1 — Core Deterministic Encoding Requirements
- Gordian Envelope Internet-Draft — Content-Addressed Documents
- did:key Method Specification
- GDPR Article 17 — Right to Erasure
- HIPAA Technical Safeguards — Protected Health Information
- CCPA — California Consumer Privacy Act
23. Grain Protection and Invalidation Policy
23.1 Purpose
A grain may carry an invalidation_policy field declaring who is authorized to remove it from "current and trusted" status. This field covers all invalidation paths, not only direct supersession:
1. Direct supersession: a new grain G2 is written with derived_from: [G1] and the index sets G1.superseded_by = hash(G2)
2. Contradiction: the index sets G1.contradicted = true
3. Semantic replacement via related_to: advisory only; does NOT constitute formal invalidation (see §23.7)
The invalidation_policy governs paths 1 and 2. Protection is declared at grain creation time — it is part of the immutable blob and covered by the COSE signature when present.
23.2 Field Schema
invalidation_policy: {
"mode": "open" | "soft_locked" | "locked" | "quorum" | "delegated" | "timed" | "hold" | "consent_cascade",
"authorized": ["did:key:z6Mk...", ...], // for modes: delegated, quorum
"threshold": 2, // for mode: quorum — minimum co-signers
"locked_until": 1800000000, // for mode: timed — Unix epoch u64 seconds
"fallback_mode": "open", // for mode: timed — policy after unlock time
"scope": "grain" | "subtree" | "lineage", // default: "grain"
"protection_reason": "string" // optional human-readable rationale
}
Mode semantics:
| Mode | Semantics | Store behavior |
|---|---|---|
| open | No restriction (default when field is absent) | Accept any supersession |
| soft_locked | Supersession permitted but MUST carry supersession_justification field | Accept with justification; flag for human review |
| locked | No supersession or contradiction permitted | MUST reject; return ERR_INVALIDATION_DENIED |
| quorum | Superseding grain MUST carry supersession_auth array with ≥ threshold valid COSE signatures from authorized DIDs | Verify each signature; reject if threshold not met |
| delegated | Only DIDs listed in authorized may invalidate; superseding grain MUST be COSE-signed by one of those DIDs | Verify signer is in authorized list |
| timed | Behaves as locked until locked_until epoch; then reverts to fallback_mode | Check wall clock against locked_until; apply fallback_mode after |
| hold | Litigation hold: grain MUST NOT be deleted, erased, or forgotten until hold is explicitly lifted. Supersedes TTL, consent withdrawal, erasure requests, and forgetting engine decay. | Reject all invalidation and erasure operations; return ERR_INVALIDATION_DENIED |
| consent_cascade | Grain is automatically eligible for erasure when its processing_basis Consent grain (§8.10, §6.1) is revoked. Stores MUST complete erasure within their stated SLA; SLA MUST be ≤ one month per GDPR Art. 12(3). | On Consent withdrawal, identify all grains with matching processing_basis, schedule for erasure within SLA |
23.3 Fail-Closed Rule
Parsers and stores MUST treat unknown mode values as "locked". An implementation that encounters an unrecognized mode MUST reject the invalidation attempt rather than treating the policy as absent or permissive. This prevents an agent from crafting a novel mode value that an old implementation silently accepts.
If the invalidation_policy field is absent entirely, mode: "open" is implied.
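The fail-closed rule can be illustrated with a small resolver. This is a sketch, not reference code: the function name `effective_mode` is hypothetical, and treating a present policy with a missing or non-string mode as "locked" is an assumption that extends the rule in the conservative direction.

```python
KNOWN_MODES = {
    "open", "soft_locked", "locked", "quorum",
    "delegated", "timed", "hold", "consent_cascade",
}


def effective_mode(grain: dict) -> str:
    """Resolve the mode a store should enforce for this grain."""
    policy = grain.get("invalidation_policy")
    if policy is None:
        return "open"  # field absent entirely: "open" is implied
    mode = policy.get("mode") if isinstance(policy, dict) else None
    if not isinstance(mode, str) or mode not in KNOWN_MODES:
        return "locked"  # unknown (or, by assumption, malformed) mode fails closed
    return mode
```

An older implementation that meets a mode introduced by a later revision therefore rejects the invalidation attempt instead of silently accepting it.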
23.4 Goal State Transition Protection
Protected Goal grains (those with invalidation_policy.mode ≠ "open") MAY specify which state transitions the agent may execute autonomously via the allowed_transitions field:
{
"type": "goal",
"goal_state": "active",
"invalidation_policy": {
"mode": "delegated",
"authorized": ["did:key:z6MkUser..."]
},
"allowed_transitions": ["satisfied", "failed"]
}
State transitions NOT listed in allowed_transitions are subject to the full invalidation_policy. If allowed_transitions is absent on a protected goal, all state transitions are subject to the policy.
Reasoning: Some goal lifecycle transitions (marking a goal satisfied because it was achieved, or failed because it became impossible) are natural completion events, not adversarial modifications. allowed_transitions lets the user designate these autonomous-safe transitions without making the entire goal unprotected.
Evidence requirement for autonomous satisfied transitions: For protected goals, an autonomous satisfied transition SHOULD include satisfaction_evidence grain references. Stores MAY enforce this when evidence_required > 0 is set. This mitigates goal laundering.
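The decision a store makes for an autonomous transition can be sketched as below. The function names, the three return labels, and the choice to require at least one evidence reference whenever evidence_required > 0 are assumptions for illustration; the "abandoned" state used in the usage note is likewise hypothetical.

```python
from typing import List


def is_protected(goal: dict) -> bool:
    policy = goal.get("invalidation_policy")
    return policy is not None and policy.get("mode") != "open"


def check_autonomous_transition(goal: dict, new_state: str,
                                satisfaction_evidence: List[str]) -> str:
    """Return "allow", "policy_check", or "ERR_EVIDENCE_REQUIRED"."""
    if not is_protected(goal):
        return "allow"
    if new_state not in goal.get("allowed_transitions", []):
        return "policy_check"  # the full invalidation_policy applies
    needs_evidence = (new_state == "satisfied"
                      and goal.get("evidence_required", 0) > 0)
    if needs_evidence and not satisfaction_evidence:
        return "ERR_EVIDENCE_REQUIRED"
    return "allow"


goal = {
    "type": "goal",
    "goal_state": "active",
    "invalidation_policy": {"mode": "locked"},
    "allowed_transitions": ["satisfied", "failed"],
    "evidence_required": 1,
}
```

With this goal, a "failed" transition is allowed autonomously, an "abandoned" transition falls through to the full policy check, and a "satisfied" transition is refused until at least one evidence grain reference is supplied.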
23.5 Goal Laundering (Normative)
Goal laundering is the attack pattern where an agent:
- Falsely marks a protected goal as satisfied (claiming success criteria were met)
- Creates a new goal without the protected goal's constraints
- Operates under the weaker new goal
Implementations MUST treat this as a protocol violation. Specifically:
- A grain that supersedes a protected goal inherits the original goal's invalidation_policy unless the supersession was explicitly authorized under that policy's terms
- satisfied and failed transitions on protected goals that have these in allowed_transitions SHOULD require non-empty satisfaction_evidence; stores MAY enforce this as ERR_EVIDENCE_REQUIRED
23.6 Scope
The scope field controls whether protection extends to derived grains:
| Scope | Meaning |
|---|---|
| grain | Only this grain (default) |
| subtree | This grain and all grains with derived_from pointing here (transitively, up to 16 hops) |
| lineage | This grain and all grains in the same supersession chain |
For subtree scope, a store MUST check the derivation ancestry of any proposed superseding grain and reject if any ancestor within 16 hops is protected against the requester. Implementations SHOULD cache a protected_root indicator per grain to avoid O(n) traversal per write.
23.7 Bypass Paths That Conformant Implementations MUST Close
Bypass 1 — Contradiction flag: Any mutation setting contradicted=true on a grain is subject to invalidation_policy, identical to supersession. The policy check MUST apply to contradiction index mutations, not only to supersession index mutations.
Bypass 2 — related_to: "replaces" semantic claim: Writing a new grain with relation_type: "replaces" pointing to a protected grain is permitted at the blob level (it is a new, valid content-addressed object). However, a conformant store MUST NOT update the target grain's index entry (superseded_by, contradicted, system_valid_to) in response to seeing a replaces relation. The target grain remains current and its invalidation_policy is not affected. See §15.3 normative note.
Bypass 3 — Supersession chain injection: An agent cannot bypass protection on grain A by superseding a derived grain A' (which itself supersedes A), arguing it is not directly superseding A. A store MUST traverse the derived_from chain of any proposed superseding grain up to 16 hops and reject if any ancestor in the chain is protected against the requester.
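The 16-hop ancestry check required for Bypass 3 (and for subtree scope in §23.6) can be sketched as a bounded breadth-first traversal. This is an illustration only: `find_protected_ancestor`, the in-memory `grains` dict, and the `is_protected` callback (which is assumed to capture the requester) are hypothetical names.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Optional

MAX_HOPS = 16


def find_protected_ancestor(parents: Iterable[str], grains: Dict[str, dict],
                            is_protected: Callable[[dict], bool]) -> Optional[str]:
    """Walk derived_from links up to MAX_HOPS; return the first protected ancestor."""
    queue = deque((address, 1) for address in parents)
    seen = set()
    while queue:
        address, hops = queue.popleft()
        if address in seen or hops > MAX_HOPS:
            continue
        seen.add(address)
        grain = grains.get(address)
        if grain is None:
            continue  # unknown ancestor; a real store must decide how to treat this
        if is_protected(grain):
            return address
        for parent in grain.get("derived_from", []):
            queue.append((parent, hops + 1))
    return None


# Grain A is locked; A2 was derived from A; a new grain proposes to derive from A2.
grains = {
    "A": {"invalidation_policy": {"mode": "locked"}},
    "A2": {"derived_from": ["A"]},
}
locked = lambda g: g.get("invalidation_policy", {}).get("mode") == "locked"
blocking = find_protected_ancestor(["A2"], grains, locked)
```

A store would reject the write when the result is not None. Caching a protected_root indicator per grain, as §23.6 suggests, avoids repeating this traversal on every write.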
23.8 Key Separation Requirement (Normative, Deployment-Dependent)
Grain-level invalidation_policy enforcement is only meaningful when the agent's DID is cryptographically distinct from the user's DID. If an agent operates under the user's signing key, any DID-based policy check trivially passes regardless of the declared policy.
Deployments using invalidation_policy with mode ≠ "open" SHOULD enforce key separation: the user holds a root DID keypair; agents receive delegated DIDs with scoped authority via W3C Verifiable Credentials or UCAN capability tokens. The .mg format does not define the delegation mechanism, but conformant stores SHOULD refuse to accept a supersession proof where the agent DID is identical to the grain's author_did for grains with mode: "locked" or mode: "quorum".
23.9 Interaction with Existing Fields
| Field | Interaction |
|---|---|
| superseded_by | Index layer populates after a conformant supersede operation passes policy check |
| contradicted | Setting this is subject to invalidation_policy; not a bypass path |
| expiry_policy (Goal) | Orthogonal: governs when a goal is inactive; invalidation_policy governs who writes its replacement. An expired goal's invalidation_policy still applies to supersession for audit chain integrity. |
| evidence_required (Goal) | Linked: for protected goals with "satisfied" in allowed_transitions, evidence_required > 0 is RECOMMENDED |
| source_type | Orthogonal: records provenance; do not conflate with protection. A "user_explicit" grain is not automatically protected; invalidation_policy must be set explicitly. |
| structural_tags | "mg:protected" MAY be added as a human-facing annotation alongside invalidation_policy but MUST NOT be used as the sole enforcement mechanism |
24. Observer Type Registry
The observer_type field on Observation grains is an open enum. Applications may define custom values beyond those listed here. Standard values are organized into two domains. Index layers MAY use this field to route physical Observation grains to time-series stores and cognitive Observation grains to vector + relational stores, but MUST NOT hard-code the domain partition list — treat observer_type as an open string governed by configuration or namespace.
24.1 Physical Observer Domain
Physical observers produce measurements of the material world: geometry, position, temperature, electromagnetic fields, acoustic signals. source_type SHOULD be "sensor" for grains produced by physical observers.
| Value | Description |
|---|---|
| "lidar" | 3D laser ranging (time-of-flight or FMCW); produces point clouds |
| "camera" | RGB, depth, stereo, or thermal imaging |
| "imu" | Inertial Measurement Unit: fused gyroscope + accelerometer readings |
| "gps" | Global Positioning System or any GNSS receiver |
| "temperature" | Thermal sensor: thermocouple, thermistor, RTD, infrared |
| "pressure" | Barometric, fluid, or contact pressure sensor |
| "accelerometer" | Linear acceleration sensor (standalone, not fused with gyroscope) |
| "magnetometer" | Magnetic field sensor or digital compass |
| "ultrasonic" | Ultrasonic distance ranging (time-of-flight) |
| "radar" | Radio detection and ranging |
| "microphone" | Audio input or acoustic sensor |
24.2 Cognitive Observer Domain
Cognitive observers produce observations of the information space: conversations, documents, behaviors, patterns, classifications. source_type SHOULD be "agent_inferred" for AI-generated cognitive observations and "user_explicit" for human observations.
| Value | Description |
|---|---|
| "llm" | Large Language Model as observer: produces natural language observations from input data. observer_model RECOMMENDED. |
| "reflector" | Aggregating or pattern-distilling agent: produces higher-order observations from prior Observation grains. Maps to consolidation_level ≥ 2. observer_model RECOMMENDED. |
| "classifier" | ML classification model: produces categorical observations (label + confidence score). observer_model RECOMMENDED. |
| "detector" | ML detection or anomaly detection model: produces presence/absence or anomaly observations. observer_model RECOMMENDED. |
| "human" | Human observer or annotator: records direct perception or expert judgment. observer_model MUST be absent. |
| "hybrid" | Combined physical sensor + AI processing pipeline, e.g., camera + vision model producing a semantic label from raw imagery. SHOULD include provenance_chain entries for both sensor reading and inference steps. |
24.3 Extensibility
Custom observer_type values MUST NOT be identical to any registered value in §24.1 or §24.2. Custom values SHOULD use a namespace prefix, e.g., "acme:thermal-v2" or "myapp:custom-observer". Conformant parsers MUST NOT reject unknown observer_type values.
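A writer-side lint for the extensibility rules might look like the sketch below. The function name `classify_observer_type` and its three return labels are hypothetical; the registered set is copied from §24.1 and §24.2. The function only classifies and never rejects, which mirrors the requirement that parsers accept unknown values.

```python
REGISTERED_OBSERVER_TYPES = frozenset({
    # §24.1 physical observers
    "lidar", "camera", "imu", "gps", "temperature", "pressure",
    "accelerometer", "magnetometer", "ultrasonic", "radar", "microphone",
    # §24.2 cognitive observers
    "llm", "reflector", "classifier", "detector", "human", "hybrid",
})


def classify_observer_type(value: str) -> str:
    """Return "registered", "custom-namespaced", or "custom-unprefixed"."""
    if value in REGISTERED_OBSERVER_TYPES:
        return "registered"
    prefix, separator, rest = value.partition(":")
    if separator and prefix and rest:
        return "custom-namespaced"  # e.g. "acme:thermal-v2"
    return "custom-unprefixed"      # accepted, but a namespace prefix is advisable
```

A writer could log a warning for "custom-unprefixed" values while still emitting the grain.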
25. Observation Mode Registry
The observation_mode field is a closed enum. It describes how the observation was produced, which determines how confidence, valid_from/valid_to, and derived_from should be interpreted by downstream consumers.
| Value | Meaning | valid_from/valid_to semantics | Typical observer_type |
|---|---|---|---|
| "passive" | Observer perceived without intervening: watched, listened, read data as it arrived without emitting a signal or query | Covers the duration of passive reception | "camera", "microphone", "llm", "human" |
| "active" | Observer actively sampled or probed: emitted a signal, sent a query, asked a question to elicit a response | Marks the precise moment of the probe and its response window | "lidar", "radar", "ultrasonic", "llm" |
| "reflective" | Observer processed past data to synthesize: looked back at prior grains, compressed, or reflected. derived_from SHOULD be populated with the content addresses of consumed grains. | Spans the window of the consumed input data, not the moment the grain was written. created_at is the write time; valid_from/valid_to is the observed window. | "reflector", "llm" |
| "real_time" | Observer processed data as it arrived: stream processing with no meaningful buffering. created_at ≈ event time. | Point-in-time; valid_from ≈ created_at | "imu", "gps", "microphone", "llm" (streaming inference) |
Absent: When observation_mode is absent, no mode assertion is made. Consumers SHOULD treat the observation as mode-unclassified and apply conservative trust calibration.
Interaction with active mode: Grains produced by an active observer SHOULD record the probe or query that triggered the observation in context["probe"]. This enables verification that the observed response corresponds to the stated query.
26. Observation Scope Registry
The observation_scope field is a closed enum. It describes the temporal breadth of what was observed — how much time the observation covers — enabling correct interpretation of valid_from/valid_to and appropriate retrieval strategies.
| Value | Temporal Breadth | Physical Example | Cognitive Example |
|---|---|---|---|
| "point" | Single moment: one reading, one event, one inference | GPS fix at t=T; one temperature sample | Single-message LLM impression; one annotated event |
| "interval" | Defined time window: seconds to tens of minutes | 1-second IMU batch; 10-minute sensor log segment | LLM observer notes compressing the last 30 minutes of conversation |
| "session" | Entire interaction session: minutes to hours | Full robot mission from start to dock | LLM observer notes covering a complete conversation thread |
| "longitudinal" | Across multiple sessions: days, weeks, or longer | Multi-day environmental monitoring log | Reflector cross-session pattern spanning weeks of user interactions |
Default behavior:
- For physical observers, "point" is implied when observation_scope is absent.
- For cognitive observers with observation_mode: "reflective", "interval" or "session" SHOULD be set explicitly. Absent scope on a reflective cognitive observation is a conformance warning at Level 2.
Interaction with temporal fields:
- "point" → valid_from ≈ valid_to ≈ created_at; often omitted entirely
- "interval" → valid_from < valid_to; window is typically much shorter than a session
- "session" → valid_from = session start, valid_to = session end
- "longitudinal" → valid_from = earliest covered session, valid_to = latest covered session; derived_from SHOULD enumerate the intermediate Observation grains from each covered session
27. Grain Type Field Specifications
This section provides detailed field specifications for each standard grain type. For Action grain phase fields, see §27.1. For Observer types, see §24. For Observation modes/scopes, see §25/§26.
27.1 Action Grain (type = 0x05) — Phase and Mode Details
The action_phase field acts as a discriminator for async vs. synchronous tool call recording.
action_phase discriminator:
| Value | Meaning | Required fields | Absent fields |
|---|---|---|---|
| "definition" | Definition: tool schema record | tool_name, tool_description, input_schema | input, content, is_error, tool_call_id |
| absent (default) | Complete: synchronous call | tool_name, input, content, is_error | derived_from |
| "call" | Call: async; result not yet received | tool_name, input | content, is_error |
| "result" | Result: async result arrived | tool_call_id, content, is_error, derived_from | tool_name, input |
Phase-dependent field presence:
| Field | "definition" | "call" | "result" | complete (absent) |
|---|---|---|---|---|
| tool_name | REQUIRED | REQUIRED | omit | REQUIRED |
| tool_description | REQUIRED | omit | omit | omit |
| input_schema | REQUIRED | omit | omit | omit |
| output_schema | optional | omit | omit | omit |
| strict | optional | omit | omit | omit |
| tool_type | optional | optional | omit | optional |
| tool_version | optional | optional | omit | optional |
| input | MUST NOT | REQUIRED | omit | REQUIRED |
| tool_call_id | omit | RECOMMENDED | REQUIRED | optional |
| call_batch_id | omit | optional | optional | optional |
| content | MUST NOT | MUST NOT | REQUIRED | REQUIRED |
| is_error | MUST NOT | MUST NOT | REQUIRED | REQUIRED |
| stdout / stderr | MUST NOT | MUST NOT | optional | optional |
| exit_code | MUST NOT | MUST NOT | optional | optional |
| duration_ms | MUST NOT | MUST NOT | optional | optional |
| derived_from | omit | omit | [call grain hash] | omit |
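The phase table above can be turned into a small validator. The sketch below is an illustration under stated assumptions: `validate_action_phase` and `PHASE_RULES` are hypothetical names, only REQUIRED and MUST NOT cells are enforced ("omit", "optional", and RECOMMENDED cells are not checked), and execution_mode-specific variations such as code_exec are not modelled.

```python
from typing import Dict, List, Optional, Set, Tuple

_RESULT_ONLY = {"content", "is_error", "stdout", "stderr", "exit_code", "duration_ms"}

# phase -> (REQUIRED fields, MUST NOT fields); None is the complete (absent) phase
PHASE_RULES: Dict[Optional[str], Tuple[Set[str], Set[str]]] = {
    "definition": ({"tool_name", "tool_description", "input_schema"},
                   _RESULT_ONLY | {"input"}),
    "call": ({"tool_name", "input"}, _RESULT_ONLY),
    "result": ({"tool_call_id", "content", "is_error", "derived_from"}, set()),
    None: ({"tool_name", "input", "content", "is_error"}, set()),
}


def validate_action_phase(grain: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no violation."""
    phase = grain.get("action_phase")
    if phase is not None and (not isinstance(phase, str) or phase not in PHASE_RULES):
        return [f"unknown action_phase: {phase!r}"]
    required, forbidden = PHASE_RULES[phase]
    present = set(grain)
    problems = [f"missing required field: {name}"
                for name in sorted(required - present)]
    problems += [f"field not allowed in this phase: {name}"
                 for name in sorted(forbidden & present)]
    return problems


sync_call = {
    "type": "action", "tool_name": "get_weather",
    "input": {"location": "San Francisco, CA"},
    "content": "15°C, partly cloudy", "is_error": False,
}
premature = {"type": "action", "action_phase": "call",
             "tool_name": "get_weather", "input": {}, "content": "too early"}
```

Here `validate_action_phase(sync_call)` reports nothing, while `premature` is flagged because a "call" grain must not carry content.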
execution_mode values:
| Value | Meaning |
|---|---|
| absent (default) | Standard function call: tool_name + input |
| "function_call" | Explicit standard function call |
| "code_exec" | CodeAct-style: code field holds executable Python/shell; result in stdout/stderr |
| "computer_use" | Anthropic computer-use tool; input holds action type and coordinates |
Example 0 — Tool definition grain:
{
"type": "action",
"action_phase": "definition",
"tool_name": "get_weather",
"tool_description": "Get the current weather in a given location.",
"input_schema": {
"type": "object",
"properties": {
"location": {"type": "string"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
},
"output_schema": {
"type": "object",
"properties": {
"temperature": {"type": "number"},
"unit": {"type": "string"},
"description": {"type": "string"},
"humidity": {"type": "number"}
}
},
"strict": true,
"tool_type": "client",
"author_did": "did:web:example.com:agents:assistant",
"created_at": 1737000000000
}
Example 1 — Synchronous function call:
{
"type": "action",
"tool_name": "get_weather",
"tool_call_id": "toolu_01A09q90qw90lq917835lq9",
"input": {"location": "San Francisco, CA", "unit": "celsius"},
"content": "15°C, partly cloudy",
"is_error": false,
"duration_ms": 312,
"created_at": 1737000000000
}
Example 2 — CodeAct code execution:
{
"type": "action",
"execution_mode": "code_exec",
"code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.describe())",
"interpreter_id": "session-abc123",
"stdout": " age salary\ncount 100.0 100.0",
"exit_code": 0,
"is_error": false,
"created_at": 1737000000000
}
Alignment with Anthropic API:
| Anthropic API field | OMS Action field |
|---|---|
| tool.name | tool_name (definition grain) |
| tool.description | tool_description |
| tool.input_schema | input_schema |
| tool.strict | strict |
| (no Anthropic equivalent) | output_schema |
| tool_use.id | tool_call_id |
| tool_use.input | input |
| tool_result.content | content |
| tool_result.is_error | is_error |
27.2 Goal Grain (type = 0x07) — Lifecycle and Provenance Details
Provenance chain methods:
| Method | Meaning |
|---|---|
| "user_input" | Human set this goal directly |
| "goal_decomposition" | Agent decomposed a parent goal |
| "goal_state_transition" | Updates state of a prior Goal grain |
| "goal_revision" | Human modified a previously set goal |
| "goal_inference" | Agent inferred from Event or Belief patterns |
| "goal_delegation" | Delegated from another agent |
27.3 source_type Registry
The source_type field is an open enum. Standard values:
| Value | Meaning |
|---|---|
| "user_explicit" | Directly stated by human user |
| "agent_inferred" | Derived by an AI agent |
| "sensor" | Physical instrument measurement |
| "consolidated" | Distilled from multiple prior grains |
| "system" | Written by infrastructure (provisioning, etc.) |
| "llm_generated" | Generated by a language model |
| "imported" | Imported from external source |
| "established_knowledge" | Widely accepted universal truth: physical constants, scientific laws, geographic facts. Grains with this value SHOULD omit user_id, SHOULD omit valid_to, SHOULD set confidence: 1.0, and SHOULD use invalidation_policy.mode: "locked". |
| "axiomatic" | Definitionally or logically true: mathematical axioms, tautologies. Same SHOULD rules as "established_knowledge". |
27.4 HIPAA PHI Tag Normalization
The 18 normative phi: tag values matching 45 CFR §164.514(b) Safe Harbor identifiers:
phi:name, phi:geo_subdivision, phi:date, phi:age_over_89, phi:phone, phi:fax, phi:email, phi:ssn, phi:mrn, phi:health_plan_id, phi:account_number, phi:certificate_license, phi:vehicle_id, phi:device_id, phi:url, phi:ip_address, phi:biometric, phi:photo.
Stores supporting HIPAA compliance MUST recognize all 18 and apply appropriate access controls. Any phi:* tag MUST be treated as PHI-sensitive regardless of whether the specific value appears in this list.
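The two rules above (recognize all 18 registered tags, and treat any phi:* tag as sensitive) can be sketched as follows. The helper names are hypothetical, and the sketch assumes tags arrive as a plain list of strings; which grain field carries them is not restated here.

```python
from typing import List, Sequence

REGISTERED_PHI_TAGS = frozenset({
    "phi:name", "phi:geo_subdivision", "phi:date", "phi:age_over_89",
    "phi:phone", "phi:fax", "phi:email", "phi:ssn", "phi:mrn",
    "phi:health_plan_id", "phi:account_number", "phi:certificate_license",
    "phi:vehicle_id", "phi:device_id", "phi:url", "phi:ip_address",
    "phi:biometric", "phi:photo",
})


def phi_tags(tags: Sequence[str]) -> List[str]:
    """All tags in the phi: namespace, registered or not."""
    return [tag for tag in tags if tag.startswith("phi:")]


def is_phi_sensitive(tags: Sequence[str]) -> bool:
    """Any phi:* tag marks the grain as PHI-sensitive."""
    return bool(phi_tags(tags))


def unregistered_phi_tags(tags: Sequence[str]) -> List[str]:
    """phi:* tags outside the 18 registered values; still sensitive, worth auditing."""
    return [tag for tag in phi_tags(tags) if tag not in REGISTERED_PHI_TAGS]
```

Sensitivity is decided by the prefix alone; the registered set is only used to surface unexpected values for review.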
27.5 External Citation Schema
Scientific and legal workflows cite external artifacts outside the OMS hash space. The content_refs field accepts a structured external_citation object alongside standard content references:
{
"citation_type": "doi",
"identifier": "10.1038/s41586-024-07487-w",
"retrieved_at": 1737000000000,
"content_hash": "sha256:abc123...",
"citation_role": "supports"
}
| Field | Type | Required | Values |
|---|---|---|---|
| citation_type | string | REQUIRED | "doi", "arxiv", "pmid", "isbn", "rrid", "clinicaltrials", "url" |
| identifier | string | REQUIRED | Type-specific identifier |
| retrieved_at | int64 | OPTIONAL | Epoch ms of retrieval |
| content_hash | string | OPTIONAL | SHA-256 of retrieved document |
| citation_role | string | OPTIONAL | "supports", "refutes", "extends", "replicates", "uses_data", "uses_software" |
The derived_from field SHOULD accept both OMS content addresses and external citation objects.
27.6 Trigger Definitions via Observation Grains
Triggers observe external systems for changes (new events, incoming webhooks, scheduled intervals). This maps naturally to the Observation grain (type 0x06) — triggers are observers. No new grain type is required; existing Observation fields accommodate trigger definitions through the following convention.
Field mapping for triggers:
| Observation Field | Trigger Usage |
|---|---|
| observer_id | Connector name (e.g., "github", "stripe") |
| observer_type | Trigger mechanism: "trigger:polling", "trigger:webhook", "trigger:schedule", "trigger:listener" |
| observation_mode | "periodic" (polling), "continuous" (webhook/listener), "scheduled" (cron) |
| observation_scope | What is being watched (e.g., "repos/{owner}/{repo}/issues") |
| context | Trigger-specific configuration using int: prefixed fields from the Integration profile (§A.7) |
Implementations MAY index Observation grains whose observer_type starts with "trigger:" to provide trigger catalog queries.
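A trigger catalog of the kind mentioned above can be derived with a simple prefix filter. The sketch below is illustrative: `build_trigger_catalog` is a hypothetical helper, and grouping mechanisms by observer_id is one possible catalog shape among many.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

TRIGGER_PREFIX = "trigger:"


def build_trigger_catalog(grains: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each connector (observer_id) to the trigger mechanisms it defines."""
    catalog: Dict[str, List[str]] = defaultdict(list)
    for grain in grains:
        observer_type = grain.get("observer_type")
        if grain.get("type") != "observation" or not isinstance(observer_type, str):
            continue
        if observer_type.startswith(TRIGGER_PREFIX):
            mechanism = observer_type[len(TRIGGER_PREFIX):]
            catalog[grain.get("observer_id", "unknown")].append(mechanism)
    return dict(catalog)


catalog = build_trigger_catalog([
    {"type": "observation", "observer_id": "github", "observer_type": "trigger:polling"},
    {"type": "observation", "observer_id": "stripe", "observer_type": "trigger:webhook"},
    {"type": "observation", "observer_id": "cam-1", "observer_type": "camera"},
    {"type": "belief", "observer_type": "trigger:polling"},
])
```

Ordinary Observation grains such as the camera reading are left out, so the catalog contains only trigger definitions.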
Example — Polling trigger:
{
"type": "observation",
"observer_id": "github",
"observer_type": "trigger:polling",
"observation_mode": "periodic",
"observation_scope": "repos/{owner}/{repo}/issues",
"structural_tags": ["profile:integration"],
"namespace": "axtion:connectors:github",
"context": {
"int:http_method": "GET",
"int:http_path": "/repos/{owner}/{repo}/issues",
"int:path_params": ["owner", "repo"],
"int:poll_interval_secs": 300,
"int:cursor_field": "since",
"int:cursor_type": "timestamp",
"int:connector": "github",
"int:config_schema": {
"type": "object",
"properties": {
"owner": {"type": "string"},
"repo": {"type": "string"},
"labels": {"type": "string"}
},
"required": ["owner", "repo"]
},
"int:event_schema": {
"type": "object",
"properties": {
"id": {"type": "integer"},
"title": {"type": "string"},
"state": {"type": "string"}
}
}
},
"created_at": 1740700000000
}
Example — Webhook trigger:
{
"type": "observation",
"observer_id": "stripe",
"observer_type": "trigger:webhook",
"observation_mode": "continuous",
"observation_scope": "payment_intent.succeeded",
"structural_tags": ["profile:integration"],
"namespace": "axtion:connectors:stripe",
"context": {
"int:webhook_path": "/webhooks/stripe/{token}",
"int:webhook_secret_header": "Stripe-Signature",
"int:connector": "stripe",
"int:event_schema": {
"type": "object",
"properties": {
"id": {"type": "string"},
"amount": {"type": "integer"},
"currency": {"type": "string"}
}
}
},
"created_at": 1740700000000
}
Example — Scheduled trigger:
{
"type": "observation",
"observer_id": "scheduler",
"observer_type": "trigger:schedule",
"observation_mode": "scheduled",
"observation_scope": "daily-report",
"structural_tags": ["profile:integration"],
"context": {
"int:cron_expression": "0 9 * * MON-FRI",
"int:timezone": "America/New_York",
"int:connector": "scheduler"
},
"created_at": 1740700000000
}
27.7 Consensus Grain Usage for Action Definition Validation
When multiple independent sources produce or validate the same Action definition grain, a Consensus grain (type 0x09) records the agreement. This pattern is useful for integration platforms where definitions may be synthesized by LLMs, parsed from OpenAPI specs, validated against reference data, or refined by execution feedback analysis.
Semantics:
- agreed_content is the content address of the Action definition grain that achieved consensus.
- Each entry in participating_observers is a DID identifying a validation source.
- dissent_grains link to alternative definitions that did not achieve consensus.
- Consensus achievement (agreement_count >= threshold) serves as a confidence signal for tool catalog quality.
Example — Multi-source validation consensus:
{
"type": "consensus",
"participating_observers": [
"did:web:example.com:agents:spec-parser",
"did:web:example.com:agents:llm-synthesizer",
"did:web:example.com:agents:reference-validator",
"did:web:example.com:agents:execution-evaluator"
],
"threshold": 2,
"agreement_count": 3,
"dissent_count": 1,
"agreed_content": "<content-address-of-validated-definition-grain>",
"dissent_grains": ["<content-address-of-alternative-definition>"],
"structural_tags": ["consensus:action-definition"],
"namespace": "axtion:connectors:github",
"related_to": [
{"hash": "<definition-grain-hash>", "relation_type": "supports", "weight": 1.0}
],
"created_at": 1740700000000
}
28. Query Conventions
28.1 Standard Search Response Envelope
OMS does not define a transport or query protocol. However, implementations that expose search APIs SHOULD return results using the following standard envelope to ensure interoperability:
{
"results": [
{
"grain": { "...grain payload..." },
"score": 0.92,
"matched_fields": ["object", "subject"],
"content_address": "a1b2c3d4..."
}
],
"total": 142,
"next_cursor": "opaque-pagination-token"
}
| Field | Type | Description |
|---|---|---|
| grain | map | Full deserialized grain payload |
| score | float64, [0.0, 1.0] | Retrieval relevance score, distinct from confidence (which is epistemic certainty). A high score means the grain matched the query well; a high confidence means the claim is believed to be true. |
| matched_fields | array[string] | Which payload fields contributed to the match |
| content_address | string | SHA-256 hex of the grain blob |
28.2 Namespace Convention
OMS uses namespace (single string) for logical partitioning and user_id for GDPR data subject scoping. Systems that require additional scoping dimensions SHOULD use structured namespace strings with : as the separator:
{org}:{app}:{agent}:{custom}
Examples:
- "acme:chatbot:agent-7": org-scoped, app-scoped, agent-scoped
- "acme:chatbot:agent-7:session-42": additionally run-scoped
- "agent:identity": reserved for ownership and identity grains (§12.5)
- "shared": default, no specific partition
The run_id field (§6.1) provides session/run scoping orthogonal to the namespace hierarchy. Use run_id when runs are ephemeral and high-cardinality; use namespace segments when partitions are stable and low-cardinality.
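Splitting a structured namespace into its segments can be sketched as below. `parse_namespace` and the "special" key are hypothetical; the sketch assumes at most four named segments and keeps any further ":" characters inside the trailing custom segment.

```python
from typing import Dict, Optional

SEGMENT_NAMES = ("org", "app", "agent", "custom")
SPECIAL_NAMESPACES = {"agent:identity", "shared"}  # not parsed as {org}:{app}:...


def parse_namespace(namespace: str) -> Dict[str, Optional[str]]:
    """Split {org}:{app}:{agent}:{custom}; missing trailing segments stay None."""
    parsed: Dict[str, Optional[str]] = {name: None for name in SEGMENT_NAMES}
    parsed["special"] = namespace if namespace in SPECIAL_NAMESPACES else None
    if parsed["special"] is None:
        # maxsplit keeps any extra ':' inside the final custom segment
        parts = namespace.split(":", len(SEGMENT_NAMES) - 1)
        for name, value in zip(SEGMENT_NAMES, parts):
            parsed[name] = value
    return parsed
```

For example, `parse_namespace("acme:chatbot:agent-7")` fills org, app, and agent and leaves custom as None, which lets an index filter on a prefix of the hierarchy.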
28.3 Index-Layer-Managed Fields
The following fields are updated by the store/index layer after initial write, not by the grain author. These fields are not stored in the immutable .mg blob, are not part of the content address, and are not covered by COSE signatures (see §5.6). Writers MUST NOT set these fields; stores MUST update them atomically:
| Field | Updated when |
|---|---|
| superseded_by | A superseding grain is accepted |
| system_valid_to | Grain is superseded or contradicted |
| verification_status | Verification, contestation, or retraction occurs |
| access_count | Grain is retrieved by a search or get operation (see §22.10 for semantics) |
| last_accessed_at | Grain is retrieved by a search or get operation (see §22.10 for semantics) |
28.4 Store Protocol Convention
OMS does not define a formal store API. However, implementations that expose a programmatic store interface SHOULD implement the following operations to ensure interoperability:
| Operation | Signature | Description |
|---|---|---|
| get | (content_address) → grain \| not_found | Retrieve a grain by its SHA-256 content address |
| put | (blob_bytes) → content_address \| error | Store a grain blob; returns its content address. Idempotent: re-storing an existing blob is a no-op. |
| supersede | (old_address, new_blob_bytes, justification?) → new_address \| error | Atomic supersession: validates invalidation_policy on the old grain, writes the new grain, and updates the old grain's index-layer fields (superseded_by, system_valid_to). This MUST be atomic: if any step fails, the entire operation rolls back. |
| exists | (content_address) → bool | Check if a grain exists without retrieving it |
| query | (filters, sort, limit, cursor) → result_envelope | Structured query with the response envelope from §28.1 |
| search | (embedding_or_text, filters, limit) → result_envelope | Semantic similarity search combined with structured filters |
| delete | (content_address) → void \| error | Compliance-only erasure (GDPR Art. 17, consent cascade). MUST NOT be exposed as a general-purpose API. MUST check litigation holds (invalidation_policy.mode: "hold") before deleting. |
| put_batch | (blob_bytes[]) → content_address[] \| error[] | Batch ingest for consolidation, migration, and high-throughput scenarios |
| get_batch | (content_address[]) → grain[] \| not_found[] | Batch retrieval for provenance chain traversal and context assembly |
Stores SHOULD implement supersede as a distinct operation rather than exposing raw put + index mutation separately. Supersession is the most error-prone operation (invalidation policy checks, derivation DAG traversal for scope: "subtree", atomic index update) and deserves a dedicated, well-tested code path.
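The shape of a dedicated supersede path can be illustrated with a toy in-memory store. Everything here is a sketch under assumptions: `MemoryStore`, `PolicyDenied`, and the `decode` and `is_allowed` callbacks are hypothetical; JSON stands in for the canonical .mg serialization; system_valid_to is assumed to be epoch milliseconds; and atomicity is only achieved by validating before any state is mutated in a single thread.

```python
import hashlib
import json
import time
from typing import Callable, Dict


class PolicyDenied(Exception):
    """Stand-in for ERR_INVALIDATION_DENIED."""


class MemoryStore:
    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.index: Dict[str, dict] = {}  # index-layer fields, outside the blob

    def put(self, blob: bytes) -> str:
        address = hashlib.sha256(blob).hexdigest()
        self.blobs.setdefault(address, blob)  # idempotent: re-storing is a no-op
        self.index.setdefault(address, {"superseded_by": None, "system_valid_to": None})
        return address

    def supersede(self, old_address: str, new_blob: bytes,
                  decode: Callable[[bytes], dict],
                  is_allowed: Callable[[dict, dict], bool]) -> str:
        old_grain = decode(self.blobs[old_address])  # KeyError if the grain is unknown
        new_grain = decode(new_blob)
        if not is_allowed(old_grain, new_grain):     # check before mutating anything
            raise PolicyDenied("ERR_INVALIDATION_DENIED")
        new_address = self.put(new_blob)
        self.index[old_address].update(
            superseded_by=new_address, system_valid_to=int(time.time() * 1000))
        return new_address


def encode(grain: dict) -> bytes:  # placeholder for canonical serialization
    return json.dumps(grain, sort_keys=True, separators=(",", ":")).encode("utf-8")


def allow_if_open(old: dict, new: dict) -> bool:  # simplified policy check
    return old.get("invalidation_policy", {}).get("mode", "open") == "open"


store = MemoryStore()
old = store.put(encode({"type": "belief", "object": "v1"}))
new = store.supersede(old, encode({"type": "belief", "object": "v2",
                                   "derived_from": [old]}), json.loads, allow_if_open)
pinned = store.put(encode({"type": "belief", "object": "pinned",
                           "invalidation_policy": {"mode": "locked"}}))
```

A production store would perform the policy check, the blob write, and the index update inside one transaction, and its policy check would cover every mode along with the ancestry traversal for scope: "subtree".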
28.5 Agent Capability Convention
Agents that participate in multi-agent systems SHOULD advertise their capabilities by writing a Belief grain with the mg:has_capability relation to the "agent:identity" namespace. This grain serves as the OMS equivalent of an A2A Agent Card or MCP server capability declaration.
Convention:
{
"type": "belief",
"subject": "did:web:example.com:agents:summarizer",
"relation": "mg:has_capability",
"object": {
"name": "Text Summarizer",
"description": "Summarizes long documents into key points",
"supported_tools": ["summarize_text", "extract_entities"],
"input_modalities": ["text"],
"output_modalities": ["text"],
"protocol": "oms",
"max_context_tokens": 200000
},
"confidence": 1.0,
"source_type": "system",
"namespace": "agent:identity",
"author_did": "did:web:example.com",
"invalidation_policy": {
"mode": "delegated",
"authorized": ["did:web:example.com"]
}
}
The object map is an open schema. Standard keys:
| Key | Type | Description |
|---|---|---|
| name | string | Human-readable agent name |
| description | string | Agent purpose and capabilities summary |
| supported_tools | array[string] | Tool names this agent can invoke (cross-reference with Action definition grains) |
| input_modalities | array[string] | "text", "image", "audio", "video". What the agent can consume. |
| output_modalities | array[string] | What the agent can produce |
| protocol | string | Communication protocol: "oms", "mcp", "a2a", "custom". Open enum. |
| max_context_tokens | int | Maximum context window in tokens |
| model | string | Underlying LLM model identifier |
Agents can discover other agents by querying Belief grains with relation: "mg:has_capability" in the "agent:identity" namespace.
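As a rough sketch of that discovery step, the function below filters already-decoded grains (plain dictionaries) for capability declarations. The function name `find_agents` and its optional `tool` and `input_modality` filters are illustrative assumptions; a real deployment would more likely push these filters into the store's query operation.

```python
def find_agents(grains, tool=None, input_modality=None):
    """Return subject DIDs of capability grains that match the optional filters."""
    matches = []
    for grain in grains:
        if grain.get("type") != "belief":
            continue
        if grain.get("relation") != "mg:has_capability":
            continue
        if grain.get("namespace") != "agent:identity":
            continue
        capabilities = grain.get("object")
        if not isinstance(capabilities, dict):
            continue  # the object map is expected to be a map for this relation
        if tool is not None and tool not in capabilities.get("supported_tools", []):
            continue
        if input_modality is not None and input_modality not in capabilities.get("input_modalities", []):
            continue
        matches.append(grain["subject"])
    return matches


summarizer = {
    "type": "belief",
    "subject": "did:web:example.com:agents:summarizer",
    "relation": "mg:has_capability",
    "object": {
        "supported_tools": ["summarize_text", "extract_entities"],
        "input_modalities": ["text"],
    },
    "namespace": "agent:identity",
}
unrelated = {
    "type": "belief",
    "subject": "machine-learning",
    "relation": "is_subset_of",
    "object": "artificial-intelligence",
    "namespace": "knowledge-base",
}

print(find_agents([summarizer, unrelated], tool="summarize_text"))
# prints ['did:web:example.com:agents:summarizer']
```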
28.6 Conversation Threading Convention
Conversations are reconstructed from Event grain sequences using session_id and parent_message_id:
- All Event grains in a conversation MUST share the same session_id.
- Event grains SHOULD populate parent_message_id (§6.2) to form a linked list from newest to oldest.
- Branch points are expressed by two Event grains sharing the same parent_message_id but having different content addresses (tree-of-thought, beam search, alternative paths).
- A State grain (type 0x03) with relation: "mg:state_at" and a context map containing {messages_tail, message_count, participants} represents a conversation snapshot.
- Conversation summaries are Belief grains with consolidation_level >= 1, derived_from pointing to the summarized Event grains, and source_type: "consolidated".
Retrieving a conversation:
- Query: type=event, session_id=X, system_valid_to=null, sort=timestamp_ms ASC
- Or: start from the most recent Event grain (messages_tail in a State grain) and follow parent_message_id backward.
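Both retrieval routes can be sketched over decoded grains. The sketch assumes that parent_message_id holds the content address of the parent Event grain (§6.2 is authoritative on this) and that index-layer fields such as system_valid_to are available on the decoded dictionary; the function names are hypothetical.

```python
def query_thread(events, session_id):
    """Structured-query route: filter by session, keep current grains, sort ascending."""
    current = [
        e for e in events
        if e.get("type") == "event"
        and e.get("session_id") == session_id
        and e.get("system_valid_to") is None
    ]
    return sorted(current, key=lambda e: e["timestamp_ms"])


def walk_thread(events_by_address, tail_address):
    """Linked-list route: follow parent_message_id from the newest event backward."""
    ordered = []
    seen = set()
    current = tail_address
    while current is not None and current in events_by_address:
        if current in seen:  # defensive stop; content addressing should rule out cycles
            break
        seen.add(current)
        event = events_by_address[current]
        ordered.append(event)
        current = event.get("parent_message_id")
    ordered.reverse()  # oldest first
    return ordered


# Shortened placeholder keys stand in for 64-character content addresses.
events = {
    "a1": {"type": "event", "session_id": "s1", "timestamp_ms": 1, "content": "hi"},
    "b2": {"type": "event", "session_id": "s1", "timestamp_ms": 2, "content": "hello",
           "parent_message_id": "a1"},
    "c3": {"type": "event", "session_id": "s1", "timestamp_ms": 3, "content": "draft A",
           "parent_message_id": "b2"},
    "c4": {"type": "event", "session_id": "s1", "timestamp_ms": 4, "content": "draft B",
           "parent_message_id": "b2"},
}
branch = [e["content"] for e in walk_thread(events, "c4")]
```

The two routes differ at branch points: the structured query returns every Event grain in the session, including both drafts, while the backward walk returns only the branch that ends at the chosen tail.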
28.7 Session Handoff Convention
When Agent A transfers control of a conversation to Agent B, the handoff is recorded using a Goal grain with mg:delegates_to relation and delegation scope fields (§6.11):
- Agent A writes a Goal grain with relation: "mg:delegates_to", subject = Agent A's DID, object = Agent B's DID, and delegation scope fields specifying authorized_namespaces, authorized_tools, context_grains, and return_to.
- The context_grains field contains content addresses of grains Agent B needs to continue, typically the recent Event grain chain and any relevant Belief/State grains.
- Agent B ingests the referenced grains, validates the delegation scope, and continues with a new run_id but the same session_id.
- When Agent B completes its task, it writes a Goal grain with goal_state: "satisfied" linked via derived_from to the delegation grain, and control returns to the agent specified in return_to.
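The delegation record and the scope check performed by Agent B might look like the sketch below. It assumes the delegation scope fields sit at the top level of the grain (§6.11 defines their actual placement); `build_delegation` and `is_permitted` are hypothetical helper names, and the DIDs and the shortened content address are placeholders.

```python
def build_delegation(from_did, to_did, namespaces, tools, context_grains, return_to):
    """Assemble the Goal grain that Agent A writes to hand a session over."""
    return {
        "type": "goal",
        "relation": "mg:delegates_to",
        "subject": from_did,
        "object": to_did,
        "authorized_namespaces": list(namespaces),
        "authorized_tools": list(tools),
        "context_grains": list(context_grains),
        "return_to": return_to,
    }


def is_permitted(delegation, agent_did, namespace, tool):
    """Check one intended operation against the delegation scope."""
    return (
        delegation.get("relation") == "mg:delegates_to"
        and delegation.get("object") == agent_did
        and namespace in delegation.get("authorized_namespaces", [])
        and tool in delegation.get("authorized_tools", [])
    )


handoff = build_delegation(
    from_did="did:web:example.com:agents:router",
    to_did="did:web:example.com:agents:summarizer",
    namespaces=["support:tickets"],
    tools=["summarize_text"],
    context_grains=["3a1f..."],  # placeholder, not a real content address
    return_to="did:web:example.com:agents:router",
)
```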
28.8 CAL and SML — Companion Query and Markup Languages
The query conventions in this section (§28.1–§28.7) define OMS store operations and response envelopes at the structural level. The Context Assembly Language (CAL) (CONTEXT-ASSEMBLY-LANGUAGE-CAL-SPECIFICATION.md) is the companion specification that provides a formal, deterministic syntax for invoking these operations from an agent or LLM.
Relationship to §28.4 Store Protocol:
CAL extends the store operations defined in §28.4 with a structured query language. Where §28.4 defines query, search, get, put, and supersede as abstract operations, CAL provides the syntax for expressing them safely — with built-in token-budget awareness, multi-source composition, and a type system tied to OMS grain types.
| §28.4 store operation | CAL statement |
|---|---|
| query + search | RECALL <type> WHERE … LIMIT … |
| put (new grain) | ADD <type> SET field = value … REASON "…" |
| supersede | SUPERSEDE <hash> SET field = value … REASON "…" |
| query/search + get_batch + compose | ASSEMBLE … FROM … BUDGET <n> TOKENS |
| introspection | DESCRIBE <type> |
| delete (compliance erasure) | no CAL equivalent (structurally excluded) |
SML output format:
CAL ASSEMBLE statements produce SML (Semantic Markup Language) output by default. SML is a flat, tag-based markup format optimized for LLM consumption: tag names are OMS grain types (<belief>, <goal>, <event>, …), attributes carry lightweight metadata, and text content is natural language. See the SML specification for the full format definition, structural rules, and progressive disclosure model. Implementations that expose a query layer SHOULD support CAL and produce SML output for agent context assembly.
Appendix A: Domain Profile Registry
Domain Profiles allow implementers to extend the OMS field vocabulary with domain-specific fields while preserving core interoperability. A grain declares membership in a domain profile by including a structural_tag of the form "profile:<name>" (e.g., "profile:healthcare"). A grain MAY declare membership in multiple profiles.
Rules for profile implementations:
- Profile-specific field names MUST use the domain namespace prefix defined below.
- Profile fields that are required within the profile MUST be validated only when the profile tag is present; they are always optional in the absence of the profile tag.
- Profile fields MUST NOT conflict with core OMS field names (§6).
- Profile short keys for compaction MUST be registered with the OMS working group to avoid collisions.
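A validator following these rules could be sketched as below. The prefix table mirrors the profiles registered in A.1 through A.7; the `required_by_profile` argument is hypothetical (each profile's table states which fields, if any, it requires), and the sketch assumes profile fields live in the grain's context map.

```python
PROFILE_PREFIXES = {
    "healthcare": "hc:",
    "legal": "legal:",
    "finance": "fin:",
    "robotics": "rob:",
    "science": "sci:",
    "consumer": "con:",
    "integration": "int:",
}


def declared_profiles(grain):
    """Profile names taken from structural_tags entries of the form 'profile:<name>'."""
    return {
        tag.split(":", 1)[1]
        for tag in grain.get("structural_tags", [])
        if tag.startswith("profile:")
    }


def has_profile_prefix(profile, field):
    """True when a field name carries the namespace prefix of its profile."""
    return field.startswith(PROFILE_PREFIXES[profile])


def missing_required(grain, required_by_profile):
    """Required fields are checked only for profiles the grain actually declares."""
    context = grain.get("context", {})
    missing = []
    for profile in sorted(declared_profiles(grain)):
        for field in required_by_profile.get(profile, []):
            if field not in context:
                missing.append(field)
    return missing


# Hypothetical requirement set, used only to exercise the functions.
required = {"healthcare": ["hc:consent_ref"]}
tagged = {"structural_tags": ["profile:healthcare", "ai"], "context": {}}
untagged = {"structural_tags": ["ai"], "context": {}}
```

With this input, `missing_required(tagged, required)` reports the absent field, while the untagged grain passes because the requirement applies only when the profile tag is present.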
A.1 Healthcare Profile (hc:)
Tag: "profile:healthcare" | Namespace prefix: hc:
Applies to grains that handle Protected Health Information (PHI) under HIPAA, health records under HL7 FHIR, or clinical observations. Grains using this profile SHOULD also include structural_tags entries from the normative phi: tag set (§27.4) when applicable.
| Field | Type | Required | Description |
|---|---|---|---|
| hc:patient_id | string | when applicable | De-identified patient reference; MUST NOT be a direct identifier unless encryption is active |
| hc:encounter_id | string | no | HL7 FHIR Encounter resource ID |
| hc:practitioner_did | string | no | DID of the treating practitioner or ordering clinician |
| hc:icd10 | string[] | no | ICD-10-CM diagnosis codes |
| hc:cpt | string[] | no | CPT procedure codes |
| hc:loinc | string | no | LOINC code for laboratory or clinical observations |
| hc:snomed | string | no | SNOMED CT concept identifier |
| hc:fhir_resource | string | no | FHIR resource type (e.g., "Observation", "Condition", "MedicationRequest") |
| hc:fhir_id | string | no | FHIR resource ID on the source system |
| hc:consent_ref | string | no | Content address of the Consent grain authorizing this PHI grain |
| hc:deidentification | string | no | De-identification method applied: "safe_harbor" (45 CFR §164.514(b)(2)) or "expert_determination" (45 CFR §164.514(b)(1)) |
Normative: Grains with "profile:healthcare" and PHI content MUST set processing_basis: "consent" (or applicable legal basis) and MUST NOT set license to any open license value.
A.2 Legal Profile (legal:)
Tag: "profile:legal" | Namespace prefix: legal:
Applies to grains that represent contracts, case law, regulatory filings, legal opinions, or compliance records.
| Field | Type | Required | Description |
|---|---|---|---|
| legal:jurisdiction | string | recommended | ISO 3166-1 alpha-2 country code or "EU", "UN", etc. |
| legal:matter_id | string | no | Internal matter or case docket identifier |
| legal:document_type | string | no | "contract", "opinion", "filing", "statute", "regulation", "order", "brief" |
| legal:parties | string[] | no | DID or identifier of each legal party |
| legal:effective_date | integer | no | Unix epoch ms; date on which the legal instrument takes effect |
| legal:expiry_date | integer | no | Unix epoch ms; date on which the legal instrument expires or is superseded |
| legal:citation | string | no | Formal legal citation string (e.g., "42 U.S.C. § 1983") |
| legal:privilege | string | no | Privilege assertion: "attorney_client", "work_product", "none" |
| legal:hold_ref | string | no | Content address of the Invalidation grain placing this grain under litigation hold |
| legal:redaction_level | string | no | "none", "partial", "full" |
Normative: Grains with legal:privilege: "attorney_client" or "work_product" MUST have invalidation mode "hold" applied before any export or cross-system transfer. Implementations MUST NOT auto-erase held grains (even on GDPR erasure requests) without documented litigation hold lift.
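The two guards implied by this rule can be sketched as small predicates. The sketch assumes legal: fields are stored in the grain's context map and that the hold is visible as invalidation_policy.mode on the decoded grain; the `hold_lifted` flag is a hypothetical stand-in for whatever documented hold-lift record an implementation keeps.

```python
PRIVILEGED = {"attorney_client", "work_product"}


def is_held(grain):
    """True when the grain's invalidation policy is in litigation-hold mode."""
    return grain.get("invalidation_policy", {}).get("mode") == "hold"


def may_export(grain):
    """Privileged grains may leave the system only once a hold has been applied."""
    privilege = grain.get("context", {}).get("legal:privilege")
    return privilege not in PRIVILEGED or is_held(grain)


def may_erase(grain, hold_lifted=False):
    """Held grains are never erased automatically, even for erasure requests."""
    return not is_held(grain) or hold_lifted


privileged_unheld = {"context": {"legal:privilege": "attorney_client"}}
privileged_held = {
    "context": {"legal:privilege": "attorney_client"},
    "invalidation_policy": {"mode": "hold"},
}
```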
A.3 Finance Profile (fin:)
Tag: "profile:finance" | Namespace prefix: fin:
Applies to grains that represent financial transactions, market observations, risk assessments, or regulatory filings (SOX, MiFID II, etc.).
| Field | Type | Required | Description |
|---|---|---|---|
| fin:account_id | string | no | Obfuscated or tokenized account reference |
| fin:instrument_id | string | no | ISIN, CUSIP, FIGI, or other instrument identifier |
| fin:ticker | string | no | Exchange ticker symbol |
| fin:amount | number | no | Transaction amount |
| fin:currency | string | no | ISO 4217 three-letter currency code |
| fin:transaction_type | string | no | "debit", "credit", "transfer", "fee", "trade", "settlement" |
| fin:market_timestamp | integer | no | Exchange-provided timestamp in Unix epoch ms |
| fin:venue | string | no | Trading venue MIC code (ISO 10383) |
| fin:strategy_id | string | no | Quantitative strategy or model identifier |
| fin:risk_score | number | no | Normalized risk score [0.0–1.0] |
| fin:sox_control_id | string | no | SOX internal control identifier for audit trail linkage |
| fin:retention_years | integer | no | Regulatory retention requirement in years (overrides default retention policy) |
Normative: Grains with "profile:finance" that contain personally identifiable financial information MUST NOT be exported without processing_basis set and without applicable consent or contractual basis documented.
A.4 Robotics Profile (rob:)
Tag: "profile:robotics" | Namespace prefix: rob:
Applies to grains produced by or about embodied robotic systems operating in physical environments.
| Field | Type | Required | Description |
|---|---|---|---|
| rob:robot_id | string | recommended | Unique robot platform identifier (URI or DID) |
| rob:pose | object | no | {x, y, z, roll, pitch, yaw} in the robot's reference frame |
| rob:velocity | object | no | {vx, vy, vz} in m/s |
| rob:map_id | string | no | Identifier of the map or environment model in use |
| rob:mission_id | string | no | Identifier of the current mission or task |
| rob:battery_pct | number | no | Battery charge at observation time [0.0–100.0] |
| rob:safety_state | string | no | "normal", "warning", "emergency_stop", "recovery" |
| rob:hardware_rev | string | no | Robot hardware revision string |
| rob:firmware_ver | string | no | Firmware version string |
| rob:contact_forces | object | no | Force/torque sensor readings at contact points |
| rob:coordinate_frame | string | no | Reference frame identifier (e.g., "world", "odom", "base_link") |
A.5 Science Profile (sci:)
Tag: "profile:science" | Namespace prefix: sci:
Applies to grains produced in scientific research workflows — experiments, datasets, findings, replication records.
| Field | Type | Required | Description |
|---|---|---|---|
| sci:doi | string | no | Digital Object Identifier for the source publication or dataset |
| sci:arxiv_id | string | no | arXiv preprint identifier (e.g., "2501.00123") |
| sci:pmid | string | no | PubMed article identifier |
| sci:dataset_id | string | no | Dataset identifier (DOI, Zenodo, Figshare, etc.) |
| sci:experiment_id | string | no | Local experiment or trial identifier |
| sci:protocol_id | string | no | Protocol identifier or URL (e.g., protocols.io DOI) |
| sci:hypothesis | string | no | Free-text hypothesis being tested |
| sci:result_status | string | no | "positive", "negative", "inconclusive", "replicated", "failed_replication" |
| sci:p_value | number | no | Statistical p-value of the result [0.0–1.0] |
| sci:effect_size | number | no | Standardized effect size (Cohen's d, r, etc.) |
| sci:sample_size | integer | no | Number of subjects or samples |
| sci:preregistered | boolean | no | Whether the study was pre-registered (e.g., on OSF, AsPredicted) |
| sci:open_access | boolean | no | Whether the source is open access |
A.6 Consumer Profile (con:)
Tag: "profile:consumer" | Namespace prefix: con:
Applies to grains produced in consumer-facing agent contexts — personal assistants, recommendation systems, preference learning, and lifestyle applications.
| Field | Type | Required | Description |
|---|---|---|---|
| con:device_type | string | no | "mobile", "desktop", "smart_speaker", "wearable", "tv", "kiosk" |
| con:app_id | string | no | Application or product identifier |
| con:app_version | string | no | Application version string |
| con:locale | string | no | BCP 47 language tag (e.g., "en-US", "fr-FR") |
| con:preference_category | string | no | Domain of the preference (e.g., "music", "food", "news", "shopping") |
| con:interaction_type | string | no | "explicit_feedback", "implicit_signal", "purchase", "skip", "save", "share" |
| con:sentiment | number | no | Sentiment score [-1.0 = very negative, 1.0 = very positive] |
| con:engagement_duration_ms | integer | no | Duration of user engagement with the referenced content in milliseconds |
| con:recommendation_rank | integer | no | Position in a recommendation list that triggered the interaction |
| con:ab_variant | string | no | A/B test variant identifier |
| con:ccpa_opted_out | boolean | no | User has exercised CCPA opt-out of sale; MUST NOT be used as a processing basis (use the processing_basis field instead) |
Normative: Grains with "profile:consumer" that include user_id or any direct identifier MUST set processing_basis to a lawful basis under GDPR Art. 6 / CCPA § 1798.100 before cross-system transfer. Grains with con:ccpa_opted_out: true MUST NOT be included in data sale or data broker transfers.
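These two constraints reduce to simple predicates over a decoded grain, sketched below. The set of lawful bases is the GDPR Art. 6 list given in the glossary; treating the presence of user_id as the only direct identifier, and reading con:ccpa_opted_out from the context map, are simplifying assumptions of the sketch.

```python
LAWFUL_BASES = {
    "consent",
    "contract",
    "legal_obligation",
    "vital_interests",
    "public_task",
    "legitimate_interests",
}


def may_transfer(grain):
    """Grains carrying a direct identifier need a lawful processing_basis to cross systems."""
    has_identifier = "user_id" in grain
    return not has_identifier or grain.get("processing_basis") in LAWFUL_BASES


def may_sell(grain):
    """Opted-out grains are excluded from sale or broker transfers regardless of basis."""
    opted_out = grain.get("context", {}).get("con:ccpa_opted_out") is True
    return may_transfer(grain) and not opted_out


consented = {"user_id": "researcher-alice", "processing_basis": "consent", "context": {}}
opted_out = {
    "user_id": "researcher-alice",
    "processing_basis": "consent",
    "context": {"con:ccpa_opted_out": True},
}
no_basis = {"user_id": "researcher-alice", "context": {}}
```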
A.7 Integration Profile (int:)
Tag: "profile:integration" | Namespace prefix: int:
Applies to grains that represent REST API connectors, tool catalog entries, webhook definitions, or integration platform action registries. Integration profile fields are stored in the grain's context map (compact key: ctx), following the same pattern as other domain profiles.
| Field | Type | Required | Description |
|---|---|---|---|
| int:base_url | string | no | API base URL (e.g., "https://api.github.com") |
| int:http_method | string | no | HTTP method: "GET", "POST", "PUT", "PATCH", "DELETE" |
| int:http_path | string | no | URL path template with {param} placeholders (e.g., "/repos/{owner}/{repo}/issues") |
| int:path_params | string[] | no | Parameter names extracted from path template |
| int:query_params | string[] | no | Query parameter names |
| int:body_params | string[] | no | Body parameter names (for POST/PUT/PATCH) |
| int:response_mapping | string | no | JQ-compatible expression for response transformation (e.g., ".data.items") |
| int:auth_type | string | no | Auth mechanism: "api_key", "api_key:bearer", "api_key:header", "oauth2", "basic", "jwt", "none". Open enum; implementations MAY define additional values (e.g., "aws_sigv4", "mtls") |
| int:auth_scopes | string[] | no | Required OAuth scopes (e.g., ["repo", "read:org"]) |
| int:read_only | boolean | no | true if action does not mutate external state |
| int:connector | string | no | Parent connector slug (e.g., "github", "stripe") |
| int:docs_url | string | no | Documentation URL for this action or connector |
| int:rate_limit | integer | no | Advisory maximum requests per minute; enforcement is an implementation concern |
| int:category | string | no | Connector category (e.g., "dev-tools", "crm", "communication") |
| int:sunset_date | string | no | ISO 8601 date when this action will be removed |
| int:content_type | string | no | Request content type if non-default (e.g., "application/x-www-form-urlencoded") |
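Because int:path_params is described as the parameter names extracted from the int:http_path template, the two fields can be cross-checked mechanically. The following is a minimal sketch; the helper names are hypothetical, and the regular expression assumes placeholders are written as {name} with no nesting.

```python
import re


def path_template_params(http_path):
    """Extract placeholder names from a template such as '/repos/{owner}/{repo}/issues'."""
    return re.findall(r"\{([^{}]+)\}", http_path)


def path_param_mismatches(context):
    """Names present in only one of int:http_path and int:path_params (empty when consistent)."""
    in_template = set(path_template_params(context.get("int:http_path", "")))
    declared = set(context.get("int:path_params", []))
    return sorted(in_template ^ declared)


consistent = {
    "int:http_path": "/repos/{owner}/{repo}/issues",
    "int:path_params": ["owner", "repo"],
}
inconsistent = {
    "int:http_path": "/repos/{owner}/{repo}/issues",
    "int:path_params": ["owner"],
}

print(path_param_mismatches(consistent))    # prints []
print(path_param_mismatches(inconsistent))  # prints ['repo']
```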
Trigger-specific fields (used in Observation grains with observer_type starting with "trigger:"; see §27.6):
| Field | Type | Used By | Description |
|---|---|---|---|
| int:poll_interval_secs | integer | polling | Seconds between polls |
| int:cursor_field | string | polling | Field name for incremental fetching (e.g., "since", "last_id") |
| int:cursor_type | string | polling | Cursor type: "timestamp", "id", "etag" |
| int:webhook_path | string | webhook | Inbound webhook receiver path |
| int:webhook_secret_header | string | webhook | Header containing HMAC signature |
| int:cron_expression | string | schedule | Cron expression (e.g., "0 9 * * MON-FRI") |
| int:timezone | string | schedule | IANA timezone (e.g., "America/New_York") |
| int:config_schema | map | all | JSON Schema for trigger configuration |
| int:event_schema | map | all | JSON Schema for emitted events |
Normative:
- Grains with "profile:integration" SHOULD include int:connector and int:auth_type.
- int:http_path parameters MUST match entries in int:path_params.
- int:response_mapping MUST be a valid JQ expression if present.
- int:rate_limit is advisory only; enforcement is an implementation concern.
Example — Action definition with integration profile:
{
"type": "action",
"action_phase": "definition",
"tool_name": "github:create-issue",
"tool_description": "Create a new issue in a GitHub repository",
"input_schema": {
"type": "object",
"properties": {
"owner": {"type": "string"},
"repo": {"type": "string"},
"title": {"type": "string"},
"body": {"type": "string"},
"labels": {"type": "array", "items": {"type": "string"}}
},
"required": ["owner", "repo", "title"]
},
"output_schema": {
"type": "object",
"properties": {
"id": {"type": "integer"},
"number": {"type": "integer"},
"html_url": {"type": "string"}
}
},
"structural_tags": ["profile:integration"],
"context": {
"int:base_url": "https://api.github.com",
"int:http_method": "POST",
"int:http_path": "/repos/{owner}/{repo}/issues",
"int:path_params": ["owner", "repo"],
"int:body_params": ["title", "body", "labels"],
"int:auth_type": "api_key:bearer",
"int:read_only": false,
"int:connector": "github",
"int:category": "dev-tools"
},
"namespace": "axtion:connectors:github",
"created_at": 1740700000000
}
Appendix B: ABNF Grammar
mg-blob = version-byte header-fields msgpack-payload
version-byte = %x01
header-fields = flags-byte type-byte ns-hash-bytes created-at-bytes
; version-byte + header-fields = 9-byte "fixed header" in §3.1
flags-byte = %x00-FF
type-byte = %x01-0A / %xF0-FF
; Belief=0x01, Event=0x02, State=0x03, Workflow=0x04, Action=0x05,
; Observation=0x06, Goal=0x07, Reasoning=0x08, Consensus=0x09,
; Consent=0x0A, 0x0B-0xEF reserved, 0xF0-0xFF domain profile types
ns-hash-bytes = 2OCTET ; uint16 big-endian, first two bytes of SHA-256(namespace)
created-at-bytes = 4OCTET ; uint32 big-endian
msgpack-payload = canonical-map
canonical-map = fixmap / map16 / map32
fixmap = %x80-8F *key-value
map16 = %xDE uint16 *key-value
map32 = %xDF uint32 *key-value
key-value = msgpack-string msgpack-value
msgpack-string = fixstr / str8 / str16 / str32 ; UTF-8 NFC-normalized
msgpack-value = msgpack-string / msgpack-int / msgpack-float
/ msgpack-bool / msgpack-array / canonical-map
/ msgpack-null ; but nulls MUST be omitted from maps
content-address = 64HEXDIG ; lowercase hex encoding of the SHA-256 digest
mg-file = magic flags grain-count field-map-ver compression-type
reserved offset-table grains footer
magic = %x4D.47 %x01 ; ASCII "MG" followed by 0x01 (ABNF quoted strings are case-insensitive)
flags = %x00-FF
grain-count = 4OCTET ; uint32
field-map-ver = %x00-FF
compression-type = %x00-FF
reserved = 6OCTET
offset-table = *4OCTET ; grain_count × uint32
grains = *mg-blob
footer = 32OCTET ; SHA-256 checksum
Appendix C: Field Mapping Table (Compact Reference)
Core & Multi-Modal Fields:
{
"t": "type",
"s": "subject",
"r": "relation",
"o": "object",
"c": "confidence",
"st": "source_type",
"ca": "created_at",
"tt": "temporal_type",
"vf": "valid_from",
"vt": "valid_to",
"svf": "system_valid_from",
"svt": "system_valid_to",
"ctx": "context",
"sb": "superseded_by",
"ct": "contradicted",
"im": "importance",
"adid": "author_did",
"ns": "namespace",
"user": "user_id",
"tags": "structural_tags",
"df": "derived_from",
"cl": "consolidation_level",
"sc": "success_count",
"fc": "failure_count",
"pc": "provenance_chain",
"odid": "origin_did",
"ons": "origin_namespace",
"cr": "content_refs",
"er": "embedding_refs",
"rt": "related_to",
"_e": "_elided",
"_do": "_disclosure_of",
"ip": "invalidation_policy",
"sj": "supersession_justification",
"sa": "supersession_auth",
"own": "owner",
"cat": "category",
"rid": "run_id",
"role": "role",
"ac": "access_count",
"laa": "last_accessed_at",
"tms": "timestamp_ms",
"obsdid": "observer_did",
"sdid": "subject_did",
"gdid": "grantee_did",
"sid2": "session_id",
"eid": "entity_id",
"epstat": "epistemic_status",
"vstatus": "verification_status",
"rhr": "requires_human_review",
"pbasis": "processing_basis",
"idst": "identity_state",
"lic": "license",
"tts": "trusted_timestamp",
"itype": "invalidation_type",
"ireason": "invalidation_reason",
"iinit": "invalidation_initiator",
"rpol": "retention_policy",
"rpri": "recall_priority",
"scope": "scope",
"isw": "is_withdrawal",
"basis": "basis",
"jur": "jurisdiction",
"pcon": "prior_consent",
"wdids": "witness_dids",
"prem": "premises",
"conc": "conclusion",
"imethod": "inference_method",
"altc": "alternatives_considered",
"statctx": "statistical_context",
"swenv": "software_environment",
"params": "parameter_set",
"rseed": "random_seed"
}
Action-Specific Fields:
{
"aphase": "action_phase",
"tn": "tool_name",
"inp": "input",
"cnt": "content",
"iserr": "is_error",
"tcid": "tool_call_id",
"cbid": "call_batch_id",
"ttype": "tool_type",
"tver": "tool_version",
"emode": "execution_mode",
"code": "code",
"out": "stdout",
"err2": "stderr",
"xc": "exit_code",
"iid": "interpreter_id",
"err": "error",
"etype": "error_type",
"dur": "duration_ms",
"ptid": "parent_task_id",
"tdesc": "tool_description",
"isch": "input_schema",
"osch": "output_schema",
"strict": "strict"
}
Consensus-Specific Fields:
{
"pobs": "participating_observers",
"thold": "threshold",
"agcnt": "agreement_count",
"discnt": "dissent_count",
"disgrn": "dissent_grains",
"agcon": "agreed_content"
}
Observation-Specific Fields:
{
"oid": "observer_id",
"otype": "observer_type",
"fid": "frame_id",
"sg": "sync_group",
"omode": "observation_mode",
"oscope": "observation_scope",
"omdl": "observer_model",
"ocmp": "compression_ratio"
}
Goal-Specific Fields:
{
"desc": "description",
"gs": "goal_state",
"crit": "criteria",
"crs": "criteria_structured",
"pri": "priority",
"pgs": "parent_goals",
"sr": "state_reason",
"se": "satisfaction_evidence",
"prog": "progress",
"dto": "delegate_to",
"dfo": "delegate_from",
"ep": "expiry_policy",
"rec": "recurrence",
"evreq": "evidence_required",
"rof": "rollback_on_failure",
"atr": "allowed_transitions"
}
Content Reference Nested Compaction:
{
"u": "uri",
"m": "modality",
"mt": "mime_type",
"sz": "size_bytes",
"ck": "checksum",
"md": "metadata"
}
Embedding Reference Nested Compaction:
{
"vi": "vector_id",
"mo": "model",
"dm": "dimensions",
"ms": "modality_source",
"di": "distance_metric"
}
Related-To Nested Compaction:
{
"h": "hash",
"rl": "relation_type",
"w": "weight"
}
Integration Profile Fields (stored in context map):
{
"ib": "int:base_url",
"ihm": "int:http_method",
"ihp": "int:http_path",
"ipp": "int:path_params",
"iqp": "int:query_params",
"ibp": "int:body_params",
"irm": "int:response_mapping",
"iat": "int:auth_type",
"ias": "int:auth_scopes",
"iro": "int:read_only",
"ic": "int:connector",
"idu": "int:docs_url",
"irl": "int:rate_limit",
"icat": "int:category",
"isd": "int:sunset_date",
"ict": "int:content_type",
"ipis": "int:poll_interval_secs",
"icf": "int:cursor_field",
"icft": "int:cursor_type",
"iwp": "int:webhook_path",
"iwsh": "int:webhook_secret_header",
"icron": "int:cron_expression",
"itz": "int:timezone",
"icfg": "int:config_schema",
"ievt": "int:event_schema"
}
Appendix D: Compliance Mapping
GDPR
| Article | .mg Support |
|---|---|
| Art. 5 (Data minimization) | user_id field enables per-person scope |
| Art. 12-23 (Rights) | Structured data format for automated response |
| Art. 17 (Erasure) | Crypto-erasure via key destruction |
| Art. 25 (Privacy by design) | Provenance and audit built-in |
| Art. 30 (Records of processing) | provenance_chain and created_at timestamps support records-of-processing obligations |
| Art. 32 (Security) | COSE signing, AES-256-GCM encryption |
HIPAA (45 CFR §164)
| Section | .mg Support |
|---|---|
| §164.308 (Administrative) | Audit trail via provenance_chain |
| §164.310 (Physical) | N/A (transport layer) |
| §164.312 (Technical) | AES-256-GCM encryption, COSE signatures |
| §164.314 (Organizational) | N/A (policy engine) |
CCPA
| Requirement | .mg Support |
|---|---|
| Personal information collection | user_id and structural_tags for classification |
| Disclosure | Selective disclosure hides sensitive fields |
| Deletion | Crypto-erasure via key destruction |
| Opt-out | Policy-layer enforcement (outside .mg) |
Appendix E: Version History
See CHANGELOG.md for the full version history.
Appendix F: Glossary
- Blob: Complete .mg binary (9-byte fixed header + MessagePack payload)
- Grain: Atomic knowledge unit; identified by content address
- Content address: SHA-256 hash of blob bytes; unique identifier
- Canonical: Deterministic serialization rules ensuring identical bytes
- DID: W3C decentralized identifier; cryptographic identity without CA
- COSE: CBOR Object Signing and Encryption (RFC 9052)
- Selective disclosure: Hiding some fields while proving they exist
- Provenance: Derivation trail showing how grain was created
- Cross-link: Semantic relationship between grains
- Bi-temporal: Tracking both event-time and system-time dimensions
- Belief: Grain type 0x01 — a held claim, factual statement, or declarative knowledge about the world
- Event: Grain type 0x02 — a discrete occurrence with start/end time
- State: Grain type 0x03 — a persisting condition or status at a point in time
- Workflow: Grain type 0x04 — a structured process or multi-step plan
- Action: Grain type 0x05 — a completed tool invocation, API call, or agent action
- Observation: Grain type 0x06 — a raw sensor or environmental reading without interpretation
- Goal: Grain type 0x07 — a desired future state or objective
- Reasoning: Grain type 0x08 — an inference chain, chain-of-thought, or decision rationale
- Consensus: Grain type 0x09 — an agreement reached among multiple agents or sources
- Consent: Grain type 0x0A — a data subject's GDPR/CCPA/LGPD/PIPL consent or withdrawal record
- processing_basis: Legal basis for processing personal data under GDPR Art. 6 (consent, contract, legal_obligation, vital_interests, public_task, legitimate_interests)
- consent_cascade: Invalidation mode that propagates erasure/restriction to all grains linked via processing_basis when a Consent grain is invalidated
- verification_status: Lifecycle verification state of a grain's content: "unverified" (default; not yet reviewed), "verified" (confirmed correct by an authority), "contested" (contradicted or disputed), "retracted" (withdrawn from use)
- run_id: Session or execution scope identifier; distinct from user_id (data subject) and namespace (logical partition)
- Crypto-erasure: Destroying encryption key to unrecoverably erase data
- Blind index: HMAC token for searching encrypted data without decryption
Appendix G: Complete Example Grain
# Create a belief grain
grain = {
"type": "belief",
"subject": "machine-learning",
"relation": "is_subset_of",
"object": "artificial-intelligence",
"confidence": 0.99,
"epistemic_status": "accepted",
"source_type": "user_explicit",
"created_at": 1737000000000,
"timestamp_ms": 1737000000000,
"namespace": "knowledge-base",
"author_did": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
"user_id": "researcher-alice",
"importance": 0.95,
"structural_tags": ["ai", "ml", "education"],
"context": {"source": "textbook", "chapter": "1.2"},
"provenance_chain": [
{"source_hash": "abc123...", "method": "direct_input", "weight": 1.0}
],
"related_to": [
{
"hash": "def456...",
"relation_type": "elaborates",
"weight": 0.8
}
]
}
# Serialize to .mg blob (9-byte fixed header, version byte 0x01)
# 1. Compact field names
# 2. Omit null values
# 3. NFC-normalize strings
# 4. Sort keys lexicographically
# 5. Encode as canonical MessagePack
# 6. Prepend 9-byte fixed header: version(1) + flags(1) + type(1) + ns_hash(2) + created_at(4)
# type byte = 0x01 (Belief)
# 7. Compute SHA-256 hash
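import hashlib
blob = serialize(grain)  # serialize() is a placeholder for steps 1-6 above; it is not defined here
content_address = hashlib.sha256(blob).hexdigest()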
# Result: 64-character lowercase hex string
# Example: 3a1f5d8e9c2b7a4f6e9d2c8b1a4f7e9d2c8b1a4f7e9d2c8b1a4f7e9d2c8b1a4f
Document Status: This is a v1.3 revision of the .mg format specification. This revision adds output_schema to the Action grain definition phase, introduces the Integration domain profile (profile:integration) for REST API connectors and tool catalogs, documents trigger definition conventions via Observation grains, and documents Consensus grain usage patterns for multi-source action definition validation. Submitted as a standards track document for consideration as an IETF RFC and W3C standard. Community feedback is encouraged through issue tracking and discussion forums.
Last Updated: 2026-03-03 License: This document is offered under the Open Web Foundation Final Specification Agreement (OWFa 1.0)