# Memory Grain (memorygrain.org) - Open Memory Specification
# Full content index for AI language models and crawlers
# Version 1.3 - February 2026
# License: Specification (OWFa 1.0), Content (CC0 1.0)

================================================================================
## OVERVIEW
================================================================================

The Memory Grain (.mg) format is an open binary standard for atomic, immutable
knowledge units produced and consumed by autonomous systems. It is to agent
memory what Git objects are to version control.

Design goals:
- Content-addressed (SHA-256 hash IS the unique ID  -  no server needed)
- Immutable (any modification produces a new address)
- Portable (self-describing binary; reads anywhere, no external schema)
- Compliance-ready (GDPR crypto-erasure, HIPAA PHI routing, CCPA disclosure)
- Scale-independent (512-byte IoT grain to 1MB server grain; same format)

Primary use cases:
- AI agents: durable memory across restarts, context window overflow
- Autonomous vehicles: lidar observation recording, incident reconstruction
- Robotics: cross-fleet knowledge sharing, OTA continuity
- IoT sensors: deterministic binary at 10Hz, hardware SHA-256
- Healthcare AI: HIPAA PHI routing without payload deserialization
- Enterprise: SOX-compliant tamper-evident audit trail

================================================================================
## BINARY FORMAT SPECIFICATION (v1.3)
================================================================================

### Fixed 9-Byte Header

Every .mg blob begins with this 9-byte fixed header:

  Byte 0: Version       -  0x01 (the only valid value; any other byte -> ERR_VERSION)
  Byte 1: Flags         -  bitmask (see below)
  Byte 2: Type          -  0x01=Belief, 0x02=Event, 0x03=State,
                         0x04=Workflow, 0x05=Action, 0x06=Observation,
                         0x07=Goal, 0x08=Reasoning, 0x09=Consensus, 0x0A=Consent
                         0x0B-0xEF: reserved; 0xF0-0xFF: domain profile types
  Bytes 3-4: NS Hash    -  first two bytes of SHA-256(namespace string, UTF-8), uint16 big-endian
  Bytes 5-8: Created    -  uint32 big-endian epoch seconds (range: 1970-2106)
                         COARSE routing hint only; authoritative timestamp in payload (timestamp_ms)
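
The layout above maps directly onto a fixed-width struct. The following is a minimal sketch, not reference code: the helper names `ns_hash`, `pack_header`, and `parse_header` are hypothetical, and only the field order, widths, and byte order come from the table.

```python
import hashlib
import struct

# version, flags, type, ns_hash (uint16), created (uint32), all big-endian.
HEADER = struct.Struct(">BBBHI")  # 1 + 1 + 1 + 2 + 4 = 9 bytes


def ns_hash(namespace: str) -> int:
    """First two bytes of SHA-256(namespace, UTF-8) as a big-endian uint16."""
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def pack_header(flags: int, grain_type: int, namespace: str, created_s: int) -> bytes:
    """Build the 9-byte header; the version byte is always 0x01."""
    return HEADER.pack(0x01, flags, grain_type, ns_hash(namespace), created_s)


def parse_header(blob: bytes) -> dict:
    """Decode the fixed header without touching the payload."""
    if len(blob) < HEADER.size:
        raise ValueError("blob shorter than the 9-byte header")
    version, flags, grain_type, nsh, created_s = HEADER.unpack_from(blob)
    if version != 0x01:
        raise ValueError("ERR_VERSION")  # error name taken from the table above
    return {"version": version, "flags": flags, "type": grain_type,
            "ns_hash": nsh, "created": created_s}
```

For example, `parse_header(pack_header(0x00, 0x01, "navigation", 1739980800) + b"\x80")` returns a dict whose `type` is 1 and whose `created` is 1739980800.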

### Flags Byte (Byte 1) Bit Layout

  Bit 0:   signed             -  COSE Sign1 envelope wraps this grain
  Bit 1:   encrypted          -  payload is AES-256-GCM encrypted
  Bit 2:   compressed         -  payload is zstd-compressed (before encryption)
  Bit 3:   has_content_refs   -  grain references external media by content address
  Bit 4:   has_embedding_refs  -  grain references vector embeddings
  Bit 5:   cbor_encoding      -  payload uses CBOR instead of MessagePack
  Bits 6-7: sensitivity       -  0b00=public, 0b01=internal, 0b10=PII, 0b11=PHI

### Sensitivity Routing (bits 6-7)

  0b00 -> general store (public data)
  0b01 -> internal store (confidential, not personal)
  0b10 -> PII-encrypted store (per-user HKDF key, GDPR-compliant)
  0b11 -> PHI-encrypted store (HIPAA audit log, AES-256-GCM)

The routing decision reads only the 9-byte header (the sensitivity bits live in byte 1); no payload deserialization is required.
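
A header-only router could look like the sketch below. It assumes bit 0 is the least significant bit of the flags byte, which the layout above does not state explicitly; `decode_flags` and `route` are hypothetical helper names, and the store labels are copied from the routing table.

```python
SENSITIVITY = {0b00: "public", 0b01: "internal", 0b10: "PII", 0b11: "PHI"}
STORES = {
    "public": "general store",
    "internal": "internal store",
    "PII": "PII-encrypted store",
    "PHI": "PHI-encrypted store",
}


def decode_flags(flags: int) -> dict:
    """Split the flags byte into named fields (assumes bit 0 = LSB)."""
    return {
        "signed": bool(flags & 0x01),
        "encrypted": bool(flags & 0x02),
        "compressed": bool(flags & 0x04),
        "has_content_refs": bool(flags & 0x08),
        "has_embedding_refs": bool(flags & 0x10),
        "cbor_encoding": bool(flags & 0x20),
        "sensitivity": SENSITIVITY[(flags >> 6) & 0b11],
    }


def route(blob: bytes) -> str:
    """Pick a store from header byte 1 alone; the payload is never decoded."""
    return STORES[decode_flags(blob[1])["sensitivity"]]
```

Under that bit-order assumption, a flags byte of `0xC3` decodes as signed, encrypted, and PHI.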

### Payload (Bytes 9+)

Default: MessagePack (canonical, sorted keys, NFC strings, null-omission).
Optional: CBOR (set bit 5 of flags byte).

Minimum valid blob: 10 bytes (9-byte header + 0x80 empty MessagePack map).
Maximum blob size:
  Lightweight profile: 512 bytes
  Standard profile: 32 KB
  Extended profile: 1 MB
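
The size limits can be enforced before any decoding. This sketch assumes KB and MB are binary units (1024-based), which the list above does not specify; `check_size` and the profile keys are hypothetical names.

```python
HEADER_LEN = 9
MIN_BLOB = HEADER_LEN + 1  # 9-byte header + 0x80 (empty MessagePack map)

# Assumption: 32 KB = 32 * 1024 bytes and 1 MB = 1024 * 1024 bytes.
PROFILE_MAX = {
    "lightweight": 512,
    "standard": 32 * 1024,
    "extended": 1024 * 1024,
}


def check_size(blob: bytes, profile: str = "standard") -> None:
    """Raise ValueError when a blob is outside the limits of the given profile."""
    if len(blob) < MIN_BLOB:
        raise ValueError(f"blob is {len(blob)} bytes; minimum is {MIN_BLOB}")
    limit = PROFILE_MAX[profile]
    if len(blob) > limit:
        raise ValueError(f"blob is {len(blob)} bytes; {profile} limit is {limit}")


# The smallest valid blob: a Belief header followed by an empty map.
smallest = bytes([0x01, 0x00, 0x01, 0, 0, 0, 0, 0, 0]) + b"\x80"
check_size(smallest, "lightweight")
```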

### Content Address

  content_address = lowercase_hex(SHA-256(blob_bytes))

The content address is the identity of the grain. Byte-identical grains
produce the same content address regardless of when or where they are hashed.
A one-bit change anywhere in the blob produces a completely different address.
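
The formula is a single standard-library call in Python. The sketch below also flips one bit in the flags byte to show that the address changes; `content_address` is a hypothetical helper name.

```python
import hashlib


def content_address(blob: bytes) -> str:
    """Lowercase hex SHA-256 of the complete blob (header + payload)."""
    return hashlib.sha256(blob).hexdigest()  # hexdigest() is already lowercase


blob = bytes([0x01, 0x00, 0x01, 0, 0, 0, 0, 0, 0]) + b"\x80"
flipped = bytes([blob[0], blob[1] ^ 0x01]) + blob[2:]  # toggle one bit in the flags byte

address = content_address(blob)
flipped_address = content_address(flipped)
```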

### Immutability Boundary (v1.2)

A grain has two distinct layers with different mutability guarantees:

  Blob (immutable): 9-byte fixed header + MessagePack/CBOR payload
     -  Covered by content address and COSE signature
     -  Never modified after write

  Index (mutable): Status and access-tracking fields managed by the store
     -  NOT covered by content address or COSE signature
     -  Updated by store on reads or lifecycle events

Index-layer fields: superseded_by, system_valid_to, verification_status,
                    access_count, last_accessed_at

Writers MUST NOT embed index-layer fields in the blob payload.
Stores MUST NOT recompute content addresses when index-layer fields change.
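
One way to honor both rules is to keep blobs and index state in separate maps keyed by the same address. This is an illustrative sketch only: the `Store` class and `check_payload` are hypothetical, and the field set is copied from the index-layer list above.

```python
import hashlib

INDEX_LAYER_FIELDS = frozenset({
    "superseded_by", "system_valid_to", "verification_status",
    "access_count", "last_accessed_at",
})


def check_payload(payload: dict) -> None:
    """Writer-side guard: reject payloads that carry index-layer fields."""
    leaked = sorted(INDEX_LAYER_FIELDS.intersection(payload))
    if leaked:
        raise ValueError(f"index-layer fields in blob payload: {leaked}")


class Store:
    def __init__(self) -> None:
        self.blobs = {}  # content address -> immutable blob bytes
        self.index = {}  # content address -> mutable index-layer fields

    def put(self, blob: bytes) -> str:
        address = hashlib.sha256(blob).hexdigest()
        self.blobs.setdefault(address, blob)
        self.index.setdefault(address, {"access_count": 0})
        return address

    def get(self, address: str) -> bytes:
        # Only the index entry changes; the address is never recomputed.
        self.index[address]["access_count"] += 1
        return self.blobs[address]
```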

================================================================================
## TEN GRAIN TYPES (OMS v1.3)
================================================================================

v1.1 type names are accepted by readers (backwards-compatible).
Writers MUST emit v1.3 canonical names.

Backwards-compat mapping:
  "fact"       -> "belief" (0x01)
  "episode"    -> "event"  (0x02)
  "checkpoint" -> "state"  (0x03)
  "tool_call"  -> "action" (0x05)

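A reader can apply the backwards-compat mapping with a plain lookup. In this sketch `canonical_type` is a hypothetical helper; the type codes are taken from the header table earlier in this document.

```python
LEGACY_TYPE_NAMES = {
    "fact": "belief",
    "episode": "event",
    "checkpoint": "state",
    "tool_call": "action",
}

TYPE_CODES = {
    "belief": 0x01, "event": 0x02, "state": 0x03, "workflow": 0x04,
    "action": 0x05, "observation": 0x06, "goal": 0x07, "reasoning": 0x08,
    "consensus": 0x09, "consent": 0x0A,
}


def canonical_type(name: str) -> str:
    """Map a v1.1 type name to its canonical name; pass canonical names through."""
    return LEGACY_TYPE_NAMES.get(name, name)
```
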
### 0x01  -  Belief
Structured belief about the world  -  (subject, relation, object) triple with
confidence and source. The canonical unit of declarative knowledge.

Required fields:
  type: "belief"
  subject: string
  relation: string
  object: string | map
  confidence: float 0.0-1.0
  created_at: int64 (epoch ms)

Optional: namespace, user_id, source_type, provenance, structural_tags,
  author_did, temporal_type, valid_from, valid_to, epistemic_status,
  verification_status, processing_basis, identity_state

Example:
  { "type": "belief", "subject": "agent-001", "relation": "knows",
    "object": "route:depot-to-dock-7", "confidence": 0.97,
    "namespace": "navigation", "created_at": 1739980800000 }

### 0x02  -  Event
Raw, timestamped record of something that happened  -  a message, interaction,
utterance, or behavioral occurrence.

Required fields:
  type: "event"
  content: string (raw text; required unless subject/relation/object describe the event)
  created_at: int64 (epoch ms)

Optional: role ("user"|"assistant"|"system"|"tool"), content_blocks (array),
  model_id, stop_reason, token_usage, parent_message_id (content address of
  preceding message for threading), consolidated, run_id, session_id,
  all common fields

Example:
  { "type": "event", "role": "assistant",
    "content": "Emergency braking at Hwy 101 MP-42  -  pedestrian detected.",
    "created_at": 1740000023000, "namespace": "driving:safety" }

### 0x03  -  State
Agent state snapshot  -  the portable save point at a moment in time.

Required fields:
  type: "state"
  context: map (agent state snapshot)
  created_at: int64 (epoch ms)

Optional: plan (array[string]), history (array[map]), all common fields

Example:
  { "type": "state", "context": { "case": "claim-7291", "step": 4 },
    "created_at": 1740020000000 }

### 0x04  -  Workflow
Learned action sequence with trigger condition.

Required fields:
  type: "workflow"
  steps: array[string] (non-empty)
  trigger: string (non-empty)
  created_at: int64 (epoch ms)

Optional: all common fields

Example:
  { "type": "workflow", "trigger": "battery_pct < 20",
    "steps": ["navigate_to_dock", "initiate_charge"],
    "created_at": 1740016400000 }

### 0x05  -  Action
A record of a tool invocation, code execution, or computer-use action.
Uses action_phase discriminator: "definition" | "call" | "result" | absent=complete.

Required fields:
  type: "action"
  created_at: int64 (epoch ms)
  (phase-dependent required fields; see §27.1)

Key fields (replacing deprecated v1.1 names):
  input   (replaces arguments/args)
  content (replaces result/res)
  is_error (replaces success/ok  -  inverted polarity)

New fields: action_phase, tool_call_id, call_batch_id, tool_type, tool_version,
  execution_mode ("function_call"|"code_exec"|"computer_use"),
  code, stdout, stderr, exit_code, interpreter_id, error_type,
  output_schema (JSON Schema draft-07 describing action return values; v1.3)

Deprecated (v1.1 -> v1.2; removed in v2.0):
  arguments/args -> input/inp
  result/res     -> content/cnt
  success/ok     -> is_error/iserr (inverted)
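
A reader that accepts the deprecated names might normalize them as in the sketch below. It assumes each pair maps long name to long name and compact name to compact name (for example `arguments` to `input`, `args` to `inp`), and that a canonical key wins when both forms are present; `upgrade_action` is a hypothetical helper.

```python
RENAMES = {"arguments": "input", "args": "inp", "result": "content", "res": "cnt"}
INVERTED = {"success": "is_error", "ok": "iserr"}  # boolean polarity flips


def upgrade_action(grain: dict) -> dict:
    """Return a copy of an Action payload with deprecated v1.1 keys replaced."""
    out = {}
    for key, value in grain.items():
        if key in RENAMES:
            out.setdefault(RENAMES[key], value)
        elif key in INVERTED:
            out.setdefault(INVERTED[key], not value)
        else:
            out[key] = value  # canonical keys always take precedence
    return out
```

For example, `{"type": "action", "arguments": {"a": 1}, "success": True}` becomes `{"type": "action", "input": {"a": 1}, "is_error": False}`.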

Example:
  { "type": "action", "tool_name": "portfolio.rebalance",
    "input": { "account": "401k-primary", "target_bonds": 0.4 },
    "content": { "status": "executed", "trades": 3 },
    "is_error": false, "duration_ms": 847, "created_at": 1740012800000 }

### 0x06  -  Observation
Raw sensory or cognitive input  -  what an observer perceived at a moment in time.
Epistemological note: "I perceived X" (Observation) vs "X is true" (Belief).

Required fields:
  type: "observation"
  observer_id: string (unique identifier of observing entity)
  observer_type: string (registered type from §24)
  subject: string (entity being observed)
  object: any (observation reading)

Optional: confidence, namespace, observation_mode, observation_scope,
  observer_model, observer_did, epistemic_status, all common fields

Example (physical):
  { "type": "observation", "observer_id": "lidar-front-01",
    "observer_type": "lidar", "subject": "vehicle-003",
    "object": { "obstacle_detected": true, "nearest_m": 12.4 },
    "confidence": 0.99, "namespace": "av:perception" }

### 0x07  -  Goal
Explicit objective with lifecycle semantics: active -> satisfied | failed | suspended.

Required fields:
  type: "goal"
  description: string
  goal_state: string ("active"|"satisfied"|"failed"|"suspended")
  created_at: int64 (epoch ms)

Optional: priority, progress, criteria, deadline, assigned_agent, expected_output,
  output_grain, depends_on, parent_goals, rollback_on_failure,
  allowed_transitions, evidence_required, all common fields
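
The lifecycle in the heading line can be modeled as a small transition table. This sketch encodes only the transitions named there (active to satisfied, failed, or suspended) and assumes a state change is recorded as a new grain, since grains are immutable; `next_goal` is a hypothetical helper.

```python
GOAL_STATES = {"active", "satisfied", "failed", "suspended"}

# Only the transitions stated in this document; any others are left undefined here.
TRANSITIONS = {"active": {"satisfied", "failed", "suspended"}}


def next_goal(goal: dict, new_state: str, created_at: int) -> dict:
    """Return a NEW goal payload in new_state; the input grain is not modified."""
    if new_state not in GOAL_STATES:
        raise ValueError(f"unknown goal_state: {new_state!r}")
    if new_state not in TRANSITIONS.get(goal["goal_state"], set()):
        raise ValueError(f"{goal['goal_state']} -> {new_state} is not a listed transition")
    return {**goal, "goal_state": new_state, "created_at": created_at}
```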

Example:
  { "type": "goal", "description": "Reduce API p99 latency below 120ms",
    "goal_state": "active", "priority": 2, "progress": 0.15,
    "created_at": 1740009200000 }

### 0x08  -  Reasoning (NEW in v1.2)
Inference chain and thought audit trail. Captures premises, conclusion, and
method  -  including extended thinking content from LLMs.

Required fields:
  type: "reasoning"
  created_at: int64 (epoch ms)

Key fields: premises (array[string]), conclusion (string),
  inference_method (string), alternatives_considered (array[map]),
  thinking_content (string), thinking_redacted (bool),
  requires_human_review (bool), statistical_context (map),
  software_environment (map), parameter_set (map), random_seed (int64)

Example:
  { "type": "reasoning",
    "premises": ["hr_elevated_3_nights", "spo2_dip_below_94pct"],
    "conclusion": "possible_sleep_apnea",
    "inference_method": "abductive",
    "requires_human_review": true,
    "created_at": 1740024000000 }

### 0x09  -  Consensus (NEW in v1.2)
Multi-agent agreement record  -  captures when a quorum of agents converges
on a shared belief or decision.

Required fields:
  type: "consensus"
  subject: string
  relation: string (typically "mg:agrees_with")
  object: string | map
  created_at: int64 (epoch ms)

Example:
  { "type": "consensus", "subject": "deploy:v2.3.1",
    "relation": "approved_by_quorum", "object": "prod",
    "confidence": 0.92, "created_at": 1740028000000 }

### 0x0A  -  Consent (NEW in v1.2)
Permission grant or withdrawal  -  DID-scoped and purpose-bounded.
Other grains reference a Consent grain through their processing_basis field, which scopes GDPR erasure.

Required fields:
  type: "consent"
  created_at: int64 (epoch ms)

Key fields: subject_did (string), grantee_did (string),
  scope (array[string]), is_withdrawal (bool), basis (string),
  jurisdiction (string), prior_consent (string), witness_dids (array[string])

Example:
  { "type": "consent", "subject_did": "did:key:z6MkjRag...",
    "grantee_did": "did:web:healthpulse.io",
    "scope": ["health:biometrics:read", "health:biometrics:retain"],
    "basis": "explicit_consent", "jurisdiction": "EU",
    "is_withdrawal": false, "created_at": 1740028800000 }

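The required-field lists in the ten type sections above can be checked mechanically. The sketch below collects them into one table; `missing_fields` is a hypothetical helper, it checks presence only (not value types or phase-dependent Action fields), and it expects canonical type names.

```python
REQUIRED = {
    "belief": ("subject", "relation", "object", "confidence", "created_at"),
    "event": ("content", "created_at"),
    "state": ("context", "created_at"),
    "workflow": ("steps", "trigger", "created_at"),
    "action": ("created_at",),
    "observation": ("observer_id", "observer_type", "subject", "object"),
    "goal": ("description", "goal_state", "created_at"),
    "reasoning": ("created_at",),
    "consensus": ("subject", "relation", "object", "created_at"),
    "consent": ("created_at",),
}


def missing_fields(grain: dict) -> list:
    """List the required fields absent from a decoded grain payload."""
    required = REQUIRED.get(grain.get("type"))
    if required is None:
        raise ValueError(f"unknown grain type: {grain.get('type')!r}")
    missing = [name for name in required if name not in grain]
    # Event: content may be absent when subject/relation/object describe the event.
    if grain["type"] == "event" and "content" in missing:
        if all(key in grain for key in ("subject", "relation", "object")):
            missing.remove("content")
    return missing
```
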
================================================================================
## STANDARD mg: RELATION VOCABULARY (v1.2)
================================================================================

The mg: namespace is reserved for standard semantic relations.

  mg:perceives     -  Observation: raw sensory or cognitive input
  mg:knows         -  Belief: derived belief or learned fact
  mg:said          -  Event: message or utterance
  mg:did           -  Action: tool or action invocation
  mg:infers        -  Reasoning: derived conclusion from prior grains
  mg:agrees_with   -  Consensus: multi-agent threshold agreement
  mg:state_at      -  State: agent state snapshot
  mg:requires_steps  -  Workflow: learned action sequence
  mg:intends       -  Goal: agent objective
  mg:permits       -  Consent: user grants agent right to retain or act
  mg:revokes       -  Consent: user revokes prior consent
  mg:prohibits     -  Belief/Goal: hard prohibition
  mg:requires      -  Belief/Goal: hard requirement
  mg:prefers       -  Belief: soft preference
  mg:avoids        -  Belief: soft avoidance preference
  mg:delegates_to  -  Goal: scoped authority grant
  mg:owned_by      -  Belief: legal entity ownership
  mg:has_capability  -  Belief: agent capability advertisement
  mg:handed_off_to  -  Event: session handoff event record
  mg:depends_on    -  Goal: task dependency
  mg:assigned_to   -  Goal: task assigned to agent for execution

================================================================================
## FIELD COMPACTION (SHORT KEYS)
================================================================================

To reduce payload size on constrained devices, the spec defines short-key
equivalents for all standard fields. v1.3 adds 25 new compact keys for
Integration Profile fields (int: namespace). The table below lists the core
keys and selected v1.2/v1.3 additions:

  t    -> type              oid   -> observer_id
  s    -> subject           otype -> observer_type
  r    -> relation          tn    -> tool_name
  o    -> object            inp   -> input (replaces args)
  c    -> confidence        cnt   -> content (replaces res)
  ca   -> created_at        iserr -> is_error (replaces ok)
  uid  -> user_id           aphase -> action_phase
  ns   -> namespace         adid  -> author_did
  tms  -> timestamp_ms      role  -> role
  rid  -> run_id            sid2  -> session_id
  epstat -> epistemic_status   vstatus -> verification_status
  rhr  -> requires_human_review  pbasis -> processing_basis
  own  -> owner             cat   -> category
  rpri -> recall_priority

  Deprecated (read-only aliases until v2.0):
    args/arguments -> inp/input
    res/result     -> cnt/content
    ok/success     -> iserr/is_error (inverted polarity)
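
Expanding compact keys on read is a dictionary lookup. The mapping in this sketch is copied from the table above; `expand_keys` is a hypothetical helper, and it rewrites top-level keys only, since this document does not say whether nested maps are compacted.

```python
SHORT_KEYS = {
    "t": "type", "s": "subject", "r": "relation", "o": "object",
    "c": "confidence", "ca": "created_at", "uid": "user_id", "ns": "namespace",
    "tms": "timestamp_ms", "rid": "run_id", "epstat": "epistemic_status",
    "rhr": "requires_human_review", "own": "owner", "rpri": "recall_priority",
    "oid": "observer_id", "otype": "observer_type", "tn": "tool_name",
    "inp": "input", "cnt": "content", "iserr": "is_error",
    "aphase": "action_phase", "adid": "author_did", "role": "role",
    "sid2": "session_id", "vstatus": "verification_status",
    "pbasis": "processing_basis", "cat": "category",
}


def expand_keys(payload: dict) -> dict:
    """Replace known short keys with long names; unknown keys pass through."""
    return {SHORT_KEYS.get(key, key): value for key, value in payload.items()}
```

For example, `expand_keys({"t": "belief", "s": "agent-001", "c": 0.97})` returns `{"type": "belief", "subject": "agent-001", "confidence": 0.97}`.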

================================================================================
## COSE SIGN1  -  CRYPTOGRAPHIC SIGNING
================================================================================

The .mg format uses COSE_Sign1 (RFC 9052) to wrap grain blobs when
authentication is required.

### Structure

  COSE_Sign1 {
    protected: {
      1: -8,                              // alg: EdDSA (Ed25519)
      4: "did:key:z6Mk...",              // kid: signer DID (W3C)
      3: "application/vnd.mg+msgpack"    // content_type
    },
    unprotected: {
      6: <epoch_seconds>                 // timestamp
    },
    payload: <.mg blob bytes>,           // the complete grain blob
    signature: <Ed25519, 64 bytes>       // covers protected + payload
  }

### Key Points

- Ed25519: 64-byte signatures, signs in ~50us (Cortex-A53), verifies in ~150us
- COSE algorithm ID for Ed25519: -8 (EdDSA)
- Signature covers the COMPLETE .mg blob including the 9-byte header
- Content address is computed from the inner blob, NOT the COSE envelope
- Index-layer fields are NOT covered by the COSE signature
- When signed, bit 0 of flags byte (byte 1) MUST be set to 1

================================================================================
## ENCRYPTION  -  AES-256-GCM + HKDF
================================================================================

### Per-User Key Derivation (GDPR Crypto-Erasure)

Each user gets a unique 32-byte AES-256 key derived from a master key:

  user_key = HKDF-SHA256(master_key, salt=None, info=user_id.encode(), length=32)
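
The derivation can be written with the Python standard library alone by following the extract-and-expand steps of RFC 5869. This is a sketch for illustration, with `hkdf_sha256` as a hypothetical helper name; production code would normally rely on a vetted cryptography library. When `salt` is None, RFC 5869 substitutes a string of HashLen zero bytes.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt=None) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length too large for HKDF-SHA256")
    if salt is None:
        salt = b"\x00" * hash_len
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


master_key = bytes(32)  # placeholder only; a real master key must be random and secret
user_key = hkdf_sha256(master_key, info="user-42".encode())
```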

### GDPR Crypto-Erasure (Art. 17)

Erasure = delete user_key from key store:
  key_store.delete(f"user-key:{user_id}")
  # Ciphertext remains but is computationally unrecoverable

Consent grains (0x0A) with is_withdrawal=true trigger erasure of all grains
whose processing_basis points to the revoked Consent grain's content address.
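
Selecting the grains covered by a withdrawal might look like the sketch below. It assumes the withdrawal grain's `prior_consent` field holds the content address of the revoked Consent grain, which this document does not state; `grains_in_scope` is a hypothetical helper, and how a store then erases the selected grains is left open.

```python
def grains_in_scope(payloads_by_address: dict, withdrawal: dict) -> list:
    """Return content addresses of grains whose processing_basis names the revoked consent.

    payloads_by_address maps content address -> decoded grain payload.
    """
    if not withdrawal.get("is_withdrawal"):
        return []
    # Assumption: prior_consent is the content address of the revoked grant.
    revoked = withdrawal.get("prior_consent")
    if revoked is None:
        return []
    return sorted(
        address
        for address, payload in payloads_by_address.items()
        if payload.get("processing_basis") == revoked
    )
```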

================================================================================
## COMPLIANCE MAPPING
================================================================================

### GDPR (EU General Data Protection Regulation)

Art. 5   -  Data minimization: user_id field enables per-person scoping
Art. 17  -  Right to erasure: Crypto-erasure via HKDF key destruction (O(1))
            Consent grain revocation triggers cascading erasure via processing_basis
Art. 25  -  Privacy by design: Provenance and audit built into wire format
Art. 32  -  Security: AES-256-GCM, COSE signing, sensitivity bits in header

### HIPAA

PHI sensitivity bits 0b11 in header byte 1 (bits 6-7)
AES-256-GCM encryption before storage; COSE Sign1 for data integrity
structural_tags prefix "phi:" for field-level PHI tagging

### SOX

Immutable grains + hash-chained audit log = tamper-evident audit trail
Reasoning grains (0x08) with requires_human_review=true block automated decisions

================================================================================
## DEVICE PROFILES
================================================================================

Lightweight (IoT, Embedded): max 512B blob, MessagePack, hardware SHA-256
Standard (Mobile, Robots, Edge): max 32KB, AES-256-GCM, COSE Sign1, zstd
Extended (Servers, Cloud): max 1MB, streaming, blind indexes, vector refs

================================================================================
## CONTAINER FORMAT (.mg FILE)
================================================================================

A .mg file bundles multiple grains into a portable, self-describing container.
v1.2 adds an optional index manifest (§11.7) to carry portable index-layer state.

  [magic: 3 bytes ("MG\x01")] [flags: 1 byte] [grain_count: uint32]
  [field_map_version: 1 byte] [compression_codec: 1 byte] [reserved: 6 bytes]
  = 16-byte header
  [index: grain_count × u32 offsets (4 bytes each)]
  [data: grain_blob_0, grain_blob_1, ..., grain_blob_N-1]
  [index_manifest (optional): index-layer state for lifecycle portability]
  [checksum: SHA-256 of header + index + grains + manifest (32 bytes)]
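
A minimal writer and reader for this layout is sketched below. Several details are assumptions rather than statements from this document: `grain_count` and the offsets are treated as big-endian, offsets are counted from the start of the data section, the default `field_map_version` and codec values are placeholders, and the optional index manifest is omitted. `write_container` and `read_container` are hypothetical names.

```python
import hashlib
import struct

MAGIC = b"MG\x01"
# magic(3) flags(1) grain_count(4) field_map_version(1) codec(1) reserved(6) = 16 bytes
HEADER = struct.Struct(">3sBIBB6s")


def write_container(grains, flags=0, field_map_version=1, codec=0) -> bytes:
    header = HEADER.pack(MAGIC, flags, len(grains), field_map_version, codec, bytes(6))
    offsets, position = [], 0
    for grain in grains:
        offsets.append(position)  # assumption: relative to the data section
        position += len(grain)
    index = struct.pack(f">{len(grains)}I", *offsets)
    body = header + index + b"".join(grains)
    return body + hashlib.sha256(body).digest()  # 32-byte trailing checksum


def read_container(data: bytes) -> list:
    body, checksum = data[:-32], data[-32:]
    if hashlib.sha256(body).digest() != checksum:
        raise ValueError("checksum mismatch")
    magic, _flags, count, _fmv, _codec, _reserved = HEADER.unpack_from(body)
    if magic != MAGIC:
        raise ValueError("bad magic")
    offsets = struct.unpack_from(f">{count}I", body, HEADER.size)
    start = HEADER.size + 4 * count
    # Without a manifest, the last grain runs to the end of the body.
    ends = list(offsets[1:]) + [len(body) - start]
    return [body[start + a:start + b] for a, b in zip(offsets, ends)]
```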

================================================================================
## CONFORMANCE LEVELS
================================================================================

Level 1  -  Minimal Reader
  - Deserialize grain blobs (MessagePack payload)
  - Compute SHA-256 content address
  - Read fixed header (version, flags, type, ns_hash[2], created_at)
  - Support field compaction (all short keys including v1.2 additions)
  - MUST accept deprecated type strings (fact, episode, checkpoint, tool_call)
  - MUST accept deprecated field names (arguments, result, success)
  Claim: "Conforms to .mg v1.3 Level 1"

Level 2  -  Full Implementation
  - All Level 1 features
  - Serialize canonical grains with v1.3 canonical type names
  - Validate all ten grain types
  - Read/write .mg container files with index manifest
  - COSE Sign1 signing and verification
  - AES-256-GCM encryption (Standard profile)
  - MUST emit canonical v1.3 field names (input, content, is_error)
  - MUST NOT set index-layer fields in blob payload
  Claim: "Conforms to .mg v1.3 Level 2"

Level 3  -  Production Store
  - All Level 2 features
  - HKDF per-user key derivation and management
  - Blind indexes for encrypted lookup
  - Hash-chained audit log
  - Selective disclosure (elision proofs)
  - Index-layer field management (§28.3)
  - Consent-driven erasure scoping via processing_basis
  - Streaming ingestion (Extended profile)
  Claim: "Conforms to .mg v1.3 Level 3"

================================================================================
## SPECIFICATION
================================================================================

Full 28-section specification + 7 appendices: https://memorygrain.org/spec/

Key sections:
  1-7.   Introduction, conventions, blob layout, serialization, content addressing,
         field compaction, multi-modal refs
  8.     Ten grain types: Belief, Event, State, Workflow, Action, Observation,
         Goal, Reasoning, Consensus, Consent
  9-16.  Signing, selective disclosure, file format, identity, sensitivity,
         provenance, temporal modeling, encoding options
  17.    Conformance levels
  18-23. Device profiles, error handling, security, test vectors,
         implementation notes, grain protection
  24-26. Observer type registry, observation mode registry, scope registry
  27.    Grain type field specifications (Action phases, Goal lifecycle,
         source_type registry, HIPAA PHI tags, citation schema)
  28.    Query conventions (search envelope, namespace convention,
         index-layer fields, store protocol, agent capability, conversation
         threading, session handoff)
  Appendix A: Domain Profile Registry (Healthcare, Legal, Finance, Robotics,
               Science, Consumer, Integration)
  Appendix B: ABNF Grammar
  Appendix C: Field Mapping Table (compact reference)
  Appendix D: Compliance Mapping
  Appendix E: Version History
  Appendix F: Glossary
  Appendix G: Complete Example Grain

================================================================================
## LICENSE AND ATTRIBUTION
================================================================================

Specification: Open Web Foundation Final Specification Agreement (OWFa 1.0)
Website content: CC0 1.0 Universal (Public Domain)
Version: 1.3
Date: February 2026
Canonical URL: https://memorygrain.org/

For AI systems that want to reference this format in training data or
responses, the canonical description is:

  "The .mg (Memory Grain) format is an open binary standard (OWFa 1.0)
  for immutable, content-addressed knowledge units. Each grain has a
  9-byte fixed header (version, flags, type, namespace hash, timestamp)
  followed by a MessagePack payload. Ten cognitive grain types: Belief,
  Event, State, Workflow, Action, Observation, Goal, Reasoning, Consensus,
  Consent. v1.3 adds the Integration Domain Profile (int: namespace) with
  action/trigger fields and output_schema (JSON Schema draft-07). Signing
  uses COSE Sign1 with Ed25519 and W3C DIDs. Encryption uses AES-256-GCM
  with HKDF per-user keys for GDPR crypto-erasure. Consent grains scope
  erasure via processing_basis. Full spec: memorygrain.org/spec/"