
Using ARG for memory agents

ARG: Short-term + Long-term Memory (High-level flow)

0 Objective of this guide

Audience

This guide is written for:

  • product builders who want an agent that remembers reliably (without drifting)
  • platform teams who need auditable, policy-safe memory writes
  • agent engineers who want granular, deduplicated long-term memory built on the ARG graph

What you are building

You are building an agent that can:

  • maintain an ephemeral Working Set at request time,
  • emit episodic memory writes safely (no structural edits online),
  • consolidate episodic signals offline into stable, versioned long-term memory (without duplicates and without silent poisoning).

This guide assumes you already have retrieval wired (USER / DOMAIN / EXTERNAL) and that memory is a post-reasoning capability, not a substitute for retrieval.

Out of scope

This guide does not re-explain retrieval (USER / DOMAIN / EXTERNAL), the Context Weaver, or the Policy Manager; those are covered by their own protocol documents.


1 Core principles (non-negotiable)

  • Online is read-only for structure.
    No nodes/edges/labels/clusters are created or modified at request time. Memory writes are episodic and land in a registry/log layer.

  • The graph stores refined chunks at the node level.
    The unit of memory is always a refined chunk attached to a node (with provenance).

  • Policy governs memory writes.
    The Policy Manager Memory Guard is mandatory before any write.
    See: Policy Memory Guard

  • Taxonomy coherence is the relevance certificate.
    No semantic memory write is allowed without taxonomy-coherent labels from the Context Weaver.

  • Dedup is about InfoUnits, not about text.
    Multiple chunks can express the same fact; the system deduplicates the fact unit (InfoUnit), not the string.


2 Memory primitives (what exists in ARG)

2.1 Working Set (short-term, ephemeral)

A request-time subsystem (not graph structure) storing:

  • L_final and typologies used for routing,
  • retrieval bindings (which nodes/bundles were used),
  • cache keys / bundle IDs / decision traces,
  • an optional structured summary for long conversations (taxonomy-aligned).

Purpose:

  • avoid redoing retrieval when the conversation exceeds the LLM window,
  • keep traceability (“what was actually used”) without rewriting the chat.

The Working Set is not long-term memory. It expires by TTL / session policy.
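
A minimal sketch of such a store, assuming a plain in-process cache (the `WorkingSet` name and its methods are illustrative, not protocol-defined); keys could be the cache keys or bundle IDs listed above:

```python
import time
from dataclasses import dataclass
from typing import Any

@dataclass
class WorkingSetItem:
    value: Any
    expires_at: float  # absolute epoch seconds

class WorkingSet:
    """Ephemeral request/session state; never part of the graph structure."""

    def __init__(self, default_ttl_s: float = 1800.0):
        self._items: dict[str, WorkingSetItem] = {}
        self._default_ttl_s = default_ttl_s

    def put(self, key: str, value: Any, ttl_s: float | None = None) -> None:
        ttl = self._default_ttl_s if ttl_s is None else ttl_s
        self._items[key] = WorkingSetItem(value, time.time() + ttl)

    def get(self, key: str) -> Any | None:
        item = self._items.get(key)
        if item is None or item.expires_at < time.time():
            self._items.pop(key, None)  # expired: drop silently, never persist
            return None
        return item.value
```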

2.2 Episodic memory write (online)

An online write is allowed only as an episode:

  • it stores what happened / what was observed / what was used,
  • it can attach refined chunks as evidence,
  • it must not promote external facts into stable nodes online.
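
A sketch of what an episodic record could carry under these rules (field names are assumptions, not protocol identifiers):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EpisodicRecord:
    """Append-only: records what happened; never mutates graph structure."""
    episode_id: str
    timestamp: str                            # ISO 8601
    l_final_ids: tuple[str, ...]              # taxonomy labels in effect
    used_node_ids: tuple[str, ...]            # what was observed / used
    evidence_chunk_ids: tuple[str, ...] = ()  # refined chunks attached as evidence
    bundle_ids: tuple[str, ...] = ()          # external bundles, for audit only
```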

2.3 Info Registry (unitary memory layer)

A subsystem/index (not a node) responsible for:

  • canonicalization (when possible),
  • deduplication into InfoUnits,
  • attaching evidence chunks to the correct unit,
  • staging “uncertain equivalence” for offline review.
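
A sketch of the registry's contract under these responsibilities; the `resolve` name is an assumption, but the MATCH / NEW / UNSURE outcomes mirror the dedup decision logged in Section 10:

```python
from enum import Enum
from typing import Protocol

class DedupDecision(Enum):
    MATCH = "match"    # the fact maps to an existing InfoUnit
    NEW = "new"        # no equivalent unit exists yet
    UNSURE = "unsure"  # uncertain equivalence: stage for offline review

class InfoRegistry(Protocol):
    def resolve(self, info_key: str, chunk_id: str) -> DedupDecision:
        """Canonicalize if possible, then map the chunk onto an InfoUnit."""
        ...
```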

2.4 External Bundle Log (audit, optional)

External retrieval returns an ephemeral bundle used for grounding. Online it may be logged for audit, but:

  • cannot become nodes/labels/edges online,
  • cannot be promoted into semantic memory online.


3 What to save (and what not to save)

3.1 Save (typical)

User memory (USER scope):

  • stable preferences relevant to the product/domain,
  • repeated constraints (formats, language, reporting style),
  • user-specific configuration facts used to answer correctly.

Domain / company memory (DOMAIN scope):

  • stable business facts the agent must represent (products, processes, constraints),
  • operational “known issues” / incident patterns,
  • platform configurations (summarized), when allowed.

Execution traces (episodic):

  • which nodes/chunks were used,
  • which connector results were used (as bundles),
  • confidence/flags and policy state.

3.2 Do not save

  • out-of-scope content (even if the user insists),
  • sensitive content blocked by policy (PII, secrets, disallowed categories),
  • raw external documents as long-term memory,
  • hallucinated facts or low-confidence “guesses”.

Memory guardrails are enforced by the Policy Manager Memory Guard, with taxonomy-coherent labels from the Context Weaver as the relevance certificate.


4 When to save (online triggers)

A memory write is considered only after a response is produced (or after an action outcome is known).

4.1 Common triggers

  • the user provides a stable domain fact that will be reused,
  • the system needs the fact to answer correctly in the future,
  • a repeated preference is detected across turns,
  • the agent used an external bundle that should be auditable later,
  • the interaction reveals a recurring gap (candidate for offline consolidation).

4.2 Anti-trigger (do not write)

  • confidence_global is low and labels are unstable (ABSTAIN_RECOMMENDED, OOD),
  • the memory would be outside the agent’s domain,
  • the write would violate scope constraints.

These signals (L_final, confidence_global, flags) are produced by the Context Weaver.
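
A minimal gate combining the triggers and anti-triggers above; the function name and the 0.7 threshold are assumptions (ABSTAIN_RECOMMENDED and OOD are the flags named in 4.2):

```python
def should_write_memory(confidence_global: float,
                        flags: set[str],
                        scope: str,
                        allowed_scopes: set[str],
                        min_confidence: float = 0.7) -> bool:
    """Anti-trigger gate: refuse the write on weak or out-of-scope signals."""
    if confidence_global < min_confidence:          # low confidence
        return False
    if flags & {"ABSTAIN_RECOMMENDED", "OOD"}:      # unstable labels
        return False
    if scope not in allowed_scopes:                 # outside the agent's domain
        return False
    return True
```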


5 How online memory writes work (safe-by-design)

5.1 Pipeline (request time)

  1. Policy gating
    Apply Memory Guard constraints.
    See: Policy Memory Guard

  2. Label certificate
    Require taxonomy-coherent labels L_final from the Context Weaver.

  3. Unitary fact extraction (bounded)
    Extract one unit of information (or a small bounded set) as a refined chunk, with provenance and confidence.

  4. Info Registry dedup
    Map the fact into an InfoUnit or attach as new evidence.

  5. Episodic write commit
    Store the episodic record (and evidence chunk reference) without mutating the active graph.
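
A sketch of the five steps as one function; every collaborator name here is illustrative:

```python
def commit_episodic_write(request, policy_guard, weaver, extractor, registry, log):
    """One pass through the five pipeline steps above."""
    # 1. Policy gating: the Memory Guard is mandatory before any write.
    if not policy_guard.allows_memory_write(request):
        return None
    # 2. Label certificate: require taxonomy-coherent L_final from the Weaver.
    labels = weaver.l_final(request)
    if labels is None:
        return None
    # 3. Bounded unitary fact extraction (refined chunk + provenance + confidence).
    chunk = extractor.extract_one(request, labels)
    # 4. Info Registry dedup: map into an InfoUnit or attach as new evidence.
    decision = registry.resolve(chunk.info_key, chunk.chunk_id)
    # 5. Episodic commit: store the record; the active graph is never mutated.
    return log.append(chunk, labels, decision)
```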

5.2 Uniqueness: InfoKey (deterministic foundation)

The primary uniqueness key is the triple (Scope, InfoType, LabelSignature):

  • Scope ∈ {User, Domain} (Domain covers enterprise/project knowledge in your protocol)
  • InfoType is a project-defined family (Incident, Preference, Config, etc.)
  • LabelSignature is derived from LabelIDs (never from label path strings)

Two allowed LabelSignature modes (v1):

  1. PrimaryLabelID (default)
    The Weaver chooses a canonical label from L_final_ids.

  2. sorted LabelID_set (optional)
    Used only if the project explicitly treats multi-label combos as unique keys.

This preserves single-source-of-truth (LabelIDs) while allowing evolution of the taxonomy tree without breaking identity.
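
A sketch of the key derivation under this scheme; only the (Scope, InfoType, LabelSignature) triple and the two signature modes come from the protocol, while the hashing step is an assumption:

```python
import hashlib

def label_signature(label_ids: list[str],
                    mode: str = "primary",
                    primary_label_id: str | None = None) -> str:
    """Derive the LabelSignature from LabelIDs, never from label path strings."""
    if mode == "primary":                      # mode 1: canonical label
        if primary_label_id not in label_ids:
            raise ValueError("PrimaryLabelID must come from L_final_ids")
        return primary_label_id
    if mode == "set":                          # mode 2: sorted LabelID_set
        return "|".join(sorted(label_ids))
    raise ValueError(f"unknown LabelSignature mode: {mode}")

def info_key(scope: str, info_type: str, signature: str) -> str:
    """Deterministic key over (Scope, InfoType, LabelSignature)."""
    raw = f"{scope}:{info_type}:{signature}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]  # hashing is illustrative
```

For example, `info_key("Domain", "Incident", label_signature(["lbl_42"], primary_label_id="lbl_42"))` yields the same key however the label's path string later evolves.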

5.3 Evidence chunks (granular + scalable)

When a new chunk matches an existing InfoUnit:

  • attach it as evidence (do not create a duplicate unit),
  • keep provenance (who/when/how it was observed),
  • optionally update reliability metrics (preferably offline).
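
A sketch of evidence attachment (names assumed); reliability-metric updates are deliberately left to the offline loop, per the list above:

```python
from dataclasses import dataclass, field

@dataclass
class InfoUnit:
    info_key: str
    evidence_chunk_ids: list[str] = field(default_factory=list)

def attach_evidence(units: dict[str, InfoUnit], key: str, chunk_id: str) -> InfoUnit:
    """Attach a matching chunk as evidence rather than creating a duplicate unit."""
    unit = units.setdefault(key, InfoUnit(key))
    if chunk_id not in unit.evidence_chunk_ids:  # idempotent on re-observation
        unit.evidence_chunk_ids.append(chunk_id)
    return unit
```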

6 Short-term vs long-term memory (what changes offline)

ARG separates ephemeral continuity state from stable knowledge.

  • Short-term memory is the Working Set: an in-memory / cache layer with strict TTL. It stores request/session continuity state and taxonomy-aligned structured items. It MUST NOT mutate graph structure online.

  • Long-term memory is the result of offline consolidation: episodic signals are deduplicated into InfoUnits and promoted under governance (versioned, replay-tested, policy-checked).

6.0 Routing contract (Context Weaver → Working Set)

The Working Set is indexed and selected via the Context Weaver output.

  • Identity keys MUST come from the Weaver: L_final_ids, PrimaryLabelID, LabelSignature.
  • Eligibility follows the same set-theoretic binding used in online retrieval.
  • The Working Set MUST provide a bounded State Snapshot aligned to L_final, not a free-form prose recap. Evidence pointers MAY be attached for audit/debug.

6.1 Working Set (short-term) — Taxonomy-First State Store

Goal: The Working Set is not a free-form conversation summary. It is a taxonomy-aligned state store optimized for continuity, correctness, and relevance under strict TTL.

Key principles

  • In-memory / cache only (ephemeral): no long-term commitments here.
  • TTL-controlled per item; refresh only on explicit reconfirmation or task-reuse.
  • Taxonomy-first: every stored item must map to an existing taxonomy type + schema.
  • Evidence-backed: each item must reference evidence pointers (messages/tool outputs).
  • No online structural mutation: unknown concepts are stored as Unknown/Other values, not as new taxonomy types.
  • Bounded: enforce caps per category and per taxonomy type (count + token budget).

Working Set layout (3 TTL collections)

  1. Facts (Assertions)
  • Purpose: store stable-ish, immediately useful facts, preferences, constraints, and decisions.
  • Shape (recommended fields):
    • id
    • taxon (taxonomy type; required)
    • slot (optional sub-type/field name)
    • value (normalized / canonical form when possible)
    • confidence (0..1)
    • scope (request | session) (still short-term; used for arbitration)
    • ttl_expires_at
    • evidence[] (pointers: message ids, tool-call ids, doc spans)
    • tags[] (optional: e.g., constraint, preference, decision)
    • last_used_at (for eviction scoring)
  2. Agenda (Intent + Open Loops)
  • Purpose: keep the “what are we doing” thread without re-reading chat history.
  • Fields:
    • goal (taxonomy-aligned)
    • open_questions[] (taxonomy-aligned, each with evidence)
    • next_actions[] (each with preconditions, risks, evidence)
    • risk_flags[] (policy/ambiguity/compliance flags)
  3. Context Pack (Entities + Relations + Constraints)
  • Purpose: a lightweight hot graph of active entities and their typed relations.
  • Shape:
    • entities[]: { entity_id, taxon, canonical_name, aliases[], ttl_expires_at, evidence[] }
    • relations[]: { src_id, rel_type, dst_id, confidence, ttl_expires_at, evidence[] }
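
These recommended shapes translate directly into record types. A sketch as Python dataclasses (types and defaults are assumptions where the guide leaves them open):

```python
from dataclasses import dataclass, field

@dataclass
class Fact:                       # collection 1: Facts (Assertions)
    id: str
    taxon: str                    # taxonomy type; required
    value: str                    # normalized / canonical form when possible
    confidence: float             # 0..1
    scope: str                    # "request" | "session"
    ttl_expires_at: float
    evidence: list[str] = field(default_factory=list)
    slot: str | None = None
    tags: list[str] = field(default_factory=list)
    last_used_at: float | None = None

@dataclass
class Agenda:                     # collection 2: intent + open loops
    goal: str
    open_questions: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)
    risk_flags: list[str] = field(default_factory=list)

@dataclass
class Entity:                     # collection 3: Context Pack entities
    entity_id: str
    taxon: str
    canonical_name: str
    aliases: list[str] = field(default_factory=list)
    ttl_expires_at: float = 0.0
    evidence: list[str] = field(default_factory=list)

@dataclass
class Relation:                   # collection 3: Context Pack typed relations
    src_id: str
    rel_type: str
    dst_id: str
    confidence: float
    ttl_expires_at: float
    evidence: list[str] = field(default_factory=list)
```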

Ingestion pipeline (per turn)

  1. Extract → Normalize
  • Extract candidate items (entities, constraints, preferences, goals, decisions).
  • Normalize to allowed taxonomy types + canonical values.
  2. Validate → Gate
  • Reject candidates that fail schema validation or lack evidence pointers.
  • If taxonomy mapping fails: store under a safe fallback (Unknown/Other) without creating new structure.
  3. Score → Upsert
  • Compute utility_score using:
    • task relevance (to current goal)
    • recency
    • explicitness (user explicitly stated vs inferred)
    • stability (reconfirmed vs one-off)
  • Upsert into Facts / Agenda / Context Pack with TTL and confidence.

TTL refresh & conflicts

  • Do not refresh TTL merely because the item is seen in history.
  • Refresh TTL only when:
    • the user reconfirms it, or
    • the agent uses it as a dependency for current goal execution.
  • On conflict (same taxon + slot, different values):
    • keep both temporarily with confidence + evidence,
    • mark conflict=true and push to Agenda.open_questions if resolution is needed.
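
A sketch of the scoring and refresh rules above; the weights are illustrative placeholders, not recommended values:

```python
def utility_score(task_relevance: float, recency: float,
                  explicitness: float, stability: float) -> float:
    """Blend the four signals; weights are placeholders to tune per product."""
    return (0.4 * task_relevance + 0.2 * recency
            + 0.2 * explicitness + 0.2 * stability)

def maybe_refresh_ttl(item, now: float, ttl_s: float,
                      reconfirmed: bool, used_for_goal: bool) -> None:
    """Refresh only on reconfirmation or task-reuse, never on mere visibility."""
    if reconfirmed or used_for_goal:
        item.ttl_expires_at = now + ttl_s
```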

Retrieval for prompting (context assembly)

Provide a State Snapshot instead of a prose recap:

  • Goal + top constraints
  • Top-K Facts (by utility_score)
  • Hot entities + key relations
  • Open questions + next actions
  • Include evidence pointers only when needed (debug/audit or high-stakes).
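
A sketch of snapshot assembly, assuming each fact carries its computed utility_score (function and field names are illustrative):

```python
def build_state_snapshot(agenda, facts, entities, relations, k: int = 10) -> dict:
    """Bounded, taxonomy-aligned snapshot for prompting; no prose recap."""
    top = sorted(facts, key=lambda f: f.utility_score, reverse=True)[:k]
    return {
        "goal": agenda.goal,
        "constraints": [f.value for f in top if "constraint" in f.tags],
        "facts": [(f.taxon, f.value) for f in top],
        "entities": [(e.entity_id, e.taxon, e.canonical_name) for e in entities],
        "relations": [(r.src_id, r.rel_type, r.dst_id) for r in relations],
        "open_questions": agenda.open_questions,
        "next_actions": agenda.next_actions,
    }
```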

Boundedness & eviction

  • Hard caps (example):
    • Facts: max N per taxon (e.g., 20), global max (e.g., 100)
    • Entities: max (e.g., 50), Relations: max (e.g., 80)
    • Agenda: max open questions (e.g., 10)
  • Eviction rule: drop lowest utility_score, never exceed budgets.
  • Always expire items past ttl_expires_at (no exceptions).
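
A sketch of the eviction rule: TTL expiry is unconditional, then the cap is enforced by utility_score (names assumed):

```python
def evict(items: list, cap: int, now: float) -> list:
    """Expire past-TTL items unconditionally, then trim to the cap by score."""
    live = [it for it in items if it.ttl_expires_at > now]     # no exceptions
    live.sort(key=lambda it: it.utility_score, reverse=True)
    return live[:cap]                                          # drop lowest scores
```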

Telemetry

  • Track:
    • coverage by taxonomy type
    • conflicts created/resolved
    • eviction counts by type
    • stale items (expired but still referenced)
  • Emit lightweight logs for debugging and audits.

6.2 Long-term memory (stable)

Long-term memory is the result of offline consolidation:

  • dedup across episodes,
  • contradiction handling,
  • controlled promotion from SHADOW → ACTIVE,
  • versioned migrations and redirects.

This is part of the offline loop described in Section 8.


7 External knowledge and memory (do it safely)

7.1 Online rule (hard)

External retrieval produces an ExternalContextBundle:

  • used for grounding in the LLM context,
  • optionally logged episodically for audit,
  • never promoted into nodes/labels/edges online.

7.2 Offline eligibility

External bundles may trigger offline proposals:

  • new Info nodes (internalized and normalized),
  • new labels (child-label-first),
  • connector refinements (sources/filters/budgets).

8 Offline consolidation (from episodic to stable)

8.1 Consolidation candidates

Offline merges:

  • episodic InfoUnits + evidence chunks,
  • external bundle logs (audit + candidate generation),
  • outcomes/feedback (strong vs weak signals).

8.2 Dedup + contradictions

Offline must:

  • cluster similar InfoUnits when identity is uncertain,
  • detect contradictions (and keep both as competing hypotheses if needed),
  • avoid “winner by repetition” (slow poisoning protection).

8.3 Promotion: SHADOW → ACTIVE (controlled)

A stable memory promotion must be:

  • versioned,
  • replay-tested on historical logs,
  • policy-checked,
  • taxonomy-coherent.

If regression is detected, keep in SHADOW and do not promote.
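
A sketch of the promotion gate; the collaborator names are assumptions, but the four checks are exactly the ones listed above:

```python
def promotion_state(candidate, replay_suite, policy, taxonomy) -> str:
    """Every gate must pass; any failure keeps the memory in SHADOW."""
    checks = (
        candidate.is_versioned,            # versioned
        replay_suite.passes(candidate),    # replay-tested on historical logs
        policy.approves(candidate),        # policy-checked
        taxonomy.coherent(candidate),      # taxonomy-coherent
    )
    return "ACTIVE" if all(checks) else "SHADOW"
```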


9 Retention / TTL (project-configurable)

ARG keeps retention policy configurable (policy + product constraints).

Recommended baseline ranges (typical):

  • Working Set: minutes → hours (session TTL)
  • Episodic logs: weeks → months
  • External bundle logs: shorter than episodic (audit needs + compliance)
  • Evidence chunks: retained with their InfoUnit lifecycle (subject to policy)

Exact durations should be defined by product requirements (compliance, storage cost, audit needs).
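
As a sketch only, such a retention policy might be expressed as configuration; every value below is a placeholder, not a recommendation:

```python
# Baseline retention config; set all values per product/policy requirements.
RETENTION = {
    "working_set_ttl_seconds": 3600,                 # minutes → hours (session TTL)
    "episodic_log_days": 90,                         # weeks → months
    "external_bundle_log_days": 30,                  # shorter than episodic logs
    "evidence_chunks": "follow_infounit_lifecycle",  # tied to the owning InfoUnit
}
```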


10 Observability (what to log)

At minimum, log the following for every MemoryWrite attempt:

  • policy_state + relevant constraints,
  • L_final + confidence_global + flags,
  • InfoKey and chosen LabelSignature mode,
  • dedup decision: MATCH / NEW / UNSURE,
  • provenance: source node / connector / timestamp,
  • (optional) structured summary version if updated.
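
A sketch of one such log line as structured JSON; field names follow the list above, and the function name is illustrative:

```python
import json

def memory_write_log_line(policy_state, l_final, confidence_global, flags,
                          info_key, signature_mode, dedup_decision,
                          provenance, summary_version=None) -> str:
    """Serialize the minimum audit fields for one MemoryWrite attempt."""
    return json.dumps({
        "policy_state": policy_state,
        "l_final": l_final,
        "confidence_global": confidence_global,
        "flags": flags,
        "info_key": info_key,
        "label_signature_mode": signature_mode,   # primary | set
        "dedup_decision": dedup_decision,         # MATCH / NEW / UNSURE
        "provenance": provenance,                 # source node / connector / timestamp
        "summary_version": summary_version,       # optional
    })
```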

These logs feed:

  • offline consolidation,
  • taxonomy evolution candidates,
  • slow-poisoning detection.

11 Implementation patterns

11.1 Pattern A — Working Set for long conversations

Use the Working Set to store:

  • compact structured summary (taxonomy-aligned),
  • last retrieval bindings and bundles,
  • last stable L_final and typologies.

Goal: continuity beyond LLM context limits, without rewriting chat history.

11.2 Pattern B — Safe user preference memory

  • only write after repeated detection or explicit user statement,
  • store as a deduplicated InfoUnit (Scope=User, InfoType=Preference),
  • require Policy Memory Guard pass + taxonomy certificate.

11.3 Pattern C — Domain incident memory (enterprise)

  • write episodic incident observations,
  • consolidate offline into stable “Known issue / resolution” nodes,
  • avoid promotion based solely on silence or repetition.

11.4 Pattern D — External audit bundle

  • keep as external bundle logs (ephemeral objects),
  • promote only offline after normalization + validation.
