Using ARG for memory agents
0 Objective of this guide
Audience
This guide is written for:
- product builders who want an agent that remembers reliably (without drifting)
- platform teams who need auditable, policy-safe memory writes
- agent engineers who want granular, deduplicated long-term memory built on the ARG graph
What you are building
You are building an agent that can:
- maintain an ephemeral Working Set at request time,
- emit episodic memory writes safely (no structural edits online),
- consolidate episodic signals offline into stable, versioned long-term memory (without duplicates and without silent poisoning).
This guide assumes you already have retrieval wired (USER / DOMAIN / EXTERNAL) and that memory is a post-reasoning capability, not a substitute for retrieval.
Out of scope
This guide does not re-explain:
- taxonomy arbitration internals
See: Context Weaver (100ms)
- full traversal mechanics
See: ARG Core Step 7 Traversal
- full governance kernel internals
See: Policy Manager v1
1 Core principles (non-negotiable)
- Online is read-only for structure. No nodes/edges/labels/clusters are created or modified at request time. Memory writes are episodic and land in a registry/log layer.
- The graph stores refined chunks at the node level. The unit of memory is always a refined chunk attached to a node (with provenance).
- Policy governs memory writes. The Policy Manager Memory Guard is mandatory before any write. See: Policy Memory Guard
- Taxonomy coherence is the relevance certificate. No semantic memory write is allowed without taxonomy-coherent labels from the Context Weaver.
- Dedup is about InfoUnits, not about text. Multiple chunks can express the same fact; the system deduplicates the fact unit (InfoUnit), not the string.
2 Memory primitives (what exists in ARG)
2.1 Working Set (short-term, ephemeral)
A request-time subsystem (not graph structure) storing:
- L_final and typologies used for routing,
- retrieval bindings (which nodes/bundles were used),
- cache keys / bundle IDs / decision traces,
- an optional structured summary for long conversations (taxonomy-aligned).
Purpose:
- avoid redoing retrieval when conversation exceeds the LLM window,
- keep traceability (“what was actually used”) without rewriting the chat.
The Working Set is not long-term memory. It expires by TTL / session policy.
2.2 Episodic memory write (online)
An online write is allowed only as an episode:
- it stores what happened / what was observed / what was used,
- it can attach refined chunks as evidence,
- it must not promote external facts into stable nodes online.
Reference: ARG Core Step 10 MemoryWrite
2.3 Info Registry (unitary memory layer)
A subsystem/index (not a node) responsible for:
- canonicalization (when possible),
- deduplication into InfoUnits,
- attaching evidence chunks to the correct unit,
- staging “uncertain equivalence” for offline review.
Reference: ARG Core Step 10 Info Registry
2.4 External Bundle Log (audit, optional)
External retrieval returns an ephemeral bundle used for grounding. Online it may be logged for audit, but:
- cannot become nodes/labels/edges online,
- cannot be promoted into semantic memory online.
Reference: ARG Core external bundle rules
3 What to save (and what not to save)
3.1 Save (typical)
User memory (USER scope):
- stable preferences relevant to the product/domain,
- repeated constraints (formats, language, reporting style),
- user-specific configuration facts used to answer correctly.
Domain / company memory (DOMAIN scope):
- stable business facts the agent must represent (products, processes, constraints),
- operational “known issues” / incident patterns,
- platform configurations (summarized), when allowed.
Execution traces (episodic):
- which nodes/chunks were used,
- which connector results were used (as bundles),
- confidence/flags and policy state.
3.2 Do not save
- out-of-scope content (even if the user insists),
- sensitive content blocked by policy (PII, secrets, disallowed categories),
- raw external documents as long-term memory,
- hallucinated facts or low-confidence “guesses”.
Memory guardrails are enforced by the Policy Manager Memory Guard.
4 When to save (online triggers)
A memory write is considered only after a response is produced (or after an action outcome is known).
4.1 Common triggers
- the user provides a stable domain fact that will be reused,
- the system needs the fact to answer correctly in the future,
- a repeated preference is detected across turns,
- the agent used an external bundle that should be auditable later,
- the interaction reveals a recurring gap (candidate for offline consolidation).
4.2 Anti-trigger (do not write)
- confidence_global is low and labels are unstable (ABSTAIN_RECOMMENDED, OOD),
- the memory would be outside the agent’s domain,
- the write would violate scope constraints.
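The anti-triggers above can be read as a pre-check that runs before any write is attempted. The following is a minimal sketch, not the protocol's own API: the `WeaverOutput` shape, the `in_domain` and `scope_allowed` fields, and the `LOW_CONFIDENCE` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the guide does not give a numeric value.
LOW_CONFIDENCE = 0.5
UNSTABLE_FLAGS = {"ABSTAIN_RECOMMENDED", "OOD"}


@dataclass
class WeaverOutput:
    """Assumed shape of the Context Weaver signals used for gating."""
    confidence_global: float
    flags: set = field(default_factory=set)
    in_domain: bool = True       # assumed: the memory is inside the agent's domain
    scope_allowed: bool = True   # assumed: the write respects scope constraints


def should_consider_write(w: WeaverOutput) -> bool:
    """Return False as soon as one anti-trigger applies."""
    unstable = bool(w.flags & UNSTABLE_FLAGS)
    if w.confidence_global < LOW_CONFIDENCE and unstable:
        return False
    if not w.in_domain:
        return False
    if not w.scope_allowed:
        return False
    return True
```

A `True` result only means the write may proceed to policy gating; it does not replace the Memory Guard.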
Weaver outputs and confidence/flags: Context Weaver pipeline, OW-7 confidence
5 How online memory writes work (safe-by-design)
5.1 Pipeline (request time)
- Policy gating: apply Memory Guard constraints. See: Policy Memory Guard
- Label certificate: require taxonomy-coherent labels L_final from the Context Weaver.
- Unitary fact extraction (bounded): extract one unit of information (or a small bounded set) as a refined chunk, with provenance and confidence.
- Info Registry dedup: map the fact into an InfoUnit or attach as new evidence.
- Episodic write commit: store the episodic record (and evidence chunk reference) without mutating the active graph.
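The pipeline steps above can be sketched as one function that runs them in order and stops at the first failed gate. This is an illustrative sketch only: `online_memory_write`, `WriteResult`, the injected callables, and the reason strings are hypothetical names, and the episodic log is modelled as a plain append-only list.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WriteResult:
    committed: bool
    reason: str
    dedup: Optional[str] = None  # "MATCH", "NEW" or "UNSURE"


def online_memory_write(
    candidate: dict,
    memory_guard: Callable[[dict], bool],
    l_final_ids: list,
    extract_unit: Callable[[dict], Optional[dict]],
    registry_dedup: Callable[[dict], str],
    episodic_log: list,
) -> WriteResult:
    # 1. Policy gating: the Memory Guard must pass before anything else.
    if not memory_guard(candidate):
        return WriteResult(False, "blocked_by_policy")
    # 2. Label certificate: no taxonomy-coherent labels, no write.
    if not l_final_ids:
        return WriteResult(False, "missing_label_certificate")
    # 3. Bounded extraction of one refined chunk with provenance.
    unit = extract_unit(candidate)
    if unit is None:
        return WriteResult(False, "nothing_to_extract")
    # 4. Info Registry dedup: map to an InfoUnit or attach as evidence.
    decision = registry_dedup(unit)
    # 5. Episodic commit: append-only record; the graph is never touched.
    episodic_log.append(
        {"unit": unit, "labels": list(l_final_ids), "dedup": decision}
    )
    return WriteResult(True, "committed", decision)
```

Passing the guard, extractor, and registry as callables keeps the sketch independent of any concrete Policy Manager or Info Registry implementation.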
Reference:
5.2 Uniqueness: InfoKey (deterministic foundation)
Primary uniqueness key:
- Scope ∈ {User, Domain} (Domain covers enterprise/project knowledge in your protocol)
- InfoType is a project-defined family (Incident, Preference, Config, etc.)
- LabelSignature is derived from LabelIDs (never from label path strings)
Two allowed LabelSignature modes (v1):
- PrimaryLabelID (default): the Weaver chooses a canonical label from L_final_ids.
- sorted LabelID_set (optional): used only if the project explicitly treats multi-label combos as unique keys.
This preserves single-source-of-truth (LabelIDs) while allowing evolution of the taxonomy tree without breaking identity.
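To make the two LabelSignature modes concrete, here is a minimal sketch. It assumes the InfoKey is the tuple (Scope, InfoType, LabelSignature), which is an inference from the three components listed above rather than a confirmed definition. The function names, the mode strings, and the `|` separator are also hypothetical.

```python
from typing import Iterable, Optional, Tuple


def label_signature(
    l_final_ids: Iterable[str],
    primary_label_id: Optional[str] = None,
    mode: str = "primary",
) -> str:
    """Build a LabelSignature from LabelIDs only (never from label paths)."""
    ids = list(l_final_ids)
    if mode == "primary":
        if primary_label_id is None or primary_label_id not in ids:
            raise ValueError("PrimaryLabelID must be one of L_final_ids")
        return primary_label_id
    if mode == "sorted_set":
        # Sorting and de-duplicating makes the signature order-independent.
        return "|".join(sorted(set(ids)))
    raise ValueError(f"unknown LabelSignature mode: {mode}")


def info_key(scope: str, info_type: str, signature: str) -> Tuple[str, str, str]:
    """Assumed composition of the key: (Scope, InfoType, LabelSignature)."""
    if scope not in {"User", "Domain"}:
        raise ValueError("Scope must be User or Domain")
    return (scope, info_type, signature)
```

Because the signature is built from IDs, renaming or moving a label in the taxonomy tree leaves the key unchanged as long as its LabelID is stable.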
5.3 Evidence chunks (granular + scalable)
When a new chunk matches an existing InfoUnit:
- attach it as evidence (do not create a duplicate unit),
- keep provenance (who/when/how it was observed),
- optionally update reliability metrics (preferably offline).
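The attach-instead-of-duplicate rule can be illustrated with a small in-memory registry. This is a sketch under assumptions: `InfoRegistry`, `InfoUnit`, `EvidenceChunk`, and their fields are hypothetical. It covers only exact-key matches and leaves the UNSURE case and reliability metrics to offline processing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

InfoKey = Tuple[str, str, str]  # assumed: (Scope, InfoType, LabelSignature)


@dataclass
class EvidenceChunk:
    chunk_id: str
    text: str
    observed_by: str   # provenance: who
    observed_at: str   # provenance: when (ISO 8601 string)
    method: str        # provenance: how


@dataclass
class InfoUnit:
    key: InfoKey
    evidence: List[EvidenceChunk] = field(default_factory=list)


class InfoRegistry:
    def __init__(self) -> None:
        self._units: Dict[InfoKey, InfoUnit] = {}

    def upsert(self, key: InfoKey, chunk: EvidenceChunk) -> str:
        unit = self._units.get(key)
        if unit is None:
            self._units[key] = InfoUnit(key, [chunk])
            return "NEW"
        # Same fact, different wording: attach evidence, keep one unit.
        if all(c.chunk_id != chunk.chunk_id for c in unit.evidence):
            unit.evidence.append(chunk)
        return "MATCH"

    def get(self, key: InfoKey) -> InfoUnit:
        return self._units[key]

    def unit_count(self) -> int:
        return len(self._units)
```

Two differently worded chunks that resolve to the same key end up as two evidence entries on a single InfoUnit.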
6 Short-term vs long-term memory (what changes offline)
ARG separates ephemeral continuity state from stable knowledge.
Short-term memory is the Working Set: an in-memory / cache layer with strict TTL. It stores request/session continuity state and taxonomy-aligned structured items. It MUST NOT mutate graph structure online.
Long-term memory is the result of offline consolidation: episodic signals are deduplicated into InfoUnits and promoted under governance (versioned, replay-tested, policy-checked).
6.0 Routing contract (Context Weaver → Working Set)
The Working Set is indexed and selected via the Context Weaver output.
- Identity keys MUST come from the Weaver: L_final_ids, PrimaryLabelID, LabelSignature
- Eligibility follows the same set-theoretic binding used in online retrieval:
- The Working Set MUST provide a bounded State Snapshot aligned to L_final, not a free-form prose recap. Evidence pointers MAY be attached for audit/debug.
6.1 Working Set (short-term) — Taxonomy-First State Store
Goal: The Working Set is not a free-form conversation summary. It is a taxonomy-aligned state store optimized for continuity, correctness, and relevance under strict TTL.
Key principles
- In-memory / cache only (ephemeral): no long-term commitments here.
- TTL-controlled per item; refresh only on explicit reconfirmation or task-reuse.
- Taxonomy-first: every stored item must map to an existing taxonomy type + schema.
- Evidence-backed: each item must reference evidence pointers (messages/tool outputs).
- No online structural mutation: unknown concepts are stored as Unknown/Other values, not as new taxonomy types.
- Bounded: enforce caps per category and per taxonomy type (count + token budget).
Working Set layout (3 TTL collections)
- Facts (Assertions)
  - Purpose: store stable-ish, immediately useful facts, preferences, constraints, and decisions.
  - Shape (recommended fields):
    - id
    - taxon (taxonomy type; required)
    - slot (optional sub-type/field name)
    - value (normalized / canonical form when possible)
    - confidence (0..1)
    - scope (request | session) (still short-term; used for arbitration)
    - ttl_expires_at
    - evidence[] (pointers: message ids, tool-call ids, doc spans)
    - tags[] (optional: e.g., constraint, preference, decision)
    - last_used_at (for eviction scoring)
- Agenda (Intent + Open Loops)
  - Purpose: keep the “what are we doing” thread without re-reading chat history.
  - Fields:
    - goal (taxonomy-aligned)
    - open_questions[] (taxonomy-aligned, each with evidence)
    - next_actions[] (each with preconditions, risks, evidence)
    - risk_flags[] (policy/ambiguity/compliance flags)
- Context Pack (Entities + Relations + Constraints)
  - Purpose: a lightweight hot graph of active entities and their typed relations.
  - Shape:
    - entities[]: { entity_id, taxon, canonical_name, aliases[], ttl_expires_at, evidence[] }
    - relations[]: { src_id, rel_typed, dst_id, confidence, ttl_expires_at, evidence[] }
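The three collections can be written down as plain data classes. The field names below follow the layout above. The Python types, the use of epoch seconds for `ttl_expires_at`, the `WorkingSet` container, and the validation rules in `__post_init__` are assumptions added for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    id: str
    taxon: str                 # taxonomy type; required
    value: str                 # normalized / canonical form when possible
    confidence: float          # 0..1
    scope: str                 # "request" or "session"
    ttl_expires_at: float      # epoch seconds (assumed unit)
    evidence: List[str]        # message ids, tool-call ids, doc spans
    slot: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    last_used_at: Optional[float] = None

    def __post_init__(self) -> None:
        if not self.taxon:
            raise ValueError("taxon is required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.scope not in ("request", "session"):
            raise ValueError("scope must be 'request' or 'session'")
        if not self.evidence:
            raise ValueError("at least one evidence pointer is required")


@dataclass
class Agenda:
    goal: str
    open_questions: List[dict] = field(default_factory=list)
    next_actions: List[dict] = field(default_factory=list)
    risk_flags: List[str] = field(default_factory=list)


@dataclass
class Entity:
    entity_id: str
    taxon: str
    canonical_name: str
    ttl_expires_at: float
    evidence: List[str]
    aliases: List[str] = field(default_factory=list)


@dataclass
class Relation:
    src_id: str
    rel_typed: str
    dst_id: str
    confidence: float
    ttl_expires_at: float
    evidence: List[str]


@dataclass
class WorkingSet:
    agenda: Agenda
    facts: List[Fact] = field(default_factory=list)
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
```

Rejecting a `Fact` without evidence at construction time is one way to enforce the evidence-backed principle early; a real implementation might validate at the ingestion gate instead.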
Ingestion pipeline (per turn)
- Extract → Normalize
  - Extract candidate items (entities, constraints, preferences, goals, decisions).
  - Normalize to allowed taxonomy types + canonical values.
- Validate → Gate
  - Reject candidates that fail schema validation or lack evidence pointers.
  - If taxonomy mapping fails: store under a safe fallback (Unknown/Other) without creating new structure.
- Score → Upsert
  - Compute utility_score using:
    - task relevance (to current goal)
    - recency
    - explicitness (user explicitly stated vs inferred)
    - stability (reconfirmed vs one-off)
  - Upsert into Facts / Agenda / Context Pack with TTL and confidence.
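The gate and score steps above can be sketched as two small functions. The guide names the four scoring signals but not how they are combined, so the weighted sum and the `WEIGHTS` values here are assumptions, as are the function names and the dict-based candidate shape.

```python
from typing import Optional

# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {
    "relevance": 0.4,
    "recency": 0.2,
    "explicitness": 0.2,
    "stability": 0.2,
}


def utility_score(
    relevance: float, recency: float, explicitness: float, stability: float
) -> float:
    """Weighted sum of the four signals, each expected in [0, 1]."""
    signals = {
        "relevance": relevance,
        "recency": recency,
        "explicitness": explicitness,
        "stability": stability,
    }
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def gate(candidate: dict, known_taxons: set) -> Optional[dict]:
    """Reject invalid candidates; fall back to Unknown/Other on mapping failure."""
    if not candidate.get("evidence"):
        return None  # no evidence pointers: rejected
    if "value" not in candidate:
        return None  # minimal schema check for this sketch
    item = dict(candidate)
    if item.get("taxon") not in known_taxons:
        # Safe fallback: never create a new taxonomy type online.
        item["taxon"] = "Unknown/Other"
    return item
```

A candidate with an unknown taxon is kept but relabelled, which preserves the information for offline review without mutating structure.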
TTL refresh policy (recommended)
- Do not refresh TTL merely because the item is seen in history.
- Refresh TTL only when:
  - the user reconfirms it, or
  - the agent uses it as a dependency for current goal execution.
- On conflict (same taxon + slot, different values):
  - keep both temporarily with confidence + evidence,
  - mark conflict=true and push to Agenda.open_questions if resolution is needed.
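The refresh and conflict rules above can be sketched as follows. The event names, the dict-based item shape, and the function names are hypothetical. For simplicity this sketch always pushes a conflict to the open questions, whereas the guide does so only if resolution is needed.

```python
REFRESH_EVENTS = {"user_reconfirmed", "used_as_dependency"}


def maybe_refresh_ttl(item: dict, event: str, now: float, ttl_seconds: float) -> bool:
    """Extend the TTL only for the two allowed events; return True if refreshed."""
    if event in REFRESH_EVENTS:
        item["ttl_expires_at"] = now + ttl_seconds
        return True
    return False  # e.g. "seen_in_history" never refreshes


def upsert_fact(facts: list, new_fact: dict, open_questions: list) -> None:
    """Insert a fact; on same taxon + slot with a different value, keep both."""
    same_key = [
        f for f in facts
        if f["taxon"] == new_fact["taxon"] and f.get("slot") == new_fact.get("slot")
    ]
    for f in same_key:
        if f["value"] == new_fact["value"]:
            # Same value restated: merge evidence instead of duplicating.
            f["evidence"] = sorted(set(f["evidence"]) | set(new_fact["evidence"]))
            return
    if same_key:
        involved = same_key + [new_fact]
        for f in involved:
            f["conflict"] = True
        open_questions.append({
            "taxon": new_fact["taxon"],
            "slot": new_fact.get("slot"),
            "evidence": sorted({e for f in involved for e in f["evidence"]}),
        })
    facts.append(new_fact)
```

Keeping both values with their evidence leaves the decision to a later turn or to offline consolidation, rather than letting the most recent statement win silently.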
Retrieval for prompting (context assembly)
Provide a State Snapshot instead of a prose recap:
- Goal + top constraints
- Top-K Facts (by utility_score)
- Hot entities + key relations
- Open questions + next actions
- Include evidence pointers only when needed (debug/audit or high-stakes).
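A State Snapshot of this kind can be assembled with a single function. This is a sketch under assumptions: the Working Set is modelled as a nested dict, each fact already carries a `utility_score`, constraints are identified by a `constraint` tag, and `build_state_snapshot` with its parameters is a hypothetical name.

```python
def build_state_snapshot(
    working_set: dict, top_k: int = 5, include_evidence: bool = False
) -> dict:
    """Assemble a bounded, structured snapshot instead of a prose recap."""

    def strip(item: dict) -> dict:
        # Evidence pointers are dropped unless explicitly requested.
        if include_evidence:
            return dict(item)
        return {k: v for k, v in item.items() if k != "evidence"}

    agenda = working_set["agenda"]
    facts = sorted(
        working_set.get("facts", []),
        key=lambda f: f["utility_score"],
        reverse=True,
    )
    constraints = [f for f in facts if "constraint" in f.get("tags", [])]
    return {
        "goal": agenda["goal"],
        "constraints": [strip(f) for f in constraints[:top_k]],
        "facts": [strip(f) for f in facts[:top_k]],
        "entities": [strip(e) for e in working_set.get("entities", [])],
        "relations": [strip(r) for r in working_set.get("relations", [])],
        "open_questions": [strip(q) for q in agenda.get("open_questions", [])],
        "next_actions": [strip(a) for a in agenda.get("next_actions", [])],
    }
```

Setting `include_evidence=True` is meant for the debug, audit, or high-stakes cases; the default keeps the prompt payload small.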
Boundedness & eviction
- Hard caps (example):
  - Facts: max N per taxon (e.g., 20), global max (e.g., 100)
  - Entities: max (e.g., 50), Relations: max (e.g., 80)
  - Agenda: max open questions (e.g., 10)
- Eviction rule: drop lowest utility_score, never exceed budgets.
- Always expire items past ttl_expires_at (no exceptions).
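For the Facts collection, expiry and eviction can be sketched as a single pass. The function name and dict shape are hypothetical, the default caps reuse the example values above, and only the count budget is handled here; a token budget would need an extra size estimate per item.

```python
def enforce_fact_budgets(
    facts: list, now: float, max_per_taxon: int = 20, global_max: int = 100
) -> list:
    """Expire stale facts, then keep the highest-utility ones within the caps."""
    # Expiry first: items past their TTL never survive, whatever their score.
    live = [f for f in facts if f["ttl_expires_at"] > now]
    # Highest utility first, so the lowest scores are the ones dropped.
    live.sort(key=lambda f: f["utility_score"], reverse=True)
    kept = []
    per_taxon = {}
    for f in live:
        if len(kept) >= global_max:
            break
        count = per_taxon.get(f["taxon"], 0)
        if count >= max_per_taxon:
            continue
        per_taxon[f["taxon"]] = count + 1
        kept.append(f)
    return kept
```

Running this after every upsert keeps the collection within budget at all times instead of relying on a periodic sweep.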
Observability (recommended)
- Track:
  - coverage by taxonomy type
  - conflicts created/resolved
  - eviction counts by type
  - stale items (expired but still referenced)
- Emit lightweight logs for debugging and audits.
6.2 Long-term memory (stable)
Long-term memory is the result of offline consolidation:
- dedup across episodes,
- contradiction handling,
- controlled promotion from SHADOW → ACTIVE,
- versioned migrations and redirects.
This is part of the offline loop (see: ARG Core Offline loop).
7 External knowledge and memory (do it safely)
7.1 Online rule (hard)
External retrieval produces an ExternalContextBundle:
- used for grounding in the LLM context,
- optionally logged episodically for audit,
- never promoted into nodes/labels/edges online.
References:
7.2 Offline eligibility
External bundles may trigger offline proposals:
- new Info nodes (internalized and normalized),
- new labels (child-label-first),
- connector refinements (sources/filters/budgets).
8 Offline consolidation (from episodic to stable)
8.1 Consolidation candidates
Offline merges:
- episodic InfoUnits + evidence chunks,
- external bundle logs (audit + candidate generation),
- outcomes/feedback (strong vs weak signals).
8.2 Dedup + contradictions
Offline must:
- cluster similar InfoUnits when identity is uncertain,
- detect contradictions (and keep both as competing hypotheses if needed),
- avoid “winner by repetition” (slow poisoning protection).
8.3 Promotion: SHADOW → ACTIVE (controlled)
A stable memory promotion must be:
- versioned,
- replay-tested on historical logs,
- policy-checked,
- taxonomy-coherent.
If regression is detected, keep in SHADOW and do not promote.
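The promotion rule can be expressed as an all-or-nothing gate. This is a minimal sketch: `PromotionCandidate`, its fields, and the idea of counting replay regressions as an integer are assumptions, not the protocol's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromotionCandidate:
    version: Optional[str]     # None means the change is not versioned yet
    replay_regressions: int    # regressions found when replaying historical logs
    policy_ok: bool
    taxonomy_coherent: bool


def decide_promotion(c: PromotionCandidate) -> str:
    """Promote to ACTIVE only if every check passes; otherwise stay in SHADOW."""
    checks = [
        c.version is not None,
        c.replay_regressions == 0,
        c.policy_ok,
        c.taxonomy_coherent,
    ]
    return "ACTIVE" if all(checks) else "SHADOW"
```

Treating SHADOW as the default outcome means a missing or failed check can delay a promotion but never cause one.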
9 Retention / TTL (project-configurable)
ARG keeps retention policy configurable (policy + product constraints).
Recommended baseline ranges (typical):
- Working Set: minutes → hours (session TTL)
- Episodic logs: weeks → months
- External bundle logs: shorter than episodic (audit needs + compliance)
- Evidence chunks: retained with their InfoUnit lifecycle (subject to policy)
Exact durations should be defined by product requirements (compliance, storage cost, audit needs).
10 Observability (what to log)
At minimum, for every MemoryWrite attempt log:
- policy_state + relevant constraints,
- L_final + confidence_global + flags,
- InfoKey and chosen LabelSignature mode,
- dedup decision: MATCH / NEW / UNSURE,
- provenance: source node / connector / timestamp,
- (optional) structured summary version if updated.
These logs feed:
- offline consolidation,
- taxonomy evolution candidates,
- slow-poisoning detection.
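A log record carrying the minimum fields listed in this section could be built as shown below. The JSON layout, the key names, and the `memory_write_log` function are hypothetical; only the set of fields and the three dedup values come from the guide.

```python
import json
from typing import Optional

DEDUP_DECISIONS = {"MATCH", "NEW", "UNSURE"}


def memory_write_log(
    policy_state: str,
    constraints: list,
    l_final: list,
    confidence_global: float,
    flags: list,
    info_key: list,
    signature_mode: str,
    dedup_decision: str,
    provenance: dict,
    summary_version: Optional[str] = None,
) -> str:
    """Serialize one MemoryWrite attempt as a single JSON line."""
    if dedup_decision not in DEDUP_DECISIONS:
        raise ValueError(f"unknown dedup decision: {dedup_decision}")
    record = {
        "policy_state": policy_state,
        "constraints": constraints,
        "l_final": l_final,
        "confidence_global": confidence_global,
        "flags": flags,
        "info_key": info_key,
        "label_signature_mode": signature_mode,
        "dedup_decision": dedup_decision,
        "provenance": provenance,  # source node / connector / timestamp
    }
    if summary_version is not None:
        record["summary_version"] = summary_version
    # Sorted keys keep log lines stable and easy to diff.
    return json.dumps(record, sort_keys=True)
```

Logging every attempt, including blocked or abstained ones, is what lets offline consolidation and slow-poisoning detection see the full picture.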
11 Implementation patterns
11.1 Pattern A — Working Set for long conversations
Use the Working Set to store:
- compact structured summary (taxonomy-aligned),
- last retrieval bindings and bundles,
- last stable L_final and typologies.
Goal: continuity beyond LLM context limits, without rewriting chat history.
11.2 Pattern B — Safe user preference memory
- only write after repeated detection or explicit user statement,
- store as a deduplicated InfoUnit (Scope=User, InfoType=Preference),
- require Policy Memory Guard pass + taxonomy certificate.
11.3 Pattern C — Domain incident memory (enterprise)
- write episodic incident observations,
- consolidate offline into stable “Known issue / resolution” nodes,
- avoid promotion based solely on silence or repetition.
11.4 Pattern D — External audit bundle
- keep as external bundle logs (ephemeral objects),
- promote only offline after normalization + validation.
12 References (protocol anchors)
- Policy Memory Guard: Policy Manager
- Context Weaver online outputs and flags: Context Weaver pipeline, OW-7 confidence
- Memory write + registry layer: ARG Core Step 10 MemoryWrite, ARG Core Step 10 Info Registry, ARG Core external bundle rules
- Offline consolidation and lifecycle: ARG Core Offline loop, ARG Core Lifecycle