Context Weaver Architecture v1 — 100ms Budget Mode
(with Taxonomy Vector Store scalability, Fast Track, distilled-first routing, and taxonomy stability)
0. Role
The Context Weaver transforms a noisy linguistic signal
(user prompt + early lexical candidates) into a taxonomy-coherent label set that can be safely used for:
- landing-point computation (set theory + M2M)
  → see ARG Core
- local neighbor scoring
  → see ARG Core
- deterministic traversal
  → see ARG Core
- offline candidate generation (split/merge/edges/labels)
  → see ARG Core and Guides
It adds linguistic nuance without reintroducing a black box able to invent structure.
1. Non-negotiable principles
- Taxonomy validity always wins.
  The deterministic Taxonomy Validator is the final arbiter.
- Vector similarity is an approximator, not truth.
  It restricts the space of choice; it does not define structure.
- The Weaver does not create the taxonomy online.
  It can detect missing labels and emit offline candidates.
- The LLM is an escalation tool, not a default step.
  It must not be in the hot path for >80–95% of requests.
- Budget-time first.
  Each sub-step supports early exit and controlled degradation.
These rules exist to keep the online loop deterministic and policy-safe
(see Policy Manager).
2. Inputs and outputs
Inputs
- user prompt
- structured context
  (role, product, channel, short history, etc.)
- L_raw from initial lexical retrieval
  (BM25 / TF-IDF / equivalent)
  → see ARG Core
- core knowledge:
  - context typologies (to parameterize constraints)
  - taxonomy:
    cluster → label root → parent → child
  - explicit incompatibility rules (if available)
Outputs
- L_final
  validated and weighted labels
- confidence_global
- flags:
  - OOD
  - VECTOR_AMBIGUITY
  - LOW_MARGIN
  - ABSTAIN_RECOMMENDED
  - NEW_INTENT_CANDIDATE
  - UNKNOWN_LABEL_CANDIDATE
  - CONFLICT_POLICY_STRONG_SIGNAL
  - LLM_ESCALATED
Optional (stable, non-contradictory to v1 if enabled):
- COLD_START_BUFFER_CANDIDATE
  → routes to a pre-existing system label Pending_*
  (cold-start buffering must remain Info-first and policy-safe;
  see ARG Core).
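One way to picture this output contract is a small typed record. The sketch below is illustrative only: the class names `Flag` and `WeaverOutput`, the field name `l_final`, and the label-to-weight mapping are assumptions, not part of the specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Flag(Enum):
    """Flags listed in the Outputs section (names taken from the spec)."""
    OOD = "OOD"
    VECTOR_AMBIGUITY = "VECTOR_AMBIGUITY"
    LOW_MARGIN = "LOW_MARGIN"
    ABSTAIN_RECOMMENDED = "ABSTAIN_RECOMMENDED"
    NEW_INTENT_CANDIDATE = "NEW_INTENT_CANDIDATE"
    UNKNOWN_LABEL_CANDIDATE = "UNKNOWN_LABEL_CANDIDATE"
    CONFLICT_POLICY_STRONG_SIGNAL = "CONFLICT_POLICY_STRONG_SIGNAL"
    LLM_ESCALATED = "LLM_ESCALATED"
    COLD_START_BUFFER_CANDIDATE = "COLD_START_BUFFER_CANDIDATE"  # optional flag


@dataclass
class WeaverOutput:
    # L_final as label_id -> weight (the mapping shape is an assumption).
    l_final: dict[str, float]
    confidence_global: float
    flags: set[Flag] = field(default_factory=set)

    def has(self, flag: Flag) -> bool:
        return flag in self.flags


# Hypothetical label id and values, for illustration only.
out = WeaverOutput(l_final={"billing.refund": 0.8},
                   confidence_global=0.72,
                   flags={Flag.LOW_MARGIN})
```

Keeping flags as an enum set rather than free strings makes it harder for a downstream consumer to test for a misspelled flag name.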
3. Sub-modules (100ms version)
3.1 Label Interpreter — dual mode
Two implementations activated by the cascade:
3.1.1 Distilled Label Router (default hot path)
A small fast model (or linear head over embeddings + engineered features), trained on logs.
Responsibilities:
- propose Top-N labels
- estimate conf_distilled
- pre-detect ambiguity and OOD risk (pre-flags)
3.1.2 Bounded LLM Proposer (exception path)
Activated only if:
- persistent LOW_MARGIN, or
- strong VECTOR_AMBIGUITY, or
- conf_distilled too low.
It is strictly bounded by Top-K labels from the Taxonomy Vector Store.
This switch is a key condition for the 100ms target.
3.2 Taxonomy Validator (deterministic)
Applies core knowledge strictly:
- label existence
- cluster↔label compatibility
- parent/child coherence
- explicit incompatibilities
- policy and typology constraints
→ see Policy Manager and Guides
3.3 Conflict & Uncertainty Manager
Decides rapidly whether to:
- accept the label route,
- trigger a cautious traversal profile,
- recommend abstention/OOD,
- or escalate to the bounded LLM proposer.
This manager prevents unstable label sets from leaking into
landing-point computation and traversal
(see ARG Core).
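The four outcomes above can be sketched as a small decision function. This is a simplified illustration, not the specified logic: the thresholds `accept_at` and `abstain_below`, the `llm_already_used` guard, and the ordering of the checks are all assumptions that would need calibration on logs.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept the label route"
    CAUTIOUS = "trigger a cautious traversal profile"
    ABSTAIN = "recommend abstention/OOD"
    ESCALATE = "escalate to the bounded LLM proposer"


def decide(confidence, flags, llm_already_used,
           accept_at=0.75, abstain_below=0.30):
    """Map a confidence value and a set of flag names to one outcome.

    Thresholds are hypothetical placeholders.
    """
    if "OOD" in flags or confidence < abstain_below:
        return Decision.ABSTAIN
    grey_zone = bool(flags & {"LOW_MARGIN", "VECTOR_AMBIGUITY"})
    if grey_zone and not llm_already_used:
        # Escalate at most once, so the cold path cannot loop.
        return Decision.ESCALATE
    if confidence >= accept_at and not grey_zone:
        return Decision.ACCEPT
    return Decision.CAUTIOUS
```

For example, `decide(0.6, {"LOW_MARGIN"}, llm_already_used=True)` falls through to the cautious profile instead of escalating a second time.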
4. Online Context Weaver pipeline — 100ms cascade
This pipeline is referenced by
ARG Core — Step 3 taxonomy arbitration.
OW-0 Fast Track Gate
Goal
Decide in ~1–3ms whether the heavier steps can be skipped.
Typical conditions
- very high lexical TopLabelScore
- trivial taxonomy coherence
- simple context (e.g., “standard” typology)
If satisfied
- bypass OW-2 and OW-3
- jump directly to OW-5
Output
FAST_TRACK = true/false
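A minimal sketch of such a gate, assuming the three typical conditions are combined with a logical AND. The function name, the `0.90` threshold, and the set of "simple" typologies are hypothetical.

```python
def fast_track_gate(top_label_score, taxonomy_coherent, typology,
                    score_threshold=0.90,
                    simple_typologies=frozenset({"standard"})):
    """Return FAST_TRACK: True when the heavier steps can be skipped.

    Only cheap comparisons are used, in keeping with the ~1-3ms goal.
    """
    return (top_label_score >= score_threshold
            and taxonomy_coherent
            and typology in simple_typologies)
```

With these placeholder values, `fast_track_gate(0.95, True, "standard")` returns `True`, while a lower score or a non-standard typology sends the request through the full cascade.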
OW-1 Lexical canonicalization (deterministic)
Simple normalization:
- known aliases
- lexical variants
- light cleanup
Output
L_norm
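The normalization step could look like the sketch below. The alias table, the snake_case fallback, and the function name are assumptions chosen for illustration; a real deployment would load validated aliases from core knowledge.

```python
import re
import unicodedata


def canonicalize(raw_labels, aliases):
    """Return L_norm: cleaned, alias-resolved, de-duplicated labels (order kept)."""
    l_norm, seen = [], set()
    for raw in raw_labels:
        # Light cleanup: Unicode normalization, case folding, separator collapse.
        key = unicodedata.normalize("NFKC", raw).casefold()
        key = re.sub(r"[\s\-_]+", " ", key).strip()
        # Known aliases win; otherwise fall back to a snake_case form.
        label = aliases.get(key, key.replace(" ", "_"))
        if label and label not in seen:
            seen.add(label)
            l_norm.append(label)
    return l_norm


# Hypothetical alias table.
ALIASES = {"pwd reset": "password_reset", "refunds": "refund"}
l_norm = canonicalize(["Pwd  Reset", "password-reset", "Refunds", "Billing"], ALIASES)
# l_norm == ["password_reset", "refund", "billing"]
```

Because the function is a pure mapping with no model call, the same input always yields the same L_norm, which is what keeps this step deterministic.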
OW-2 Taxonomy retrieval (Taxonomy Vector Store)
OW-2.0 Taxonomy Vector Store design
For each label:
- label_id
- label_name
- label_description
- cluster_parent_name
- optional:
  - validated synonyms
  - examples
  - negative examples
Vectorization
- Name + Description + Cluster parent + Synonyms
OW-2.1 Prompt embedding
e_q = embed(prompt)
OW-2.2 kNN search
Top-K labels = kNN(e_q, TaxonomyVectorStore)
100ms optimizations
- K = 10–20
- aggressive ANN
- compact embeddings
Output
L_vec_topk
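To make the retrieval step concrete, here is a brute-force cosine kNN over an in-memory store. It is a stand-in for the approximate nearest-neighbor index mentioned above, not an ANN implementation; the store layout (label_id to vector) and the toy two-dimensional vectors are assumptions.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(e_q, store, k=10):
    """Return L_vec_topk as [(label_id, similarity)], best first."""
    scored = ((cosine(e_q, vec), label_id) for label_id, vec in store.items())
    return [(label_id, score) for score, label_id in heapq.nlargest(k, scored)]


# Toy store with hypothetical labels and 2-d embeddings.
store = {"billing": [1.0, 0.0], "refund": [0.9, 0.1], "login": [0.0, 1.0]}
l_vec_topk = knn([1.0, 0.0], store, k=2)
# Order of label ids: billing, then refund.
```

The exhaustive scan costs time linear in the number of labels, which is why the 100ms budget calls for an ANN index and compact embeddings once the taxonomy grows.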
OW-3 Label interpretation (distilled-first)
OW-3.1 Distilled Label Router (hot path)
Inputs
- prompt
- L_raw
- L_vec_topk
Outputs
- L_distilled (Top-N + scores)
- conf_distilled
- preliminary flags:
  - LOW_MARGIN
  - VECTOR_AMBIGUITY
  - OOD_CANDIDATE
OW-3.2 Bounded LLM Proposer (cold path)
Activated only if:
- persistent LOW_MARGIN, or
- strong VECTOR_AMBIGUITY, or
- conf_distilled too low.
Inputs
- prompt
- structured context
- Top-K labels
Instruction
- “Choose only among these labels or ABSTAIN.”
Outputs
- L_llm
- LLM_ESCALATED = true
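The bound on the proposer can be enforced on the way out as well as in the instruction. The sketch below makes no model call; it only shows one possible way to render the instruction and to discard anything outside Top-K. The comma-separated answer format and both function names are assumptions.

```python
ABSTAIN = "ABSTAIN"


def build_instruction(top_k):
    """Render the bounded instruction; the exact wording is an assumption."""
    options = ", ".join(top_k)
    return f"Choose only among these labels or {ABSTAIN}: {options}"


def parse_bounded_answer(answer, top_k):
    """Return L_llm: labels from the answer that belong to Top-K, in answer order.

    Assumes a comma-separated answer. Labels outside Top-K are dropped, and
    an explicit ABSTAIN yields an empty list.
    """
    allowed = set(top_k)
    picked = [tok.strip() for tok in answer.split(",") if tok.strip()]
    if ABSTAIN in picked:
        return []
    l_llm = []
    for label in picked:
        if label in allowed and label not in l_llm:
            l_llm.append(label)
    return l_llm
```

Filtering the answer against Top-K means an invented label never reaches source fusion, even if the model ignores the instruction.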
This completes the 100ms-mode front half of the Context Weaver.
The remaining stages (union, strict validation, controlled hierarchy expansion, confidence/flags) follow the contract already referenced in
ARG Core and stay consistent with this cascade.
OW-4 Source fusion (deterministic)
Build the unified candidate set:
\[ L_{\text{union}} = L_{\text{norm}} \cup L_{\text{raw}} \cup L_{\text{distilled}} \cup L_{\text{llm}} \quad (L_{\text{llm}} \text{ only if present}) \]
Each label keeps provenance metadata:
- lexical
- vector
- distilled
- llm
This provenance is mandatory for:
- confidence computation,
- conflict analysis,
- offline diagnostics and taxonomy evolution.
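A minimal sketch of the fusion step, assuming each source arrives as a list of label ids keyed by its provenance tag. Mapping both L_norm and L_raw to the `lexical` tag is an assumption, as are the function name and the example labels.

```python
def fuse_sources(sources):
    """Build L_union as {label: set of provenance tags}.

    `sources` maps a provenance tag to an iterable of labels. A source that
    is None (for example, no LLM escalation) contributes nothing.
    """
    l_union = {}
    for provenance, labels in sources.items():
        for label in labels or ():
            l_union.setdefault(label, set()).add(provenance)
    return l_union


l_union = fuse_sources({
    "lexical": ["billing", "refund"],       # L_norm / L_raw
    "distilled": ["refund", "chargeback"],  # L_distilled
    "llm": None,                            # L_llm absent on the hot path
})
# "refund" is backed by two sources; "billing" and "chargeback" by one each.
```

Storing provenance as a set per label makes inter-signal convergence a simple count of tags at confidence-scoring time.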
OW-5 Strict taxonomy validation (deterministic)
The Taxonomy Validator filters L_union against core knowledge and policy constraints.
Checks include:
- Existence
  - if a label is missing → reject and set UNKNOWN_LABEL_CANDIDATE
- Cluster↔label compatibility
- Parent/child coherence
  - avoid simultaneous activation of structurally incompatible branches
- Explicit incompatibilities
- Policy / typology constraints,
  governed by the Policy Manager
  and parameterized by rules in the Guides.
Output
L_valid
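The filter can be sketched as a single deterministic pass. This is a reduced illustration under stated assumptions: the `Taxonomy` layout is hypothetical, policy/typology constraints are collapsed into a set of allowed clusters, candidates are assumed to arrive in priority order so that the earlier label wins an incompatibility, and the parent/child coherence check is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Taxonomy:
    parent: dict[str, Optional[str]]   # label -> parent label (None for a root)
    cluster: dict[str, str]            # label -> cluster
    incompatible: set[frozenset] = field(default_factory=set)  # explicit pairs


def validate(l_union, taxonomy, allowed_clusters):
    """Filter candidates into (L_valid, flags)."""
    l_valid, flags = [], set()
    for label in l_union:
        if label not in taxonomy.parent:                         # existence
            flags.add("UNKNOWN_LABEL_CANDIDATE")
            continue
        if taxonomy.cluster.get(label) not in allowed_clusters:  # cluster constraint
            continue
        if any(frozenset((label, kept)) in taxonomy.incompatible
               for kept in l_valid):                             # explicit incompatibility
            continue
        l_valid.append(label)
    return l_valid, flags


# Hypothetical toy taxonomy.
tax = Taxonomy(
    parent={"billing": None, "refund": "billing", "login": None},
    cluster={"billing": "finance", "refund": "finance", "login": "access"},
    incompatible={frozenset(("refund", "login"))},
)
l_valid, flags = validate(["refund", "ghost_label", "login", "billing"],
                          tax, allowed_clusters={"finance", "access"})
```

Here `ghost_label` is rejected and raises the flag, and `login` is dropped because it is declared incompatible with the already accepted `refund`.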
OW-6 Controlled hierarchical expansion
Apply ascending-only propagation:
child → parent → root
No descending expansion by default.
This prevents over-activation that would reduce routing precision.
Output
L_expanded
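Ascending-only propagation amounts to walking parent pointers until a root is reached. The sketch below assumes the taxonomy is available as a child-to-parent mapping; the function name and the example labels are hypothetical.

```python
def expand_ascending(l_valid, parent):
    """Return L_expanded: each label followed by its ancestors, without duplicates.

    `parent` maps a label to its parent (None or missing for a root).
    Children are never added, so expansion only moves toward the root.
    """
    l_expanded, seen = [], set()
    for label in l_valid:
        node = label
        # The `seen` check also stops the walk if the mapping contains a cycle.
        while node is not None and node not in seen:
            seen.add(node)
            l_expanded.append(node)
            node = parent.get(node)
    return l_expanded


parent = {"refund_partial": "refund", "refund": "billing", "billing": None}
l_expanded = expand_ascending(["refund_partial"], parent)
# l_expanded == ["refund_partial", "refund", "billing"]
```

Note that starting from `refund` would not pull in `refund_partial`: sibling and child labels stay inactive unless they were validated on their own.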
OW-7 Unified confidence scoring & flags
Compute confidence_global from:
- post-validation taxonomy coherence
- distilled margin (top1 vs top2)
- vector margin (top1 vs top2)
- inter-signal convergence
  (lexical / distilled / vector)
- policy conflicts
- proportion of labels outside consensus
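The document does not fix a formula for combining these inputs. One possible reading, shown purely as an assumption, is a weighted sum of the four supporting signals minus penalties for the two adverse ones, clamped to [0, 1]. All weights and signal names below are hypothetical.

```python
def margin(scores):
    """Top-1 minus top-2 score; a single candidate is compared against 0."""
    top = sorted(scores, reverse=True)[:2]
    if not top:
        return 0.0
    return top[0] - (top[1] if len(top) > 1 else 0.0)


# Hypothetical weights; the supporting weights sum to 1.
WEIGHTS = {"coherence": 0.30, "distilled_margin": 0.25,
           "vector_margin": 0.15, "convergence": 0.30}
PENALTIES = {"policy_conflicts": 0.20, "outside_consensus": 0.20}


def confidence_global(signals):
    """Combine the six inputs (each expected in [0, 1]) into one clamped score."""
    score = sum(w * signals[name] for name, w in WEIGHTS.items())
    score -= sum(p * signals[name] for name, p in PENALTIES.items())
    return max(0.0, min(1.0, score))


signals = {"coherence": 1.0, "distilled_margin": margin([0.9, 0.6, 0.1]),
           "vector_margin": margin([0.8, 0.7]), "convergence": 1.0,
           "policy_conflicts": 0.0, "outside_consensus": 0.5}
conf = confidence_global(signals)
```

A linear form is easy to calibrate and to explain in diagnostics; a learned calibrator could replace it without changing the inputs.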
Flag triggers
- VECTOR_AMBIGUITY
  if Top-K spans multiple distant parent clusters
  with low margin.
- LOW_MARGIN
  if distilled or vector margins are weak.
- CONFLICT_POLICY_STRONG_SIGNAL
  if a label strongly supported by user-facing signals
  is rejected by policy.
- UNKNOWN_LABEL_CANDIDATE
  if OW-5 rejected non-existent labels.
- NEW_INTENT_CANDIDATE
  if:
  - domain vocabulary seems plausible,
  - consensus remains weak,
  - and no stable child label fits.
- OOD
  if:
  - confidence is very low,
  - taxonomy coherence fails,
  - or no plausible parent emerges.
- ABSTAIN_RECOMMENDED
  if:
  - confidence is low,
  - and disagreement cannot be stabilized
    by Validator + policy constraints.
Optional (stable if enabled):
- COLD_START_BUFFER_CANDIDATE
  if:
  - strong proximity to a known parent exists,
  - but no child label emerges cleanly,
  - and the system prefers neither hallucination nor OOD.
This requires a pre-existing system label family Pending_*
in core knowledge.
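A few of the triggers above can be written as plain threshold rules. The sketch covers only four flags and rests on assumptions: the thresholds are placeholders, "distant" clusters are approximated by "more than one distinct cluster in Top-K", and the OOD conditions are read as alternatives.

```python
def derive_flags(confidence, distilled_margin, vector_margin,
                 topk_parent_clusters, rejected_unknown,
                 coherent, plausible_parent,
                 low_margin=0.10, ood_confidence=0.20):
    """Return the subset of flags whose (simplified) trigger fires."""
    flags = set()
    if min(distilled_margin, vector_margin) < low_margin:
        flags.add("LOW_MARGIN")
    # Approximation: several distinct clusters stand in for "distant" clusters.
    if len(set(topk_parent_clusters)) > 1 and vector_margin < low_margin:
        flags.add("VECTOR_AMBIGUITY")
    if rejected_unknown:
        flags.add("UNKNOWN_LABEL_CANDIDATE")
    if confidence < ood_confidence or not coherent or not plausible_parent:
        flags.add("OOD")
    return flags


ambiguous = derive_flags(confidence=0.8, distilled_margin=0.4, vector_margin=0.05,
                         topk_parent_clusters=["finance", "access"],
                         rejected_unknown=False, coherent=True, plausible_parent=True)
# ambiguous == {"LOW_MARGIN", "VECTOR_AMBIGUITY"}
```

A production version would need a real notion of cluster distance and the remaining flags, which depend on signals not modeled here.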
Final output
- L_final = L_expanded (after final policy purge)
- confidence_global
- flags
This output is consumed by:
- landing-point computation,
- neighbor scoring,
- traversal budgets
in the ARG Core.
5. Expected online behaviors (100ms)
Standard case (~80–95%)
- FAST_TRACK or distilled-only flow
- no LLM
- labels usable in <100ms
Ambiguous case (~5–20%)
- LLM_ESCALATED
- higher latency acceptable
- strictly confined to grey zones
Cold start case (if enabled)
- COLD_START_BUFFER_CANDIDATE
- route to Pending_* (pre-existing)
- log NEW_LABEL_REQUIRED for offline taxonomy evolution
Cold-start buffering must remain Info-first and must not enable
unsafe direct Action routing under ambiguity.
See cold-start safety rules in ARG Core.
6. Offline role of the Context Weaver
The Weaver does not “fix the graph.”
It taxonomically qualifies what math and usage signals detect.
6.1 Split candidates
- compute plausible taxonomy signatures
- propose:
- existing child-label attachments, or
- a new child-label candidate
6.2 Merge candidates
- penalize taxonomically dangerous merges
6.3 New edge candidates
- propose coherent edge types
via shared labels/parents
6.4 Taxonomy evolution
Detect missing labels from:
- UNKNOWN_LABEL_CANDIDATE
- NEW_INTENT_CANDIDATE
- repeated VECTOR_AMBIGUITY
- COLD_START_BUFFER_CANDIDATE (if enabled)
These signals feed the offline loop in
ARG Core.
7. Incremental maintenance: adding labels
7.1 Principle
No routine creation of new context typologies or clusters.
Prefer adding child labels.
7.2 Update chain
A) Core Knowledge
- add the new label in the tree:
- under an existing parent
- within an existing cluster
B) Context Weaver
- update the Taxonomy Validator
- update the Taxonomy Vector Store
- update policy constraints if needed
C) ARG Core
- update M2M attachments:
  - node ↔ label
  - node ↔ cluster (if necessary)
- adjust edges if new granularity requires safer routing
Summary
- The Weaver guarantees taxonomy validity.
- ARG attaches leaves to the new branch.
7.3 Mandatory checks
- uniqueness vs siblings
- parent/child coherence
- cluster↔label compatibility
- policy non-contradiction
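The structural checks can run as a pre-merge guard when a label is proposed. The sketch below covers the first three checks only; policy non-contradiction depends on the Policy Manager and is left out. It assumes a child label must sit in the same cluster as its parent, which the document does not state explicitly, and all names are hypothetical.

```python
def check_new_label(name, parent, cluster, tax_parent, tax_cluster):
    """Return a list of violations; an empty list means the checks pass.

    tax_parent maps label -> parent label, tax_cluster maps label -> cluster.
    """
    errors = []
    if parent not in tax_parent:
        errors.append("parent/child coherence: unknown parent")
    elif tax_cluster.get(parent) != cluster:
        # Assumption: a child inherits the cluster of its parent.
        errors.append("cluster/label compatibility: cluster differs from parent")
    siblings = {label.casefold() for label, p in tax_parent.items() if p == parent}
    if name.casefold() in siblings:
        errors.append("uniqueness: name already used by a sibling")
    return errors


tax_parent = {"billing": None, "refund": "billing"}
tax_cluster = {"billing": "finance", "refund": "finance"}
ok = check_new_label("chargeback", "billing", "finance", tax_parent, tax_cluster)
# ok == []
```

Returning every violation, instead of failing on the first one, gives the reviewer of a taxonomy change the full picture in one pass.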
7.4 Progressive activation
CANDIDATE → SHADOW → ACTIVE
SHADOW labels can be suggested but with limited routing weight,
reducing blast radius during early adoption.
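The lifecycle can be modeled as a one-way state machine with a routing multiplier per state. The multipliers below are hypothetical, including the choice to give CANDIDATE labels no routing weight at all.

```python
from enum import Enum


class LabelState(Enum):
    CANDIDATE = "CANDIDATE"
    SHADOW = "SHADOW"
    ACTIVE = "ACTIVE"


_NEXT = {LabelState.CANDIDATE: LabelState.SHADOW,
         LabelState.SHADOW: LabelState.ACTIVE}
# Hypothetical multipliers: CANDIDATE is not routable, SHADOW is damped.
_ROUTING_WEIGHT = {LabelState.CANDIDATE: 0.0,
                   LabelState.SHADOW: 0.25,
                   LabelState.ACTIVE: 1.0}


def promote(state):
    """Advance one step; ACTIVE is terminal and states are never skipped."""
    if state not in _NEXT:
        raise ValueError(f"{state.name} is terminal")
    return _NEXT[state]


def routing_weight(score, state):
    """Scale a label score by the multiplier of its lifecycle state."""
    return score * _ROUTING_WEIGHT[state]
```

A SHADOW label with score 0.8 would then contribute 0.2 to routing, enough to be observed in logs while limiting its influence on traversal.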
8. Observability
Track at minimum:
- FAST_TRACK rate
- LLM_ESCALATED rate
- Weaver latency p50 / p95
- OOD rate
- VECTOR_AMBIGUITY rate
- LOW_MARGIN rate
- UNKNOWN_LABEL_CANDIDATE rate
- NEW_INTENT_CANDIDATE rate
- optional: COLD_START_BUFFER_CANDIDATE rate
These metrics are used to:
- prioritize taxonomy refinement,
- calibrate distilled routing,
- and detect drift early.
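As a rough illustration of how these metrics could be collected, the sketch below keeps per-flag counters and raw latencies in memory. The class name and its methods are hypothetical; a real deployment would more likely export counters and histograms to an existing metrics backend.

```python
import statistics
from collections import Counter


class WeaverMetrics:
    def __init__(self):
        self.requests = 0
        self.flag_counts = Counter()
        self.latencies_ms = []

    def record(self, flags, latency_ms):
        """Register one request with its flag names and Weaver latency."""
        self.requests += 1
        self.flag_counts.update(flags)
        self.latencies_ms.append(latency_ms)

    def rate(self, flag):
        return self.flag_counts[flag] / self.requests if self.requests else 0.0

    def latency_percentiles(self):
        """p50 / p95 by linear interpolation; needs at least two samples."""
        cuts = statistics.quantiles(self.latencies_ms, n=100, method="inclusive")
        return {"p50": cuts[49], "p95": cuts[94]}


metrics = WeaverMetrics()
metrics.record({"FAST_TRACK"}, 10.0)
metrics.record({"FAST_TRACK"}, 12.0)
metrics.record(set(), 15.0)
metrics.record({"LLM_ESCALATED", "LOW_MARGIN"}, 40.0)
```

With the inclusive method the percentiles stay within the observed range, which avoids reporting a p95 above the slowest recorded request on small samples.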