
Design Decisions

Status note: some entries describe legacy alternatives evaluated earlier in the project. Current implementation is Path A with unified provider + dual store.

Every architectural choice in Sonality is backed by specific research findings. This page documents what we chose, what we rejected, and the evidence behind each decision. Each decision is structured as: problem, solution, research backing, and alternative considered.

Implemented Decisions

1. Prompt-Based Personality Over Fine-Tuning

Aspect Detail
Problem How can an LLM's personality be customized without retraining? Fine-tuning requires training data that doesn't exist, risks catastrophic forgetting, and prevents runtime evolution.
Solution RAG-based personalization through system prompt injection. The sponge snapshot and retrieved episodes are injected into the system prompt each turn.
Research RAG achieves 14.92% improvement over baselines vs 1.07% for parameter-efficient fine-tuning (arXiv:2409.09510). Persona Selection Model research (2026) confirms external context-priming can meaningfully steer personality.
Alternative Fine-tuning for personality. Rejected: only 1.07% gain, requires non-existent training data, prevents evolution during deployment. Character.AI attempted fine-tuning and still experienced drift.
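A minimal sketch of the per-turn injection step, assuming a hypothetical `build_system_prompt` helper. The `CORE_IDENTITY` text, the `Episode` type, and the section labels are illustrative placeholders rather than Sonality's actual prompt layout.

```python
from dataclasses import dataclass
from typing import List

# Placeholder text; the real core identity block is not reproduced here.
CORE_IDENTITY = "You value intellectual honesty, curiosity, and independent judgment."


@dataclass(frozen=True)
class Episode:
    summary: str
    relevance: float  # retrieval similarity score, higher is better


def build_system_prompt(snapshot: str, episodes: List[Episode], max_episodes: int = 3) -> str:
    """Assemble the system prompt from core identity, snapshot, and retrieved episodes."""
    # Keep only the most relevant episodes so the prompt stays within budget.
    top = sorted(episodes, key=lambda e: e.relevance, reverse=True)[:max_episodes]
    memory_lines = "\n".join(f"- {e.summary}" for e in top) or "- (none)"
    return (
        f"## Core identity\n{CORE_IDENTITY}\n\n"
        f"## Personality snapshot\n{snapshot}\n\n"
        f"## Relevant past episodes\n{memory_lines}"
    )
```

Because the prompt is rebuilt every turn, a changed snapshot takes effect on the next interaction without any retraining.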

2. Evidence Quality Gating (ESS)

Aspect Detail
Problem Without input quality gating, the agent absorbs any user assertion as truth — including social pressure, emotional appeals, and bare assertions.
Solution A dedicated LLM call classifies argument quality (Evidence Strength Score) before personality/belief updates. Runtime gates are multi-factor (classifier reliability + downstream typed provenance decisions), not a single global threshold.
Research Maps to BASIL's (2025) distinction between "sycophantic belief shifts" and "rational belief updating." IBM ArgQ calibration validates the classifier against human-annotated argument quality rankings.
Alternative Update on every interaction. Rejected: bounded confidence models show systems that update on every input converge to consensus or oscillate chaotically (Hegselmann-Krause 2002).
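The multi-factor gate can be pictured roughly as follows. This is a sketch under stated assumptions: `EssResult`, its fields, the `NON_EVIDENTIAL` labels, and `base_rate` are all hypothetical names chosen for illustration, and the real runtime delegates part of the decision to downstream provenance checks that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EssResult:
    score: float          # argument strength in [0, 1] (assumed scale)
    reasoning_type: str   # hypothetical label emitted by the classifier
    classifier_ok: bool   # False when the classifier call failed or fell back to defaults


# Hypothetical labels for inputs that should never move beliefs.
NON_EVIDENTIAL = {"social_pressure", "emotional_appeal", "bare_assertion"}


def gated_magnitude(ess: EssResult, base_rate: float = 0.1) -> float:
    """Return the allowed update magnitude; 0.0 means the update is blocked."""
    if not ess.classifier_ok:
        return 0.0  # unreliable classification: do not update
    if ess.reasoning_type in NON_EVIDENTIAL:
        return 0.0  # pressure or assertion without evidence: do not update
    clamped = max(0.0, min(1.0, ess.score))
    return base_rate * clamped  # continuous magnitude rather than a binary gate
```

The point of the sketch is that several independent conditions must hold before any update happens, and the surviving update scales with evidence strength.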

3. Immutable Core Identity

Aspect Detail
Problem Without an anchor, persona drift occurs within 8 rounds (arXiv:2402.10962). The personality system could overwrite fundamental values.
Solution A fixed CORE_IDENTITY block (~200 tokens) injected into every system prompt. The personality system cannot modify it. Defines intellectual honesty, curiosity, independence, explicit disagreement, merit-based evaluation, and sycophancy resistance.
Research "Soul Document" concept from personality AI research and Parametric Identity Layer research (2024) converge on separating immutable core traits from learnable preferences. The core identity serves as a gravitational anchor against drift.
Alternative Fully editable personality. Rejected: leads to rapid drift; no stable reference point for "who the agent is."

4. Periodic Reflection

Aspect Detail
Problem Raw memory accumulation without consolidation produces incoherent beliefs. Per-interaction processing alone cannot form higher-order personality structure.
Solution Dual-trigger reflection: periodic (every N interactions, default 20) OR event-driven (cumulative shift magnitude > 0.1). Reflection consolidates pending insights into the snapshot, applies belief decay, and synthesizes patterns.
Research Park et al. (2023) ablation: reflection is the most critical component for believable agents. Sleep-time compute studies show gains from idle-time consolidation.
Alternative No reflection; rely on per-interaction updates only. Rejected: Park et al. showed agents accumulate raw memories but cannot form coherent beliefs without reflection.
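The dual trigger reduces to a small amount of bookkeeping. The sketch below assumes a hypothetical `ReflectionTrigger` class; only the defaults (every 20 interactions, cumulative shift above 0.1) come from the text, and resetting both counters after a reflection is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReflectionTrigger:
    every_n: int = 20             # periodic trigger
    shift_threshold: float = 0.1  # event-driven trigger
    interactions_since: int = 0
    cumulative_shift: float = 0.0

    def record(self, shift_magnitude: float) -> bool:
        """Record one interaction and report whether reflection should run now."""
        self.interactions_since += 1
        self.cumulative_shift += abs(shift_magnitude)
        due = (
            self.interactions_since >= self.every_n
            or self.cumulative_shift > self.shift_threshold
        )
        if due:
            # Assumed behavior: both counters restart after a reflection.
            self.interactions_since = 0
            self.cumulative_shift = 0.0
        return due
```

A quiet conversation still reflects on the 20th interaction, while two moderately large shifts in a row trigger reflection early.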

5. Bootstrap Dampening

Aspect Detail
Problem Early interactions anchor the agent's entire personality trajectory; without dampening, the first user's views disproportionately shape the agent.
Solution The first BOOTSTRAP_DAMPENING_UNTIL (default 10) interactions apply a 0.5× multiplier to opinion-update magnitude, which reduces first-impression dominance.
Research Deffuant bounded confidence model: initial uncertainty and convergence dynamics. Anchoring bias (arXiv:2511.05766): early probability shifts are resistant to mitigation.
Alternative Treat all interactions equally. Rejected: bounded confidence models and anchoring research show first impressions have outsized influence.
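In code, the dampening rule is a single conditional. This sketch assumes `interaction_count` is the number of interactions completed before the current one (so values 0 through 9 are dampened); `BOOTSTRAP_FACTOR` and `dampened_magnitude` are illustrative names.

```python
BOOTSTRAP_DAMPENING_UNTIL = 10  # default from the text
BOOTSTRAP_FACTOR = 0.5          # multiplier applied during the bootstrap phase


def dampened_magnitude(raw_magnitude: float, interaction_count: int) -> float:
    """Halve opinion-update magnitude during the first interactions."""
    if interaction_count < BOOTSTRAP_DAMPENING_UNTIL:
        return raw_magnitude * BOOTSTRAP_FACTOR
    return raw_magnitude
```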

6. Dual Store Over Chroma-Only Memory

Aspect Detail
Problem Need both semantic retrieval efficiency and explicit belief/segment/topic provenance.
Solution Path A dual store: Neo4j graph + PostgreSQL/pgvector derivatives and semantic features. Writes are coordinated via DualEpisodeStore; retrieval composes graph traversal and vector search.
Research Prior graph-vs-vector tradeoff findings still apply, but production needs provenance edges and segment lifecycle explicitly represented.
Alternative Chroma-only runtime. Rejected for current architecture because it cannot represent first-class belief provenance and segment graph semantics.

7. Insight Accumulation Over Lossy Rewrites

Aspect Detail
Problem Per-interaction full snapshot rewrites cause the "Broken Telephone" effect. If each rewrite preserves a given trait with probability p=0.95, then after 40 rewrites over 100 interactions only 0.95^40 ≈ 12.9% of initial traits survive.
Solution One-sentence insights accumulated per-interaction; consolidated only during periodic reflection. Snapshot changes only at reflection, not per-interaction.
Research ABBEL (2025): belief bottleneck — forcing information through compressed states outperforms full conversation history. ACL 2025: iterative rewrites cause exponential trait decay.
Alternative Per-interaction full snapshot rewrites. Rejected: belief bottleneck error propagation; exponential trait decay with each rewrite.
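The Broken Telephone figure follows from treating each rewrite as an independent survival trial. The sketch below reproduces the arithmetic; the comparison value of 5 rewrites assumes one rewrite per reflection at the default interval of 20 interactions.

```python
def trait_survival(p_per_rewrite: float, rewrites: int) -> float:
    """Probability that a trait survives a given number of independent rewrites."""
    return p_per_rewrite ** rewrites


print(f"{trait_survival(0.95, 40):.1%}")  # 12.9%
print(f"{trait_survival(0.95, 5):.1%}")   # 77.4%
```

Cutting the number of rewrites matters far more than improving per-rewrite fidelity, because the loss compounds exponentially.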

8. Bayesian Belief Resistance

Aspect Detail
Problem Without resistance, a single high-ESS interaction could flip a well-established opinion. Opinions should become harder to change as evidence accumulates.
Solution Belief revision uses LLM provenance assessment (update_magnitude, contraction_action, uncertainty) with typed decision contracts, instead of fixed confidence formulas.
Research Sequential Bayesian updating (Oravecz et al., 2016). Bounded confidence models (Hegselmann-Krause, 2002): only sufficiently strong evidence shifts opinions.
Alternative Linear updates regardless of evidence count. Rejected: allows single interactions to overwrite well-established beliefs.

9. Power-Law Belief Decay

Aspect Detail
Problem Unreinforced opinions persist forever at full strength ("zombie opinions"). Human memory and neural networks exhibit forgetting, not permanent retention.
Solution During reflection, staleness handling is LLM-decided (RETAIN / DECAY / FORGET) with typed responses and guarded execution.
Research FadeMem (2026): biologically-inspired power-law forgetting. Ebbinghaus curve: power-law (not exponential) matches human memory. "Ebbinghaus in LLMs" (2025): neural networks exhibit human-like forgetting curves.
Alternative No decay; opinions persist indefinitely. Rejected: produces zombie opinions; contradicts human memory research.
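A typed response with guarded execution might look like the sketch below. The three action names come from the text; everything else is an assumption, including the fallback to RETAIN on unparseable output and the `decay_factor` of 0.8.

```python
from enum import Enum
from typing import Optional


class StalenessAction(Enum):
    RETAIN = "RETAIN"
    DECAY = "DECAY"
    FORGET = "FORGET"


def parse_action(raw: str) -> StalenessAction:
    """Map raw model output to a typed action, defaulting to the safest choice."""
    try:
        return StalenessAction(raw.strip().upper())
    except ValueError:
        # Assumed guard: malformed output must never delete or weaken a belief.
        return StalenessAction.RETAIN


def apply_action(
    confidence: float, action: StalenessAction, decay_factor: float = 0.8
) -> Optional[float]:
    """Return the new confidence, or None when the belief is dropped."""
    if action is StalenessAction.FORGET:
        return None
    if action is StalenessAction.DECAY:
        return confidence * decay_factor
    return confidence
```

Parsing into an enum before acting keeps free-form model text from reaching the code path that mutates beliefs.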

10. Self-Judge Bias Removal

Aspect Detail
Problem Including the agent's response in ESS evaluation creates a feedback loop: agreement inflates quality scores. Self-evaluation bias has been documented at shifts of up to 50 percentage points.
Solution ESS evaluates only the user message; the agent's response is excluded. Third-person framing: "A user sent a message to an AI agent. Rate the strength of arguments in the USER'S message ONLY."
Research SYConBench (EMNLP 2025): third-person perspective prompting reduces sycophancy by up to 63.8%. Self-judgment produces systematic bias toward agreement.
Alternative Include agent response in ESS. Rejected: creates sycophancy feedback loop; agreement would inflate scores.
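The exclusion can be enforced structurally by building the classifier prompt from the user message alone. In this sketch the framing string is quoted from the text, while the `Turn` type, the `build_ess_prompt` helper, and the "USER MESSAGE:" label are hypothetical.

```python
from dataclasses import dataclass

ESS_FRAMING = (
    "A user sent a message to an AI agent. "
    "Rate the strength of arguments in the USER'S message ONLY."
)


@dataclass(frozen=True)
class Turn:
    user_message: str
    agent_response: str


def build_ess_prompt(turn: Turn) -> str:
    """Build the ESS classifier prompt in third person, without the agent's reply."""
    # Only the user's text is forwarded; turn.agent_response is never read.
    return f"{ESS_FRAMING}\n\nUSER MESSAGE:\n{turn.user_message}"
```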

11. OCEAN Signal Simplification

Aspect Detail
Problem Using dynamic OCEAN (Big Five) updates as a personality driver fails because measurement noise makes the signal unreliable. Self-reported traits don't predict behavior.
Solution Removed dynamic OCEAN updating; retained as static baseline only. Personality tracked via behavioral metrics (disagreement rate, topic engagement, opinion vectors) rather than self-reported traits.
Research PERSIST (2025): even 400B+ models show σ>0.3 noise on personality measurements. Question reordering alone causes large shifts. Personality Illusion (NeurIPS 2025): self-reported traits don't reliably predict behavior; max test-retest r=0.27.
Alternative OCEAN as primary personality driver. Rejected: signal-to-noise ratio makes dynamic updates meaningless; unreliable measurement.

12. JSONL Audit Trail

Aspect Detail
Problem Need provenance tracking for debugging, rollback, and understanding personality evolution. Without logs, failures are opaque.
Solution Every ESS event and reflection event is appended to data/ess_log.jsonl. Each record includes interaction count, score, topics, beliefs, magnitude, dropped beliefs, and snapshot size.
Research Standard practice for observability. Enables rollback to sponge_history/sponge_vN.json; debugging of sycophancy or drift; reproducibility.
Alternative No structured audit trail. Rejected: debugging personality failures requires provenance; rollback impossible without version history.
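Appending to a JSONL file needs only the standard library. The helpers `append_event` and `read_events` below are illustrative, and the record keys used in the usage note are assumed field names, not the project's actual schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def append_event(log_path: Path, event: Dict[str, Any]) -> None:
    """Append one event as a single JSON line, creating the directory if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")


def read_events(log_path: Path) -> List[Dict[str, Any]]:
    """Load every event back in the order it was written."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `append_event(Path("data/ess_log.jsonl"), {"kind": "ess", "interaction": 42, "score": 0.7})` adds one line. Because the file is append-only, earlier records are never rewritten, which is what makes it usable for after-the-fact debugging.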

Rejected Approaches

Knowledge Graphs for Beliefs

Why rejected: Graphiti's temporal knowledge graph generated 1.17M tokens per test case and had cost $152 by the time the run was aborted (arXiv:2601.07978). It showed no statistically significant accuracy gain over vector-only retrieval at this scale, so the complexity was not justified for fewer than 1000 interactions. An upgrade path is documented in case temporal coherence becomes the bottleneck.

Fine-Tuning for Personality

Why rejected: Only 1.07% improvement over baselines (arXiv:2409.09510). Requires training data that doesn't exist. Risks catastrophic forgetting. RAG outperforms by ~14×. Fine-tuning changes capabilities, not personality stability.

OCEAN as Personality Driver

Why rejected: PERSIST: σ>0.3 measurement noise even in 400B+ models. Personality Illusion: social desirability bias shifts Big Five by about 1.20 SD in frontier chat models. Self-reported traits don't predict behavior. Measurement is unreliable, and even reliable measurement wouldn't translate to behavioral change.

Real-Time Entity/Fact Extraction

Why rejected: Mem0 achieves 49.3% precision vs 84.6% for long-context baseline. Real-time extraction is noisy, expensive, hallucination-prone ("I was ill last year" → current_status: ill). Batch processing during reflection is cheaper and more accurate.

Per-Interaction Full Snapshot Rewrites

Why rejected: ABBEL belief bottleneck: error propagation through compressed states. Broken Telephone math: exponential trait decay with iterative rewrites. Replaced with insight accumulation + reflection consolidation.

Self-Editing Memory Without Guardrails

Why rejected: MemoryGraft (2025): 47.9% retrieval poisoning from small poisoned record sets. Self-modifying memory is an attack surface. Validation layers (ESS gating, snapshot validation, belief confidence) are mandatory.

Equal Treatment of All Interactions

Why rejected: Bounded confidence models (Hegselmann-Krause 2002, Deffuant): systems that update on every input converge to consensus or oscillate chaotically. Sonality uses quality-gated updates, not unconditional per-turn belief commits.

LoRA Adapters for Personality

Why rejected: LoRA adapters are static once trained — cannot evolve during deployment. Training a LoRA for every opinion change is prohibitively expensive. LoRA personality control degrades general task performance (NeurIPS 2025).

Activation Steering

Why rejected: Controls broad traits (openness, agreeableness) but not specific opinions. Requires access to model internals (hidden states) that API-based models don't expose. No memory, no provenance, no opinion-level granularity.

Pure Long-Context (No External Memory)

Why rejected: Cost scales linearly with history. Attention degrades over long contexts. There is no structured belief revision: old and new opinions coexist without a mechanism to mark supersession. Viable as an MVP but inadequate for real personality evolution.


Key Tradeoffs

| Decision | Option A | Option B | Chosen | Why |
| --- | --- | --- | --- | --- |
| Memory update frequency | Every message | Only strong evidence | ESS-gated | Gates opinion updates; tracking happens always |
| Snapshot format | Structured JSON only | Natural language | Both | Narrative for personality, structured for math |
| Update size | Small deltas | Wholesale rewrite | Small deltas (except reflection) | Broken Telephone: wholesale rewrite loses info fastest |
| Memory scope | Single-session | Cross-session persistent | Cross-session | The entire point; Zep shows 18.5% improvement with temporal persistence |
| Gating mechanism | Binary (update/don't) | Continuous (magnitude) | Continuous | MACI's information dial: continuous outperforms binary |
| Reflection trigger | Periodic only | Event-driven only | Dual | Fixed interval misses important moments; event-only wastes compute during quiet periods |

| System | Memory Type | Update Mechanism | Decay | Validation | Sonality Relationship |
| --- | --- | --- | --- | --- | --- |
| Sophia (arXiv:2512.18202) | Narrative + KG | System 3 meta-layer | No | Hybrid reward | Closest ancestor. Sponge is a simplified System 3 without process-supervised thought search. |
| Hindsight (arXiv:2512.12818) | 4-network graph | Retain/Recall/Reflect | No | World model check | Sonality uses two tiers instead of four networks. Simpler but captures core mechanism. |
| Zep/Graphiti (arXiv:2501.13956) | Temporal KG | Incremental graph update | Yes | Temporal consistency | Historical benchmark context; current Sonality runtime already uses graph + vector dual-store. |
| FadeMem (arXiv:2601.18642) | Dual-layer SML/LML | Adaptive exponential decay | Yes | Importance scoring | Directly inspired Sonality's power-law belief decay. FadeMem achieves 45% storage reduction. |
| ABBEL (2025) | Belief bottleneck | RL-trained belief update | No | Bayesian posterior | Conceptually similar to ESS gating; uses RL training (infeasible for API-only). |
| MACI (arXiv:2510.04488) | Dual-dial | Information quality + behavior | N/A | Provable termination | ESS maps to MACI's "information dial" (same concept, different framing). |
| DAM-LLM (arXiv:2510.27418) | Bayesian affective memory | Bayesian emotional update | Implicit | Consistency check | More theoretically principled. Sonality trades elegance for implementation simplicity. |
| Memoria (arXiv:2512.12686) | Session summaries + KG | Weighted knowledge graph | N/A | KG grounding | Validates that compact personality representation (87.1% accuracy with 2k tokens) is sufficient. |
| Behavioral Resonance (GitHub) | Stateless | Heartbeat anchors | N/A | Deep anchors | Demonstrates persona continuity without external memory. Sonality's full architecture is still justified for opinion tracking and evolution. |
| VIGIL (arXiv:2512.07094) | EmoBank + core blocks | Self-healing runtime | N/A | Guarded immutability | Similar immutable core identity concept; VIGIL adds emotional valence tracking. |

Known Weak Spots

Prioritized by severity. Each is a genuine architectural limitation, not a future feature; this is an honest assessment drawn from adversarial test design.

Critical (System-Breaking if Unaddressed)

| # | Weak Spot | Evidence | Sonality's Mitigation | Residual Risk |
| --- | --- | --- | --- | --- |
| W1 | Bland Convergence | ACL 2025: LLMs distort own output toward "attractor states." P(survive, 40 rewrites) = 12.9% at p=0.95 per rewrite. | Insight accumulation reduces rewrites from ~40 to ~5 per 100 interactions. Snapshot validation catches catastrophic loss. | Subtle blandification still accumulates across reflections. |
| W2 | RLHF-Amplified Sycophancy | RLHF reward-model analysis (arXiv:2602.01002): RLHF explicitly creates "agreement is good" heuristic. PersistBench: 97% sycophancy with memory in system prompt. | Seven anti-sycophancy layers. ESS decoupling breaks the self-judge feedback loop. | Residual sycophancy under first-person framing (78.5%, SycEval). |
| W3 | Belief Entrenchment | Martingale Score (NeurIPS 2025): ALL models exhibit entrenchment violating Bayesian rationality. Future updates predictable from current beliefs. | Belief decay weakens unreinforced opinions. Novelty scoring reduces magnitude for repeated arguments. | Early opinions calcify. No Martingale Score check implemented. |
| W4 | ESS Calibration Brittleness | ConfTuner (arXiv:2508.18847): verbalized confidence unreliable. PERSIST: question reordering shifts scores by >0.3 on 5-point scales. | Structured tool_use schema constrains output. Calibration examples anchor scoring. Retry logic with safe defaults. | ESS is the single gatekeeper. Miscalibration cascades to all downstream updates. |

High (Significant Quality Degradation)

| # | Weak Spot | Evidence | Sonality's Mitigation | Residual Risk |
| --- | --- | --- | --- | --- |
| W5 | Personality Illusion | NeurIPS 2025: self-reported traits don't predict behavior (max r=0.27). Persona injection steers self-reports but not behavior. | Behavioral metrics (disagreement rate, opinion vectors) track actual behavior, not self-reports. OCEAN removed as personality driver. | Snapshot may say "I'm skeptical" while agent behavior is agreeable. |
| W6 | Proactive Interference | ICLR 2025: retrieval accuracy decays log-linearly as related information accumulates. Old episodes retrieved instead of current. | ESS-weighted reranking prioritizes higher-quality memories. min_relevance=0.3 filters weak matches. | At 200+ episodes on a popular topic, contradictory episodes pollute context. |
| W7 | Cosine Similarity Blindness | SparseCL (ICML 2025): "I believe X" and "I no longer believe X" both retrieve as similar. 30%+ accuracy improvement with sparse embeddings. | Summaries include ESS metadata (score, direction) which disambiguate at the content level. | Embedding model cannot distinguish affirmation from negation structurally. |
| W8 | Neural Howlround | arXiv:2504.07992: same model at every pipeline stage creates self-reinforcing bias in 67% of conversations. | ESS decoupling and third-person framing break the loop at the classification stage. | Response generation, insight extraction, and reflection all use the same model. |

Medium (Measurable But Bounded)

| # | Weak Spot | Evidence | Sonality's Mitigation | Residual Risk |
| --- | --- | --- | --- | --- |
| W9 | Ternary Opinion Direction | Argument mining research: ternary classification loses critical nuance. "Partially agrees with caveats" → supports or neutral? | Magnitude formula includes novelty and ESS score for granularity. | All agreement is treated equally; all opposition is treated equally. |
| W10 | Short-Context Embedding Truncation | Compact embedding backbones can degrade on longer text spans, reducing semantic fidelity. | ESS summaries are constrained to single sentences. | No explicit validation that summaries stay under the configured embedding budget. |
| W11 | No Fact-Checking | ESS evaluates argument structure, not truth. Well-structured misinformation will score high. | By design: fact-checking is a separate problem. ESS gates on reasoning quality. | Agent can form confident opinions based on well-argued falsehoods. |

Future Opportunities

These are potential improvements identified through research but not yet implemented:

Sigmoid Persuasion Dynamics

LLMs show non-linear sigmoid persuasion curves with threshold effects. The current linear magnitude scaling could be replaced with a sigmoid where weak evidence has near-zero effect and strong evidence has near-full effect.

Effort: Low. Impact: Medium. When: If opinion oscillation is observed.
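One possible shape for such a curve is a logistic function of the ESS score. This is only a sketch: `midpoint`, `steepness`, and `max_magnitude` are hypothetical tuning parameters, and the assumption that ESS scores lie in [0, 1] is not stated in this document.

```python
import math


def sigmoid_magnitude(
    ess_score: float,
    midpoint: float = 0.5,
    steepness: float = 10.0,
    max_magnitude: float = 1.0,
) -> float:
    """Map an ESS score to an update magnitude with a threshold-like response."""
    return max_magnitude / (1.0 + math.exp(-steepness * (ess_score - midpoint)))
```

With these illustrative defaults, a score of 0.1 yields roughly 2% of the maximum magnitude and a score of 0.9 roughly 98%, whereas a linear mapping would give 10% and 90%.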

Contradiction Detection During Reflection

AGM framework (Alchourrón-Gärdenfors-Makinson): new beliefs should be checked against existing beliefs for consistency. A same-topic opposite-sign scan during reflection could resolve contradictions.

Effort: Low. Impact: Low (rare with belief resistance). When: If contradictory beliefs are observed.
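A same-topic opposite-sign scan could be as simple as the sketch below. The `Belief` type and its signed `stance` field are assumptions made for illustration; the scan only flags candidate pairs and leaves resolution to the reflection step.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Belief:
    topic: str
    statement: str
    stance: float  # signed: positive supports, negative opposes (assumed encoding)


def find_contradictions(beliefs: List[Belief]) -> List[Tuple[Belief, Belief]]:
    """Return (supporting, opposing) belief pairs that share a topic."""
    by_topic: Dict[str, List[Belief]] = defaultdict(list)
    for belief in beliefs:
        by_topic[belief.topic].append(belief)

    pairs: List[Tuple[Belief, Belief]] = []
    for group in by_topic.values():
        supporting = [b for b in group if b.stance > 0]
        opposing = [b for b in group if b.stance < 0]
        pairs.extend((s, o) for s in supporting for o in opposing)
    return pairs
```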

Importance-Weighted Episode Retrieval

Park et al. (2023): score = α × recency + β × relevance + γ × importance. Current retrieval is cosine similarity with ESS reranking. Using interaction count as a recency signal could improve retrieval quality.

Effort: Medium. Impact: Medium. When: If retrieved episodes are frequently irrelevant.
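The weighted sum is straightforward once recency is defined. This sketch assumes recency decays exponentially with the number of interactions since the episode, with a hypothetical `half_life` of 50 interactions, and treats importance as a precomputed value in [0, 1]; the weights default to 1.0 purely for illustration.

```python
def retrieval_score(
    relevance: float,
    importance: float,
    interactions_ago: int,
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
    half_life: float = 50.0,
) -> float:
    """Combine recency, relevance, and importance into one ranking score."""
    # Recency is 1.0 for the current interaction and halves every half_life interactions.
    recency = 0.5 ** (interactions_ago / half_life)
    return alpha * recency + beta * relevance + gamma * importance
```

Two episodes with identical relevance and importance then rank by age, which is the behavior pure cosine similarity cannot provide.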

Embedding Backend Upgrade

The current compact embedding backend favors short inputs. A long-context embedding backend can improve retrieval quality for longer summaries. Keep concrete model choices in Model Considerations and keep this core document provider-neutral.

Effort: Medium (migration needed). Impact: Medium. When: If retrieval quality is the bottleneck.

Early Stop Reflection Mitigation

IROTE (2025): experience-based reflection can amplify errors. If a reflection pass produces worse output than the snapshot it replaces, early-stop or rollback logic could limit the damage. Not yet implemented.

Effort: Medium. Impact: Low–Medium. When: If reflection occasionally degrades personality quality.

Martingale Entrenchment Detection

NeurIPS 2025 (arXiv:2512.02914): all LLMs tested exhibit belief entrenchment — future updates become predictable from current beliefs, violating Bayesian rationality. A Martingale Score check during reflection could detect when opinion entrenchment occurs and inject corrective diversity.

Effort: Medium. Impact: Medium. When: If the agent becomes rigid on topics despite evidence.
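One rough proxy for this check, not the paper's exact metric, is to regress each belief update on the prior belief: if updates were unpredictable from priors the slope would be near zero, while a positive slope suggests beliefs keep moving further in the direction they already lean. The function name and the signed belief scale are assumptions.

```python
from typing import Sequence


def update_predictability(priors: Sequence[float], posteriors: Sequence[float]) -> float:
    """Least-squares slope of (posterior - prior) regressed on prior."""
    if len(priors) != len(posteriors) or not priors:
        raise ValueError("priors and posteriors must be non-empty and equal length")
    updates = [post - prior for prior, post in zip(priors, posteriors)]
    n = len(priors)
    mean_prior = sum(priors) / n
    mean_update = sum(updates) / n
    var_prior = sum((p - mean_prior) ** 2 for p in priors)
    if var_prior == 0.0:
        return 0.0  # no spread in priors, so the slope is undefined; report zero
    cov = sum((p - mean_prior) * (u - mean_update) for p, u in zip(priors, updates))
    return cov / var_prior
```

A reflection pass could compute this over recent belief revisions and flag topics where the slope stays well above zero.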

Graph-Based Episode Storage

Zep/Graphiti (arXiv:2501.13956) and AriGraph (IJCAI 2025) show the value of explicit relational memory for temporal reasoning. Sonality adopts this direction via Path A dual-store rather than Chroma-only runtime.

Effort: High (architecture change). Impact: High at scale. When: If retrieval quality degrades with episode count.

Dual-Window Preference Tracking

PAMU (arXiv:2510.09720) fuses sliding-window averages (captures recent shifts) with long-term EMA (captures stable traits). Maintaining both ema_long (alpha=0.001) and ema_short (sliding window of last 10 interactions) and using 0.7×ema_long + 0.3×ema_short would capture short-term personality dynamics that the current architecture misses.

Effort: Medium. Impact: Medium. When: If the agent fails to reflect recent behavioral changes in responses.
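The fusion described above can be sketched with a running EMA and a bounded deque. The constants (alpha=0.001, window of 10, weights 0.7 and 0.3) come from the text; the `DualWindowTracker` class and seeding the long-term EMA with the first observation are assumptions.

```python
from collections import deque
from typing import Deque, Optional


class DualWindowTracker:
    """Fuse a slow long-term EMA with a short sliding-window average."""

    def __init__(
        self,
        alpha_long: float = 0.001,
        window: int = 10,
        w_long: float = 0.7,
        w_short: float = 0.3,
    ) -> None:
        self.alpha_long = alpha_long
        self.w_long = w_long
        self.w_short = w_short
        self.recent: Deque[float] = deque(maxlen=window)  # drops oldest automatically
        self.ema_long: Optional[float] = None

    def update(self, value: float) -> float:
        """Record one observation and return the fused preference estimate."""
        if self.ema_long is None:
            self.ema_long = value  # assumed: seed the EMA with the first observation
        else:
            self.ema_long += self.alpha_long * (value - self.ema_long)
        self.recent.append(value)
        ema_short = sum(self.recent) / len(self.recent)
        return self.w_long * self.ema_long + self.w_short * ema_short
```

After a sudden shift the short window moves the fused value within a few interactions, while the long-term EMA barely changes and keeps the stable trait in place.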


Fundamental Constraints

The Cost-Accuracy-Latency Trilemma. Improving any one dimension degrades the others. More LLM calls improve accuracy (more gating, more validation) but increase cost and latency. Cheaper models reduce cost but decrease ESS calibration quality. Sonality optimizes for accuracy (evidence-gated updates, multi-step pipeline) at the cost of 2–3 LLM calls per interaction (~$0.005–0.015). See Architecture Overview — Cost Analysis for per-call breakdowns.


Related: Architecture Overview — system design and context window budget. Research Background — the 200+ papers behind these decisions. Testing & Evaluation — how each decision is validated.