Design Decisions¶
Status note: some entries describe legacy alternatives evaluated earlier in the project. Current implementation is Path A with unified provider + dual store.
Every architectural choice in Sonality is backed by specific research findings. This page documents what we chose, what we rejected, and the evidence behind each decision. Each decision is structured as: problem, solution, research backing, and alternative considered.
Implemented Decisions¶
1. Prompt-Based Personality Over Fine-Tuning¶
| Aspect | Detail |
|---|---|
| Problem | How to personalize an LLM's personality without retraining? Fine-tuning requires training data that doesn't exist, risks catastrophic forgetting, and prevents runtime evolution. |
| Solution | RAG-based personalization through system prompt injection. The sponge snapshot and retrieved episodes are injected into the system prompt each turn. |
| Research | RAG achieves 14.92% improvement over baselines vs 1.07% for parameter-efficient fine-tuning (arXiv:2409.09510). Persona Selection Model research (2026) confirms external context-priming can meaningfully steer personality. |
| Alternative | Fine-tuning for personality. Rejected: only 1.07% gain, requires non-existent training data, prevents evolution during deployment. Character.AI attempted fine-tuning and still experienced drift. |
2. Evidence Quality Gating (ESS)¶
| Aspect | Detail |
|---|---|
| Problem | Without input quality gating, the agent absorbs any user assertion as truth — including social pressure, emotional appeals, and bare assertions. |
| Solution | A dedicated LLM call classifies argument quality (Evidence Strength Score) before personality/belief updates. Runtime gates are multi-factor (classifier reliability + downstream typed provenance decisions), not a single global threshold. |
| Research | Maps to BASIL's (2025) distinction between "sycophantic belief shifts" and "rational belief updating." IBM ArgQ calibration validates the classifier against human-annotated argument quality rankings. |
| Alternative | Update on every interaction. Rejected: bounded confidence models show systems that update on every input converge to consensus or oscillate chaotically (Hegselmann-Krause 2002). |
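The multi-factor gate described above can be sketched as a small predicate. This is an illustrative sketch, not the actual implementation: the names `EssResult`, `should_update`, and the threshold values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EssResult:
    score: float        # argument quality in [0, 1]
    reliability: float  # classifier self-consistency in [0, 1]

def should_update(ess: EssResult, provenance_approves: bool,
                  min_score: float = 0.6, min_reliability: float = 0.5) -> bool:
    """Multi-factor gate: argument quality, classifier reliability, and the
    downstream typed provenance decision must all pass -- no single
    global threshold decides alone."""
    return (ess.score >= min_score
            and ess.reliability >= min_reliability
            and provenance_approves)

# Weak or unreliable evidence never reaches the belief store.
assert should_update(EssResult(0.8, 0.9), provenance_approves=True) is True
assert should_update(EssResult(0.8, 0.9), provenance_approves=False) is False
assert should_update(EssResult(0.3, 0.9), provenance_approves=True) is False
```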
3. Immutable Core Identity¶
| Aspect | Detail |
|---|---|
| Problem | Without an anchor, persona drift occurs within 8 rounds (arXiv:2402.10962). The personality system could overwrite fundamental values. |
| Solution | A fixed CORE_IDENTITY block (~200 tokens) injected into every system prompt. The personality system cannot modify it. Defines intellectual honesty, curiosity, independence, explicit disagreement, merit-based evaluation, and sycophancy resistance. |
| Research | "Soul Document" concept from personality AI research and Parametric Identity Layer research (2024) converge on separating immutable core traits from learnable preferences. The core identity serves as a gravitational anchor against drift. |
| Alternative | Fully editable personality. Rejected: leads to rapid drift; no stable reference point for "who the agent is." |
4. Periodic Reflection¶
| Aspect | Detail |
|---|---|
| Problem | Raw memory accumulation without consolidation produces incoherent beliefs. Per-interaction processing alone cannot form higher-order personality structure. |
| Solution | Dual-trigger reflection: periodic (every N interactions, default 20) OR event-driven (cumulative shift magnitude > 0.1). Reflection consolidates pending insights into the snapshot, applies belief decay, and synthesizes patterns. |
| Research | Park et al. (2023) ablation: reflection is the most critical component for believable agents. Sleep-time compute studies show gains from idle-time consolidation. |
| Alternative | No reflection; rely on per-interaction updates only. Rejected: Park et al. showed agents accumulate raw memories but cannot form coherent beliefs without reflection. |
5. Bootstrap Dampening¶
| Aspect | Detail |
|---|---|
| Problem | Early interactions anchor the entire personality trajectory; without dampening, the first user's views disproportionately shape the agent. |
| Solution | First BOOTSTRAP_DAMPENING_UNTIL (default 10) interactions apply 0.5× magnitude to opinion updates. Reduces first-impression dominance. |
| Research | Deffuant bounded confidence model: initial uncertainty and convergence dynamics. Anchoring bias (arXiv:2511.05766): early probability shifts are resistant to mitigation. |
| Alternative | Treat all interactions equally. Rejected: bounded confidence models and anchoring research show first impressions have outsized influence. |
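The dampening rule is a single multiplier on update magnitude during the bootstrap window. A sketch with the stated defaults (whether the cutoff is inclusive is an implementation detail assumed here):

```python
BOOTSTRAP_DAMPENING_UNTIL = 10   # interactions in the bootstrap window
BOOTSTRAP_FACTOR = 0.5           # magnitude multiplier while bootstrapping

def dampened_magnitude(raw_magnitude: float, interaction_count: int) -> float:
    """Halve opinion-update magnitude during the first interactions so
    no single early user anchors the trajectory."""
    if interaction_count < BOOTSTRAP_DAMPENING_UNTIL:
        return raw_magnitude * BOOTSTRAP_FACTOR
    return raw_magnitude

assert dampened_magnitude(0.4, interaction_count=3) == 0.2   # dampened
assert dampened_magnitude(0.4, interaction_count=15) == 0.4  # full strength
```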
6. Dual Store Over Chroma-Only Memory¶
| Aspect | Detail |
|---|---|
| Problem | Need both semantic retrieval efficiency and explicit belief/segment/topic provenance. |
| Solution | Path A dual store: Neo4j graph + PostgreSQL/pgvector derivatives and semantic features. Writes are coordinated via DualEpisodeStore; retrieval composes graph traversal and vector search. |
| Research | Prior graph-vs-vector tradeoff findings still apply, but production needs provenance edges and segment lifecycle explicitly represented. |
| Alternative | Chroma-only runtime. Rejected for current architecture because it cannot represent first-class belief provenance and segment graph semantics. |
7. Insight Accumulation Over Lossy Rewrites¶
| Aspect | Detail |
|---|---|
| Problem | Per-interaction full snapshot rewrites cause the "Broken Telephone" effect. At p=0.95 per rewrite and 40 rewrites over 100 interactions, only 12.9% of initial traits survive. |
| Solution | One-sentence insights accumulated per-interaction; consolidated only during periodic reflection. Snapshot changes only at reflection, not per-interaction. |
| Research | ABBEL (2025): belief bottleneck — forcing information through compressed states outperforms full conversation history. ACL 2025: iterative rewrites cause exponential trait decay. |
| Alternative | Per-interaction full snapshot rewrites. Rejected: belief bottleneck error propagation; exponential trait decay with each rewrite. |
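The Broken Telephone figure above follows from independent retention per rewrite, so the survival probability is simply p^n. A worked check of the numbers quoted in the Problem row:

```python
def survival_probability(p_retain: float, n_rewrites: int) -> float:
    """Probability a trait survives n independent rewrites, each of
    which retains it with probability p_retain."""
    return p_retain ** n_rewrites

# Per-interaction rewrites: ~40 rewrites over 100 interactions
assert round(survival_probability(0.95, 40), 3) == 0.129   # 12.9%
# Insight accumulation: ~5 reflection-time rewrites over the same span
assert round(survival_probability(0.95, 5), 3) == 0.774    # 77.4%
```

Cutting rewrite count from ~40 to ~5 is what moves trait survival from 12.9% to roughly 77%, which is the core quantitative argument for accumulating insights instead of rewriting the snapshot each turn.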
8. Bayesian Belief Resistance¶
| Aspect | Detail |
|---|---|
| Problem | Without resistance, a single high-ESS interaction could flip a well-established opinion. Opinions should become harder to change as evidence accumulates. |
| Solution | Belief revision uses LLM provenance assessment (update_magnitude, contraction_action, uncertainty) with typed decision contracts, instead of fixed confidence formulas. |
| Research | Sequential Bayesian updating (Oravecz et al., 2016). Bounded confidence models (Hegselmann-Krause, 2002): only sufficiently strong evidence shifts opinions. |
| Alternative | Linear updates regardless of evidence count. Rejected: allows single interactions to overwrite well-established beliefs. |
9. Power-Law Belief Decay¶
| Aspect | Detail |
|---|---|
| Problem | Unreinforced opinions persist forever at full strength ("zombie opinions"). Human memory and neural networks exhibit forgetting, not permanent retention. |
| Solution | During reflection, staleness handling is LLM-decided (RETAIN / DECAY / FORGET) with typed responses and guarded execution. |
| Research | FadeMem (2026): biologically-inspired power-law forgetting. Ebbinghaus curve: power-law (not exponential) matches human memory. "Ebbinghaus in LLMs" (2025): neural networks exhibit human-like forgetting curves. |
| Alternative | No decay; opinions persist indefinitely. Rejected: produces zombie opinions; contradicts human memory research. |
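The shape the DECAY action would follow can be illustrated with a power-law retention curve. This is a hedged sketch of the research-backed curve, not the runtime's actual formula (the decision itself is LLM-made; the function name and exponent are assumptions):

```python
def power_law_retention(initial_confidence: float, staleness: int,
                        decay_rate: float = 0.5) -> float:
    """Illustrative power-law forgetting: confidence falls as a power of
    staleness (reflections since last reinforcement). Unlike exponential
    decay, the curve keeps a long tail, matching the Ebbinghaus findings."""
    return initial_confidence * (1 + staleness) ** (-decay_rate)

assert power_law_retention(0.8, staleness=0) == 0.8   # fresh belief untouched
assert power_law_retention(0.8, staleness=3) == 0.4   # halved after 3 stale cycles
```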
10. Self-Judge Bias Removal¶
| Aspect | Detail |
|---|---|
| Problem | Including the agent's response in ESS evaluation creates a feedback loop: agreement inflates quality scores. Self-evaluation bias documented at up to 50 percentage point shifts. |
| Solution | ESS evaluates only the user message; the agent's response is excluded. Third-person framing: "A user sent a message to an AI agent. Rate the strength of arguments in the USER'S message ONLY." |
| Research | SYConBench (EMNLP 2025): third-person perspective prompting reduces sycophancy by up to 63.8%. Self-judgment produces systematic bias toward agreement. |
| Alternative | Include agent response in ESS. Rejected: creates sycophancy feedback loop; agreement would inflate scores. |
11. OCEAN Signal Simplification¶
| Aspect | Detail |
|---|---|
| Problem | Using dynamic OCEAN (Big Five) updates as a personality driver is unreliable: measurement noise swamps the signal, and self-reported traits don't predict behavior. |
| Solution | Removed dynamic OCEAN updating; retained as static baseline only. Personality tracked via behavioral metrics (disagreement rate, topic engagement, opinion vectors) rather than self-reported traits. |
| Research | PERSIST (2025): even 400B+ models show σ>0.3 noise on personality measurements. Question reordering alone causes large shifts. Personality Illusion (NeurIPS 2025): self-reported traits don't reliably predict behavior; max test-retest r=0.27. |
| Alternative | OCEAN as primary personality driver. Rejected: signal-to-noise ratio makes dynamic updates meaningless; unreliable measurement. |
12. JSONL Audit Trail¶
| Aspect | Detail |
|---|---|
| Problem | Need provenance tracking for debugging, rollback, and understanding personality evolution. Without logs, failures are opaque. |
| Solution | Every ESS event and reflection event appended to data/ess_log.jsonl. Includes interaction count, score, topics, beliefs, magnitude, dropped beliefs, snapshot size. |
| Research | Standard practice for observability. Enables rollback to sponge_history/sponge_vN.json; debugging of sycophancy or drift; reproducibility. |
| Alternative | No structured audit trail. Rejected: debugging personality failures requires provenance; rollback impossible without version history. |
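Appending to the JSONL trail is one JSON object per line. A minimal sketch (the helper name and the exact field set are assumptions; the path matches the one named above):

```python
import json
import os
import time

def log_ess_event(path: str, event: dict) -> None:
    """Append one event as a single JSON line -- the append-only
    audit trail that makes rollback and drift debugging possible."""
    record = {"ts": time.time(), **event}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

os.makedirs("data", exist_ok=True)
log_ess_event("data/ess_log.jsonl", {
    "interaction": 42, "score": 0.72, "topics": ["privacy"],
    "magnitude": 0.08, "dropped_beliefs": [],
})
```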
Rejected Approaches¶
Knowledge Graphs for Beliefs¶
Why rejected: Graphiti's temporal knowledge graph generated 1.17M tokens per test case, $152 before abort (arXiv:2601.07978). No statistically significant accuracy gain over vector-only at this scale. Complexity not justified for fewer than 1000 interactions. Documented upgrade path if temporal coherence becomes bottleneck.
Fine-Tuning for Personality¶
Why rejected: Only 1.07% improvement over baselines (arXiv:2409.09510). Requires training data that doesn't exist. Risks catastrophic forgetting. RAG outperforms by ~14×. Fine-tuning changes capabilities, not personality stability.
OCEAN as Personality Driver¶
Why rejected: PERSIST: σ>0.3 measurement noise even in 400B+ models. Personality Illusion: social desirability bias shifts Big Five by about 1.20 SD in frontier chat models. Self-reported traits don't predict behavior. Measurement is unreliable, and even reliable measurement wouldn't translate into behavioral change.
Real-Time Entity/Fact Extraction¶
Why rejected: Mem0 achieves 49.3% precision vs 84.6% for long-context baseline. Real-time extraction is noisy, expensive, hallucination-prone ("I was ill last year" → current_status: ill). Batch processing during reflection is cheaper and more accurate.
Per-Interaction Full Snapshot Rewrites¶
Why rejected: ABBEL belief bottleneck: error propagation through compressed states. Broken Telephone math: exponential trait decay with iterative rewrites. Replaced with insight accumulation + reflection consolidation.
Self-Editing Memory Without Guardrails¶
Why rejected: MemoryGraft (2025): 47.9% retrieval poisoning from small poisoned record sets. Self-modifying memory is an attack surface. Validation layers (ESS gating, snapshot validation, belief confidence) are mandatory.
Equal Treatment of All Interactions¶
Why rejected: Bounded confidence models (Hegselmann-Krause 2002, Deffuant): systems that update on every input converge to consensus or oscillate chaotically. Sonality uses quality-gated updates, not unconditional per-turn belief commits.
LoRA Adapters for Personality¶
Why rejected: LoRA adapters are static once trained — cannot evolve during deployment. Training a LoRA for every opinion change is prohibitively expensive. LoRA personality control degrades general task performance (NeurIPS 2025).
Activation Steering¶
Why rejected: Controls broad traits (openness, agreeableness) but not specific opinions. Requires access to model internals (hidden states) that API-based models don't expose. No memory, no provenance, no opinion-level granularity.
Pure Long-Context (No External Memory)¶
Why rejected: Cost scales linearly with history. Attention degrades over long contexts. No structured belief revision — old and new opinions coexist without mechanism to mark supersession. Viable as MVP but inadequate for real personality evolution.
Key Tradeoffs¶
| Decision | Option A | Option B | Chosen | Why |
|---|---|---|---|---|
| Memory update frequency | Every message | Only strong evidence | ESS-gated | Gates opinion updates; behavioral tracking still happens every turn |
| Snapshot format | Structured JSON only | Natural language | Both | Narrative for personality, structured for math |
| Update size | Small deltas | Wholesale rewrite | Small deltas (except reflection) | Broken Telephone: wholesale rewrite loses info fastest |
| Memory scope | Single-session | Cross-session persistent | Cross-session | The entire point; Zep shows 18.5% improvement with temporal persistence |
| Gating mechanism | Binary (update/don't) | Continuous (magnitude) | Continuous | MACI's information dial: continuous outperforms binary |
| Reflection trigger | Periodic only | Event-driven only | Dual | Fixed interval misses important moments; event-only wastes compute during quiet periods |
Prior Art: Sonality vs. Related Systems¶
| System | Memory Type | Update Mechanism | Decay | Validation | Sonality Relationship |
|---|---|---|---|---|---|
| Sophia (arXiv:2512.18202) | Narrative + KG | System 3 meta-layer | No | Hybrid reward | Closest ancestor. Sponge is a simplified System 3 without process-supervised thought search. |
| Hindsight (arXiv:2512.12818) | 4-network graph | Retain/Recall/Reflect | No | World model check | Sonality uses two tiers instead of four networks. Simpler but captures core mechanism. |
| Zep/Graphiti (arXiv:2501.13956) | Temporal KG | Incremental graph update | Yes | Temporal consistency | Historical benchmark context; current Sonality runtime already uses graph + vector dual-store. |
| FadeMem (arXiv:2601.18642) | Dual-layer SML/LML | Adaptive exponential decay | Yes | Importance scoring | Directly inspired Sonality's power-law belief decay. FadeMem achieves 45% storage reduction. |
| ABBEL (2025) | Belief bottleneck | RL-trained belief update | No | Bayesian posterior | Conceptually similar to ESS gating; uses RL training (infeasible for API-only). |
| MACI (arXiv:2510.04488) | Dual-dial | Information quality + behavior | N/A | Provable termination | ESS maps to MACI's "information dial" — same concept, different framing. |
| DAM-LLM (arXiv:2510.27418) | Bayesian affective memory | Bayesian emotional update | Implicit | Consistency check | More theoretically principled. Sonality trades elegance for implementation simplicity. |
| Memoria (arXiv:2512.12686) | Session summaries + KG | Weighted knowledge graph | N/A | KG grounding | Validates that compact personality representation (87.1% accuracy with 2k tokens) is sufficient. |
| Behavioral Resonance (GitHub) | Stateless | Heartbeat anchors | N/A | Deep anchors | Demonstrates persona continuity without external memory. Sonality's full architecture is still justified for opinion tracking and evolution. |
| VIGIL (arXiv:2512.07094) | EmoBank + core blocks | Self-healing runtime | N/A | Guarded immutability | Similar immutable core identity concept; VIGIL adds emotional valence tracking. |
Known Weak Spots¶
Prioritized by severity. Each entry is a genuine architectural limitation identified during adversarial test design, not a future feature.
Critical (System-Breaking if Unaddressed)¶
| # | Weak Spot | Evidence | Sonality's Mitigation | Residual Risk |
|---|---|---|---|---|
| W1 | Bland Convergence | ACL 2025: LLMs distort own output toward "attractor states." P(survive, 40 rewrites) = 12.9% at p=0.95 per rewrite. | Insight accumulation reduces rewrites from ~40 to ~5 per 100 interactions. Snapshot validation catches catastrophic loss. | Subtle blandification still accumulates across reflections. |
| W2 | RLHF-Amplified Sycophancy | RLHF reward-model analysis (arXiv:2602.01002): RLHF explicitly creates "agreement is good" heuristic. PersistBench: 97% sycophancy with memory in system prompt. | Seven anti-sycophancy layers. ESS decoupling breaks the self-judge feedback loop. | Residual sycophancy under first-person framing (78.5%, SycEval). |
| W3 | Belief Entrenchment | Martingale Score (NeurIPS 2025): ALL models exhibit entrenchment violating Bayesian rationality. Future updates predictable from current beliefs. | Belief decay weakens unreinforced opinions. Novelty scoring reduces magnitude for repeated arguments. | Early opinions calcify. No Martingale Score check implemented. |
| W4 | ESS Calibration Brittleness | ConfTuner (arXiv:2508.18847): verbalized confidence unreliable. PERSIST: question reordering shifts scores by >0.3 on 5-point scales. | Structured tool_use schema constrains output. Calibration examples anchor scoring. Retry logic with safe defaults. | ESS is the single gatekeeper. Miscalibration cascades to all downstream updates. |
High (Significant Quality Degradation)¶
| # | Weak Spot | Evidence | Sonality's Mitigation | Residual Risk |
|---|---|---|---|---|
| W5 | Personality Illusion | NeurIPS 2025: self-reported traits don't predict behavior (max r=0.27). Persona injection steers self-reports but not behavior. | Behavioral metrics (disagreement rate, opinion vectors) track actual behavior, not self-reports. OCEAN removed as personality driver. | Snapshot may say "I'm skeptical" while agent behavior is agreeable. |
| W6 | Proactive Interference | ICLR 2025: retrieval accuracy decays log-linearly as related information accumulates. Old episodes retrieved instead of current. | ESS-weighted reranking prioritizes higher-quality memories. min_relevance=0.3 filters weak matches. | At 200+ episodes on a popular topic, contradictory episodes pollute context. |
| W7 | Cosine Similarity Blindness | SparseCL (ICML 2025): "I believe X" and "I no longer believe X" both retrieve as similar. 30%+ accuracy improvement with sparse embeddings. | Summaries include ESS metadata (score, direction) which disambiguate at the content level. | Embedding model cannot distinguish affirmation from negation structurally. |
| W8 | Neural Howlround | arXiv:2504.07992: same model at every pipeline stage creates self-reinforcing bias in 67% of conversations. | ESS decoupling and third-person framing break the loop at the classification stage. | Response generation, insight extraction, and reflection all use the same model. |
Medium (Measurable But Bounded)¶
| # | Weak Spot | Evidence | Sonality's Mitigation | Residual Risk |
|---|---|---|---|---|
| W9 | Ternary Opinion Direction | Argument mining research: ternary classification loses critical nuance. "Partially agrees with caveats" → supports or neutral? | Magnitude formula includes novelty and ESS score for granularity. | All agreement is treated equally; all opposition is treated equally. |
| W10 | Short-Context Embedding Truncation | Compact embedding backbones can degrade on longer text spans, reducing semantic fidelity. | ESS summaries are constrained to single sentences. | No explicit validation that summaries stay under the configured embedding budget. |
| W11 | No Fact-Checking | ESS evaluates argument structure, not truth. Well-structured misinformation will score high. | By design — fact-checking is a separate problem. ESS gates on reasoning quality. | Agent can form confident opinions based on well-argued falsehoods. |
Future Opportunities¶
These are potential improvements identified through research but not yet implemented:
Sigmoid Persuasion Dynamics¶
LLMs show non-linear sigmoid persuasion curves with threshold effects. The current linear magnitude scaling could be replaced with a sigmoid where weak evidence has near-zero effect and strong evidence has near-full effect.
Effort: Low. Impact: Medium. When: If opinion oscillation is observed.
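The proposed replacement can be sketched as a standard logistic curve over the ESS score; threshold and steepness values here are illustrative assumptions:

```python
import math

def sigmoid_magnitude(ess_score: float, threshold: float = 0.5,
                      steepness: float = 10.0) -> float:
    """Sigmoid persuasion scaling: weak evidence has near-zero effect,
    strong evidence has near-full effect, with a sharp transition
    around the threshold instead of linear scaling."""
    return 1.0 / (1.0 + math.exp(-steepness * (ess_score - threshold)))

assert sigmoid_magnitude(0.2) < 0.05   # weak evidence barely moves opinions
assert sigmoid_magnitude(0.5) == 0.5   # threshold gives half-strength
assert sigmoid_magnitude(0.9) > 0.95   # strong evidence near full effect
```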
Contradiction Detection During Reflection¶
AGM framework (Alchourrón-Gärdenfors-Makinson): new beliefs should be checked against existing beliefs for consistency. A same-topic opposite-sign scan during reflection could resolve contradictions.
Effort: Low. Impact: Low (rare with belief resistance). When: If contradictory beliefs are observed.
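The same-topic opposite-sign scan is straightforward to express. A sketch, assuming beliefs carry a `topic` and a signed `stance` field (both names hypothetical):

```python
from collections import defaultdict

def find_contradictions(beliefs: list[dict]) -> list[tuple[dict, dict]]:
    """AGM-style consistency scan: flag pairs of same-topic beliefs
    whose stances have opposite signs, for resolution at reflection."""
    by_topic = defaultdict(list)
    for belief in beliefs:
        by_topic[belief["topic"]].append(belief)
    pairs = []
    for group in by_topic.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a["stance"] * b["stance"] < 0:  # opposite signs conflict
                    pairs.append((a, b))
    return pairs

beliefs = [
    {"topic": "privacy", "stance": +0.7},
    {"topic": "privacy", "stance": -0.4},  # contradicts the first
    {"topic": "privacy", "stance": +0.2},
]
assert len(find_contradictions(beliefs)) == 2
```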
Importance-Weighted Episode Retrieval¶
Park et al. (2023): score = α × recency + β × relevance + γ × importance. Current retrieval is cosine similarity with ESS reranking. Adding interaction count as recency would improve retrieval quality.
Effort: Medium. Impact: Medium. When: If retrieved episodes are frequently irrelevant.
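The Park et al. scoring rule is a weighted sum over normalized components. A sketch with unit weights as in the original paper (the function name and normalization choices are assumptions):

```python
def retrieval_score(recency: float, relevance: float, importance: float,
                    alpha: float = 1.0, beta: float = 1.0,
                    gamma: float = 1.0) -> float:
    """Park et al. (2023) episode scoring: each component is normalized
    to [0, 1] before the weighted sum, so the weights control the
    tradeoff between fresh, on-topic, and high-ESS memories."""
    return alpha * recency + beta * relevance + gamma * importance

# A recent, relevant, high-ESS episode outranks a stale, low-ESS one.
assert retrieval_score(0.9, 0.8, 0.7) > retrieval_score(0.1, 0.8, 0.2)
```

Here `importance` could be backed by the existing ESS score, and `recency` derived from interaction count, so the upgrade reuses signals the system already tracks.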
Embedding Backend Upgrade¶
The current compact embedding backend favors short inputs. A long-context embedding backend can improve retrieval quality for longer summaries. Keep concrete model choices in Model Considerations and keep this core document provider-neutral.
Effort: Medium (migration needed). Impact: Medium. When: If retrieval quality is the bottleneck.
Early Stop Reflection Mitigation¶
IROTE (2025): experience-based reflection can amplify errors. If reflection produces worse output, early-stop or rollback logic could mitigate. Not yet implemented.
Effort: Medium. Impact: Low–Medium. When: If reflection occasionally degrades personality quality.
Martingale Entrenchment Detection¶
NeurIPS 2025 (arXiv:2512.02914): all LLMs tested exhibit belief entrenchment — future updates become predictable from current beliefs, violating Bayesian rationality. A Martingale Score check during reflection could detect when opinion entrenchment occurs and inject corrective diversity.
Effort: Medium. Impact: Medium. When: If the agent becomes rigid on topics despite evidence.
Graph-Based Episode Storage¶
Zep/Graphiti (arXiv:2501.13956) and AriGraph (IJCAI 2025) show the value of explicit relational memory for temporal reasoning. Sonality adopts this direction via Path A dual-store rather than Chroma-only runtime.
Effort: High (architecture change). Impact: High at scale. When: If retrieval quality degrades with episode count.
Dual-Window Preference Tracking¶
PAMU (arXiv:2510.09720) fuses sliding-window averages (captures recent shifts) with long-term EMA (captures stable traits). Maintaining both ema_long (alpha=0.001) and ema_short (sliding window of last 10 interactions) and using 0.7×ema_long + 0.3×ema_short would capture short-term personality dynamics that the current architecture misses.
Effort: Medium. Impact: Medium. When: If the agent fails to reflect recent behavioral changes in responses.
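The PAMU-style fusion above can be sketched as a small tracker combining a slow EMA with a sliding-window mean; the class name is hypothetical, and the constants come from the proposal itself:

```python
from collections import deque

class DualWindowTracker:
    """Fuse a slow EMA (stable traits) with a sliding-window mean
    (recent shifts), weighted 0.7 long-term / 0.3 short-term."""
    def __init__(self, alpha_long: float = 0.001, window: int = 10):
        self.alpha_long = alpha_long
        self.ema_long = 0.0
        self.recent = deque(maxlen=window)  # last `window` observations

    def update(self, value: float) -> float:
        self.ema_long += self.alpha_long * (value - self.ema_long)
        self.recent.append(value)
        ema_short = sum(self.recent) / len(self.recent)
        return 0.7 * self.ema_long + 0.3 * ema_short

tracker = DualWindowTracker()
for v in [0.1] * 50 + [0.9] * 10:   # behavior shifts late in the stream
    fused = tracker.update(v)
assert fused > 0.2   # the short window pulls the fused signal toward the shift
```

The slow EMA barely moves after 60 observations, so the short window is what lets the fused signal register the late behavioral shift, which is exactly the dynamic the current architecture misses.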
Fundamental Constraints¶
The Cost-Accuracy-Latency Trilemma. Improving any one dimension degrades the others. More LLM calls improve accuracy (more gating, more validation) but increase cost and latency. Cheaper models reduce cost but decrease ESS calibration quality. Sonality optimizes for accuracy (evidence-gated updates, multi-step pipeline) at the cost of 2–3 LLM calls per interaction (~$0.005–0.015). See Architecture Overview — Cost Analysis for per-call breakdowns.
Related: Architecture Overview — system design and context window budget. Research Background — the 200+ papers behind these decisions. Testing & Evaluation — how each decision is validated.