Architecture Assessment¶
Status note: this assessment includes pre-Path-A alternatives retained for historical context. Production runtime uses Path A only (Neo4j + pgvector).
Research-backed architecture and training blueprint for Sonality under hard constraints:
- API-only access (provider API, no fine-tuning)
- single-user conversational runtime
- minimal dependencies and minimal abstraction
- behavior-first evaluation over self-reported traits
This document is intentionally implementation-oriented: each recommendation is tied to impact, complexity, and what code should be removed or avoided.
1. ARCHITECTURE ASSESSMENT¶
1.1 Memory Architecture (Legacy comparison: Chroma + JSON vs Graph)¶
Verdict: MODIFY (lightly), REJECT full graph DB
- Research justification
- ENGRAM (Patel and Patel, 2025): typed memory with dense retrieval beats full-context baselines by about 15 points on LongMemEval while using about 1% token budget; no graph DB required.
- Mem0 vs Graphiti empirical comparison (2025): vector memory significantly outperforms graph memory on efficiency with no statistically significant accuracy gain.
- RecallM (2024): graph+vector excels for temporal multi-hop belief updating, but its gains are strongest in tasks where causal graph traversal is central.
- Impact vs complexity
- Full graph layer (Neo4j-style or heavy NetworkX usage): high complexity, uncertain gain for Sonality's single-user conversational pattern.
- Typed vector memory + quality rerank + lightweight relational signals: high impact, low complexity.
- What to replace/remove (historical recommendation; superseded in runtime)
- Legacy recommendation kept a single vector store and avoided secondary graph persistence.
- Current runtime supersedes this with Path A dual-store (
Neo4j + PostgreSQL/pgvector). - Minimal implementation
- Keep typed retrieval (
semanticthenepisodic). - Add optional relation hints in metadata (
related_topics,stance_sign) and apply small rerank bonuses for causally adjacent memories.
1.2 ESS Pipeline (Self-Judge Bias, Argument Quality, Calibration)¶
Verdict: MODIFY
- Research justification
- SYConBench (Hong et al., EMNLP 2025): third-person framing reduces sycophancy up to 63.8%.
- BASIL (Atwell et al., 2025): Bayesian consistency requires distinguishing rational updates from agreeableness artifacts.
- MArgE (2025), SPARK (2024), ConQRet (2025): richer argument assessment improves quality scoring but adds runtime overhead.
- Impact vs complexity
- Separate ESS model: high impact, very low complexity.
- Per-turn formal argument trees (MArgE-style): medium potential benefit, high latency/complexity in API-only loop.
- Offline calibration harness: medium impact, low-medium complexity.
- What to replace/remove
- Replace same-model default ESS in production usage with a dedicated ESS model.
- Reject per-turn argument-tree parsing and extra judge calls in online path.
- Minimal implementation
- Keep third-person ESS prompt and current structured ESS fields.
- Add a small offline calibration suite against curated argument-quality samples (no runtime cost).
1.3 Snapshot Update Mechanism (Lossy Rewrites)¶
Verdict: KEEP current direction, MODIFY guardrails
- Research justification
- ABBEL (2025): belief bottleneck (compact natural-language state) is effective but vulnerable to error propagation.
- Park et al. (2023): reflection is a critical mechanism for coherence.
- Impact vs complexity
- Per-turn full snapshot rewrite: high drift risk.
- Append-only insight ledger + periodic consolidation: high value, low complexity.
- What to replace/remove
- Remove any remaining assumptions of per-message full rewrite behavior.
- Keep insight accumulation + periodic/event reflection as the only synthesis path.
- Minimal implementation
- Continue logging
reflectionintegrity metrics (snapshot_jaccard, dropped strong beliefs). - Add explicit contradiction count in reflection logs to catch hidden compression failures early.
1.4 Embedding Backend (Compact vs Long-Context)¶
Verdict: MODIFY via migration path, not immediate mandatory switch
- Research justification
- Compact embedding backends can lose semantic fidelity on longer inputs.
- For long-horizon memory, stronger embedding quality improves retrieval robustness and reduces contextual tunneling risk.
- Impact vs complexity
- Immediate migration: medium impact, medium operational complexity (re-embedding).
- Deferred migration with explicit trigger criteria: better operational trade-off.
- What to replace/remove
- Keep current embedding path until trigger criteria are met.
- Remove ambiguity by documenting exact migration playbook.
- Minimal implementation
- Trigger migration when retrieval quality drops on long user inputs or when episode length distribution exceeds the current backend's reliable context range.
- Migration sequence:
- export/backup current episode metadata,
- rebuild collection with new embedding function,
- re-ingest summaries with same IDs/metadata,
- run retrieval regression tests,
- switch runtime.
1.5 Anti-Sycophancy Measures¶
Verdict: KEEP core stack, MODIFY evaluation rigor
- Research justification
- PersistBench (Pulipaka et al., 2026): memory-induced sycophancy is a severe failure mode.
- SYConBench (Hong et al., 2025): ToF/NoF metrics are practical for multi-turn pressure testing.
- BASIL (2025): consistency-aware update logic outperforms naive agreement patterns.
- SMART (2025) and MONICA (2025): powerful but require training-time or inference-internals access unavailable in this deployment model.
- Impact vs complexity
- Current defenses (third-person ESS, cooling period, confidence resistance, quality gating): high value, low complexity.
- RL/MCTS or hidden-state monitors: high complexity, outside API-only constraints.
- What to replace/remove
- Keep current runtime defenses.
- Add offline sycophancy evaluation protocol and remove reliance on anecdotal checks.
- Minimal implementation
- Log ToF/NoF-style session metrics from runtime events.
- Keep cooling period (
N=3) as default.
1.6 Forgetting/Decay and OCEAN Interaction¶
Verdict: KEEP decay, REJECT dynamic OCEAN as control loop
- Research justification
- Ebbinghaus-in-LLMs (2025): forgetting curves remain useful to avoid stale lock-in.
- PERSIST (2025), Personality Illusion (NeurIPS 2025): self-reported trait measures are unstable and weakly coupled to real behavior.
- Impact vs complexity
- Power-law confidence decay with reinforcement floors: high value, very low complexity.
- Dynamic OCEAN update loops: low reliability, unnecessary complexity.
- What to replace/remove
- Keep belief confidence decay.
- Remove stale OCEAN-centric evaluation assumptions from docs/tests where they are no longer first-class.
- Minimal implementation
- Continue decay in reflection path.
- Keep behavior-first health metrics as canonical signals.
1.7 Opinion Confidence and Belief Revision¶
Verdict: KEEP + MODIFY (AGM-lite refinement)
- Research justification
- AGM framework: rational change requires expansion/contraction/revision discipline.
- Belief-R (2024) and Hurst (2024): practical systems need hybrid revision (formal constraints + heuristics).
- Deliberative Reasoning Networks (2025): uncertainty-aware updates improve adversarial robustness.
- Impact vs complexity
- Current confidence/evidence metadata is strong.
- Small contraction-before-revision addition is low complexity and improves rationality.
- What to replace/remove
- Replace purely ad-hoc opposition damping with explicit two-step update:
- contraction of confidence under strong counter-evidence,
- directional revision.
- Minimal implementation
- Implement in existing update path without theorem prover, new dependency, or extra LLM calls.
1.8 Memory Poisoning Defense¶
Verdict: MODIFY
- Research justification
- AgentPoison (Chan et al., 2024): tiny poison rate can still drive major drift.
- MemoryGraft (Srivastava and He, 2025): poisoned retrievals can dominate outputs.
- Experience-following property (2025): retrieval quality directly shapes behavior trajectories.
- Impact vs complexity
- Provenance-aware metadata and reranking penalties: high impact, low-medium complexity.
- What to replace/remove
- Replace trust-by-default retrieval ordering with provenance-aware quality penalties.
- Minimal implementation
- Add immutable metadata (
origin,ingested_at,ess_model,session_id) and penalize low-credibility records even if semantically close.
1.9 Disagreement Detection¶
Verdict: KEEP
- Research justification
- Stance dynamics are more reliable than lexical cue matching for multi-turn sycophancy detection (SYConBench 2025).
- Impact vs complexity
- Structural check (
direction x position < 0) is simple and robust. - What to replace/remove
- Keep structural detector.
- Reject keyword-based disagreement heuristics and extra per-turn judge calls.
2. BEHAVIOR DEVELOPMENT PIPELINE¶
2.1 Personality Formation (APF 3-layer progression)¶
Map AI Personality Formation (ICLR 2026 submission) directly onto Sonality:
- Linguistic Mimicry (0-20 interactions)
- Use strict ESS gating and dampened updates.
- Goal: stabilize style, avoid first-impression overfitting.
- Structured Accumulation (20-80 interactions)
- Use staged updates, confidence growth, typed memory routing.
- Goal: turn high-quality repeated evidence into durable beliefs.
- Autonomous Expansion (80+ interactions)
- Reflection proposes tentative extensions to adjacent topics.
- Goal: coherent, non-random worldview expansion.
2.2 Update Granularity¶
Use three interacting cadences:
- Per-message: ESS + topic tracking + staged deltas.
- Short-window commit: cooling-period net commit.
- Mid-window consolidation: reflection synthesis.
Rationale: per-message hard commits are reactive; reflection-only is too sluggish.
2.3 Collapse Prevention Guarantees¶
Structural safeguards:
- append-only insight ledger between reflections,
- snapshot retention validation,
- high-confidence belief preservation checks,
- versioned archives and rollback path,
- immutable core identity anchor.
2.4 Measuring Personality Coherence¶
Primary metrics (behavior-first):
- disagreement rate,
- belief growth trajectory,
- high-confidence ratio,
- snapshot jaccard continuity,
- insight yield,
- staged-to-committed transition behavior.
Secondary evaluation:
- Narrative Continuity Test axes (Situated Memory, Goal Persistence, Autonomous Self-Correction, Stylistic/Semantic Stability, Persona Continuity),
- self-report vs behavior contradiction checks.
2.5 Reflection Strategy¶
Use dual trigger:
- periodic (every
REFLECTION_EVERYinteractions), - event-triggered (cumulative shift threshold).
This matches Park et al. (2023): reflection should track meaningful events, not only fixed intervals.
2.6 Stubbornness/Resistance Model¶
Adopt confidence-driven resistance (Friedkin-Johnsen-compatible behavior):
- resistance increases with accumulated evidence and confidence,
- no per-domain hard-coded stubbornness knobs,
- stronger opposition evidence can still contract confidence before revision.
2.7 Teaching Methodology (Practical)¶
Recommended training curriculum:
- Interview-first (discover substrate)
- Evidence modules (thesis/evidence/counterevidence/synthesis)
- Socratic probes (assumption/falsification/trade-off questions)
- Adversarial pairs (weak pressure vs strong evidence)
- Constitutional reinforcement (periodic core-value scenarios)
This keeps behavior intentional rather than random conversation drift.
3. IMPLEMENTATION PLAN¶
Atomic commit plan ordered by impact-to-complexity.
(a) High-impact / low-complexity first¶
- Docs canonicalization and consistency cleanup
- consolidate architecture decisions and remove stale assumptions.
-
Est.
+250 / -500. -
AGM-lite contraction-before-revision
- implement minimal confidence contraction under strong opposing evidence.
- Files:
sonality/agent.py, tests. -
Est.
+35 / -20. -
Poisoning provenance metadata
- add ingestion provenance + retrieval penalties.
- Files:
sonality/memory/dual_store.py,sonality/memory/graph.py,sonality/agent.py, tests. - Est.
+40 / -12.
(b) Medium-impact¶
- Light relational rerank hints
- add relation-aware metadata bonus without graph dependency.
- Files:
sonality/memory/retrieval/, tests. -
Est.
+45 / -15. -
Reflection contradiction ledger
- emit unresolved-contradiction count in reflection events.
- Files:
sonality/agent.py, docs. - Est.
+20 / -5.
(c) Speculative / research-heavy (defer)¶
- NetworkX augmentation for memory-belief graph
- only if measured retrieval misses justify complexity.
-
Est.
+150 / -40. -
Offline SYConBench-style evaluator harness
- richer regression suite, not runtime logic.
- Est.
+120 / -20.
4. MONITORING SPECIFICATION¶
4.1 Events to Emit (JSONL)¶
Required events:
contextessopinion_stagedopinion_commithealthreflection
Critical fields:
- ESS quality dimensions (
reasoning_type,source_reliability,internal_consistency,novelty) - update mechanics (
signed_magnitude,effective_magnitude, confidence before/after) - memory routing/provenance (
memory_type,origin,session_id) - reflection integrity (
snapshot_jaccard,beliefs_dropped,insight_yield,entrenched)
4.2 REPL Operator Surface¶
Must expose:
/health/beliefs/staged/diff/shifts
4.3 Automated Anomaly Rules¶
Flag when:
- disagreement rate < 0.15 after warm-up (
possible_sycophancy) - disagreement rate > 0.50 (
contrarian_drift) - belief_count < 3 after 40 interactions (
low_belief_growth) - reflection drops strong beliefs with low jaccard (
reflection_regression) - high-confidence ratio > 0.80 with low update throughput (
ossified_beliefs) - repeated low-credibility retrieval dominance (
poisoning_risk)
4.4 Example Event Records¶
{"event":"ess","interaction":42,"score":0.64,"reasoning_type":"empirical_data","source_reliability":"peer_reviewed","internal_consistency":true,"novelty":0.52,"topics":["education_policy"]}
{"event":"opinion_staged","interaction":42,"topic":"education_policy","signed_magnitude":0.021,"due_interaction":45,"confidence_before":0.44,"confidence_after":0.45}
{"event":"health","interaction":42,"belief_count":11,"high_conf_ratio":0.36,"disagreement_rate":0.24,"warnings":[]}
{"event":"reflection","interaction":60,"insights_consolidated":4,"beliefs_dropped":[],"snapshot_jaccard":0.71,"insight_yield":0.28,"entrenched":["education_policy"]}
5. REJECTED IDEAS¶
Rejected because they violate constraints or add complexity without strong local payoff.
- Full graph database migration
-
Conflicts with minimalism and lacks clear single-user benefit vs typed vector memory.
-
Per-turn MArgE/ConQRet online judges
-
Valuable conceptually, too expensive for runtime loop.
-
SMART/MONICA runtime replication
-
Requires RL pipelines or inference-internal access unavailable here.
-
Fine-tuning personality methods (BIG5-CHAT/PISF/Open Character Training)
-
Strong research, incompatible with API-only constraint.
-
Activation-space persona-vector production monitor
-
Requires hidden-state access not provided by standard API calls.
-
Domain-specific stubbornness knobs
- Adds brittle tuning complexity; confidence/evidence-based resistance is simpler and data-driven.
References Used in Decisions¶
- AGM framework (Alchourron, Gardenfors, Makinson)
- Park et al. (2023), Generative Agents
- RecallM (2024)
- Belief-R (2024)
- Hurst (2024)
- AgentPoison (2024)
- ENGRAM (2025)
- BASIL (2025)
- MArgE (2025)
- SPARK (2024)
- ConQRet (2025)
- MemoryOS (2025)
- SMART (2025)
- MONICA (2025)
- Mem0 vs Graphiti comparison (2025)
- Personality Illusion (NeurIPS 2025)
- PERSIST (2025)
- AI Personality Formation (ICLR 2026 submission)
- PersistBench (2026)