Updates - localmelo docs

track 2 memory design sleep-ready

Track 2 memory-system design evolution

2026-05-03 - design notes .melo/track2_plan.md

Track 2 now has a clearer memory-session shape. The online path splits remembered candidates into a fast working-memory branch and a Hippo branch that owns durable or specialized placement.

The agent keeps the v1 write decision lightweight: it only proposes that a candidate is worth remembering.
`working` stores short-lived active task/session context with LRU-style refresh; the design direction separates `fresh` and `rehearsed` memories.
Long-term persistence waits until the related working memory is rehearsed or another strong write signal exists.
Hippo routes durable writes with five explicit actions: NEW, UPDATE, RELATED, CONFLICT, and IGNORE.
`long` and `profile` share one physical long-term store in v1, distinguished by metadata kinds such as `semantic`, `episodic`, `profile`, and `reflection`.
`tools` remains the authoritative registry and grows procedural skill/template memory without bypassing checker or executor policy.
`PersonalizedMemory` is a filtered sleep-sample staging area for Track 4, not a dump of raw history.
`history` stays append-only or near-append-only for debugging, replay, auditability, and provenance lookup outside normal recall.
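
The working-memory behavior described above (LRU-style refresh, with `fresh` items becoming `rehearsed` when they are seen again) might look roughly like the sketch below. The class names, the `capacity` parameter, and the rule that a second `remember` call counts as a rehearsal are assumptions for illustration, not the project's actual API.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class WorkingItem:
    text: str
    stage: str = "fresh"  # assumption: flips to "rehearsed" on a repeat write
    hits: int = 0


class WorkingMemory:
    """Hypothetical bounded working memory with LRU-style refresh."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, WorkingItem]" = OrderedDict()

    def remember(self, key: str, text: str) -> WorkingItem:
        item = self._items.get(key)
        if item is None:
            item = WorkingItem(text=text)
            self._items[key] = item
        else:
            # Seeing the same candidate again is treated as a rehearsal.
            item.text = text
            item.hits += 1
            item.stage = "rehearsed"
        self._items.move_to_end(key)  # refresh recency
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return item

    def ready_for_long_term(self) -> list:
        """Keys of rehearsed items that could be handed to the Hippo branch."""
        return [k for k, v in self._items.items() if v.stage == "rehearsed"]


wm = WorkingMemory(capacity=2)
wm.remember("editor", "prefers vim")
wm.remember("tz", "works in UTC+8")
wm.remember("editor", "prefers vim keybindings")  # rehearsal
wm.remember("lang", "writes Python")  # evicts "tz"
print(wm.ready_for_long_term())
```

Only the rehearsed `editor` item survives as a long-term candidate here, which matches the rule that persistence waits for rehearsal or another strong write signal.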

The v1 foundation now has persistent-memory docs, multi-session SQLite stress coverage, real-backend long-term retrieval evidence, and a frozen selected-sample schema for PersonalizedMemory. Later Track 2 work moves into working-memory stages, Hippo write routing, long/profile semantics, procedural memory, and evaluation.
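
As a rough sketch of the Hippo write routing that later Track 2 work targets, the five actions and the metadata kinds can be modeled as below. The action and kind names come from the design notes; the `Record` type, the `write_signal` flag, and every decision rule inside `route` are toy assumptions, since the notes do not specify how the controller chooses an action.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HippoAction(Enum):
    NEW = "new"
    UPDATE = "update"
    RELATED = "related"
    CONFLICT = "conflict"
    IGNORE = "ignore"


KINDS = {"semantic", "episodic", "profile", "reflection"}


@dataclass
class Record:  # hypothetical shape of a long-term entry
    key: str
    value: str
    kind: str


def route(candidate: Record, existing: Optional[Record], write_signal: bool) -> HippoAction:
    """Toy decision rule, for illustration only."""
    # No rehearsal or other strong write signal: do not persist.
    if not write_signal or candidate.kind not in KINDS:
        return HippoAction.IGNORE
    if existing is None:
        return HippoAction.NEW
    if existing.kind != candidate.kind:
        return HippoAction.RELATED  # same subject, different kind of memory
    if existing.value == candidate.value:
        return HippoAction.IGNORE  # nothing new to store
    if existing.value in candidate.value:
        return HippoAction.UPDATE  # candidate refines the stored value
    return HippoAction.CONFLICT


old = Record("editor", "uses vim", "profile")
print(route(Record("editor", "uses vim with plugins", "profile"), old, True))
print(route(Record("editor", "uses emacs", "profile"), old, True))
```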

track 2 v1 backend smoke qwen3 embedding

Real-backend memory smoke archived

2026-05-10 - issue #4

OMLX, Ollama, and MLC-LLM were tested against the same Qwen3 0.6B embedding family for persistent long-term retrieval. The durable archive lives under `tests_result/backend-smoke/2026-05-10-qwen3-embedding-0.6b/`; transient runner output remains separate.

The times below are end-to-end scenario wall-clock times. They include chat generation, embedding calls, persistence/retrieval, and agent-loop overhead; they are not embedding-only latency.

Backend	Chat model	Embedding	Personal	Cross-session	Project	GitHub
OMLX	`Qwen3-4B`	`Qwen3-Embedding-0.6B`	92% / 193s	100% / 164s	65% / 298s	73% / 385s
Ollama	`qwen3:4b`	`qwen3-embedding:0.6b`	96% / 180s	100% / 120s	94% / 208s	97% / 298s
MLC-LLM	`qwen3-4b`	`Qwen3-Embedding-0.6B-q0f16-MLC`	100% / 94s	100% / 76s	94% / 138s	88% / 156s

MLC used a real compiled model artifact with no symlink or temporary alias. OMLX is recorded through the local OpenAI-compatible wrapper until it is wired into the shared smoke backend registry.
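
For anyone post-processing the table, each scenario cell follows a `<percent>% / <seconds>s` pattern. The snippet below is a small sketch that parses those cells and sums the wall-clock time per backend; the row values are copied from the table, while the function names and the `ROWS` layout are illustrative assumptions.

```python
import re

# Cells copied from the table above, in column order:
# Personal, Cross-session, Project, GitHub.
ROWS = {
    "OMLX": ["92% / 193s", "100% / 164s", "65% / 298s", "73% / 385s"],
    "Ollama": ["96% / 180s", "100% / 120s", "94% / 208s", "97% / 298s"],
    "MLC-LLM": ["100% / 94s", "100% / 76s", "94% / 138s", "88% / 156s"],
}

CELL = re.compile(r"^(\d+)% / (\d+)s$")


def parse_cell(cell: str) -> tuple:
    """Return (percent, seconds) for one table cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def total_seconds(cells: list) -> int:
    """Sum the end-to-end wall-clock seconds across all scenarios."""
    return sum(parse_cell(cell)[1] for cell in cells)


totals = {backend: total_seconds(cells) for backend, cells in ROWS.items()}
print(totals)
```

The totals remain end-to-end scenario times and should not be read as embedding-only latency.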

maintenance memory smoke

Runtime storage, reflection bounds, and smoke verification

2026-04-29 - PRs #8 to #20

The current runtime baseline moved from "can run" to "can be verified". The main changes tightened persistence, bounded reflection context, validated onboarding, and documented the minimum smoke path.

SQLite history and long-term memory now use async aiosqlite persistence with lazy connection setup and write serialization.
Working memory keeps only the latest compressed reflection; oversized reflection state forces a decompose step.
Onboarding probes configured chat and embedding providers before saving config.
Executor boundary tests cover absolute paths and symlink escapes outside the workspace.
Chat moved into the internal agent namespace.
Quickstart now documents direct CLI, gateway, session reuse, no-embedding, and backend smoke commands.
Pytest-discoverable local backend smoke checks were added under `tests/smoke/`.
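
The persistence change listed above (lazy connection setup plus write serialization) can be illustrated with a standard-library stand-in. The sketch below deliberately swaps `aiosqlite` for `sqlite3` driven through `asyncio.to_thread`, so it runs without extra dependencies; `HistoryStore`, its methods, and the table schema are hypothetical and are not the project's actual code.

```python
import asyncio
import sqlite3
from typing import Optional


class HistoryStore:
    """Hypothetical store: connects on first use and serializes access."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._conn: Optional[sqlite3.Connection] = None
        self._lock = asyncio.Lock()  # one database operation at a time

    def _connect(self) -> sqlite3.Connection:
        # check_same_thread=False because to_thread may pick different worker
        # threads; the lock guarantees they never use the connection at once.
        conn = sqlite3.connect(self._path, check_same_thread=False)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS history "
            "(id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )
        conn.commit()
        return conn

    async def _ensure(self) -> sqlite3.Connection:
        # Lazy setup: callers must already hold the lock.
        if self._conn is None:
            self._conn = await asyncio.to_thread(self._connect)
        return self._conn

    async def append(self, text: str) -> None:
        async with self._lock:
            conn = await self._ensure()

            def write() -> None:
                conn.execute("INSERT INTO history (text) VALUES (?)", (text,))
                conn.commit()

            await asyncio.to_thread(write)

    async def count(self) -> int:
        async with self._lock:
            conn = await self._ensure()
            row = await asyncio.to_thread(
                lambda: conn.execute("SELECT COUNT(*) FROM history").fetchone()
            )
            return row[0]


async def main() -> int:
    store = HistoryStore(":memory:")  # created inside the running loop
    await asyncio.gather(*(store.append(f"turn {i}") for i in range(20)))
    return await store.count()


print(asyncio.run(main()))
```

No connection exists until the first `append`, and concurrent appends queue on the lock instead of interleaving on the same connection.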

track 1 agent backend

Online core loop foundation

2026-03-29 - issue #3

Track 1 established the first usable online agent loop: retrieval, tool resolution, planning, execution, memory writeback, and response delivery through direct CLI and gateway surfaces.

Backend registry with local adapters for Ollama, MLC, vLLM, and SGLang, plus cloud adapters for OpenAI, Anthropic, Gemini, and NVIDIA.
Split `chat_backend` and `embedding_backend` configuration.
Attempt-based agent loop with structured reflection and utility-gated continuation.
Ollama native chat provider with thinking/answer split.
Data-driven smoke benchmark reports for backend comparison.
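
A backend registry combined with split `chat_backend` / `embedding_backend` configuration could be structured along these lines. Only the two config key names come from this page; the `Backend` type, the `register` decorator, the `chat_model` / `embedding_model` fields, and the factory functions are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Backend:  # hypothetical adapter handle
    name: str
    model: str


Factory = Callable[[str], Backend]
_REGISTRY: Dict[str, Factory] = {}


def register(name: str) -> Callable[[Factory], Factory]:
    """Decorator that records a factory under a backend name."""
    def decorator(factory: Factory) -> Factory:
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("ollama")
def make_ollama(model: str) -> Backend:
    return Backend("ollama", model)


@register("mlc")
def make_mlc(model: str) -> Backend:
    return Backend("mlc", model)


@dataclass
class Config:
    chat_backend: str
    chat_model: str  # assumed field
    embedding_backend: str
    embedding_model: str  # assumed field


def create(name: str, model: str) -> Backend:
    factory = _REGISTRY.get(name)
    if factory is None:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown backend {name!r}; known: {known}")
    return factory(model)


def resolve(config: Config) -> Tuple[Backend, Backend]:
    """Chat and embedding are resolved independently, so they may differ."""
    return (
        create(config.chat_backend, config.chat_model),
        create(config.embedding_backend, config.embedding_model),
    )


chat, embedding = resolve(
    Config("ollama", "qwen3:4b", "mlc", "Qwen3-Embedding-0.6B-q0f16-MLC")
)
print(chat, embedding)
```

Keeping the two keys separate lets a deployment mix providers, for example chat on one local backend and embeddings on another.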

foundation architecture

Initial architecture split

2026-03-25

Split the runtime into `melo/` and infrastructure into `support/`.
Introduced provider contracts to reduce coupling between core runtime and backend implementations.
Established memory and sleep package boundaries for later personalization work.
Expanded early regression coverage around the initial architecture.
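
One common way to express provider contracts of this kind in Python is `typing.Protocol`, which lets the core runtime depend on a method shape instead of a concrete backend class. The sketch below is only an illustration of that idea; `EmbeddingProvider`, `FakeEmbedding`, and `index` are hypothetical names, and the page does not say which mechanism the project actually uses.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class EmbeddingProvider(Protocol):
    """Contract the core runtime would depend on (assumed shape)."""

    def embed(self, texts: List[str]) -> List[List[float]]: ...


class FakeEmbedding:
    """Test double; satisfies the contract structurally, without inheritance."""

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(text))] for text in texts]


def index(provider: EmbeddingProvider, texts: List[str]) -> Dict[str, List[float]]:
    # Core code only calls the contract method, never a backend-specific API.
    return dict(zip(texts, provider.embed(texts)))


print(isinstance(FakeEmbedding(), EmbeddingProvider))
print(index(FakeEmbedding(), ["a", "bcd"]))
```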

Current roadmap posture

Area	Status	Next useful work
Online core loop	verified baseline	Continue gateway lifecycle coverage and keep backend smoke evidence current.
Memory system	active design	Implement working-memory stages, Hippo's five-action controller, and metadata-typed long/profile memory.
Deployment shell	planned	Define product shell responsibilities outside the core runtime.
Sleep pipeline	planned	Write the design issue before implementing training/evaluation.
Quality / release	ongoing	Keep docs, tests, smoke, and issue hygiene aligned with each track.

Track 2 deliverables

Deliverable	Status
Document memory env vars, file layout, retention semantics, and reset workflow.	done in v1
Add multi-session SQLite stress coverage.	done in v1
Run and record real-backend long-term retrieval smoke for OMLX, Ollama, and MLC-LLM.	archived
Freeze and document the `PersonalizedMemory` selected-sample schema for v1.	done in v1
Implement working-memory `fresh` / `rehearsed` stages.	next
Implement Hippo five-action routing and metadata-typed long/profile memory.	next