Project updates
Development changelog and roadmap progress.
The changelog focuses on review-relevant changes: runtime behavior, public boundaries, verification coverage, and roadmap state.
Track 2 memory-system design evolution
2026-05-03 - design notes .melo/track2_plan.md
Track 2 now has a clearer memory-session shape. The online path splits remembered candidates into a fast working-memory branch and a Hippo branch that owns durable or specialized placement.
- The agent keeps the v1 write decision lightweight: it only proposes that a candidate is worth remembering.
- `working` stores short-lived active task/session context with LRU-style refresh; the design direction separates `fresh` and `rehearsed` memories.
- Long-term persistence waits until the related working memory is `rehearsed` or another strong write signal exists.
- `Hippo` routes durable writes with five explicit actions: `NEW`, `UPDATE`, `RELATED`, `CONFLICT`, and `IGNORE`.
- `long` and `profile` share one physical long-term store in v1, distinguished by metadata kinds such as semantic, episodic, profile, and reflection.
- `tools` remains the authoritative registry and grows procedural skill/template memory without bypassing checker or executor policy.
- `PersonalizedMemory` is a filtered sleep-sample staging area for Track 4, not a dump of raw history.
- `history` stays append-only or near-append-only for debugging, replay, auditability, and provenance lookup outside normal recall.
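The working-memory behavior in the list above can be pictured with a small sketch. It is not the project's implementation: the `WorkingMemory` class, the `Stage` enum, the capacity, and the rule that a second sighting marks an item as rehearsed are all assumptions made for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    FRESH = "fresh"
    REHEARSED = "rehearsed"


@dataclass
class Item:
    content: str
    stage: Stage = Stage.FRESH


class WorkingMemory:
    """Hypothetical LRU-style working memory with fresh/rehearsed stages."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, Item]" = OrderedDict()

    def __contains__(self, key: str) -> bool:
        return key in self._items

    def remember(self, key: str, content: str) -> Item:
        if key in self._items:
            # Assumption: seeing a key again counts as rehearsal.
            item = self._items[key]
            item.content = content
            item.stage = Stage.REHEARSED
            self._items.move_to_end(key)  # LRU-style refresh
            return item
        item = Item(content)
        self._items[key] = item
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
        return item

    def ready_for_long_term(self, key: str, strong_signal: bool = False) -> bool:
        # Long-term persistence waits for rehearsal or a strong write signal.
        item = self._items.get(key)
        if item is None:
            return False
        return strong_signal or item.stage is Stage.REHEARSED
```

With a capacity of 2, remembering `a`, `b`, `a` again, then `c` leaves `a` rehearsed and eligible for long-term placement, while `b` is evicted as the least recently used entry.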
The v1 foundation now has persistent-memory docs, multi-session
SQLite stress coverage, real-backend long-term retrieval evidence,
and a frozen selected-sample schema for
PersonalizedMemory. Later Track 2 work moves into
working-memory stages, Hippo write routing, long/profile semantics,
procedural memory, and evaluation.
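As a rough picture of the v1 decision that `long` and `profile` share one physical store, the sketch below keeps every record in a single SQLite table and distinguishes records only by a `kind` column. The table name, column names, and helper functions are hypothetical; only the four kind names come from the design notes.

```python
import sqlite3
from enum import Enum
from typing import List


class Kind(str, Enum):
    SEMANTIC = "semantic"
    EPISODIC = "episodic"
    PROFILE = "profile"
    REFLECTION = "reflection"


def open_store() -> sqlite3.Connection:
    # Hypothetical schema: one table for every long-term record.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE long_term ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, kind: Kind, content: str) -> int:
    cur = conn.execute(
        "INSERT INTO long_term (kind, content) VALUES (?, ?)",
        (kind.value, content),
    )
    conn.commit()
    return cur.lastrowid


def recall(conn: sqlite3.Connection, *kinds: Kind) -> List[str]:
    # Filtering by kind gives "profile" and "long" views over the same table.
    if not kinds:
        return []
    marks = ",".join("?" for _ in kinds)
    rows = conn.execute(
        f"SELECT content FROM long_term WHERE kind IN ({marks}) ORDER BY id",
        [k.value for k in kinds],
    )
    return [row[0] for row in rows]
```

A profile lookup is then `recall(conn, Kind.PROFILE)`, and a general long-term lookup passes the other kinds, without needing a second database file.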
Real-backend memory smoke archived
2026-05-10 - issue #4
OMLX, Ollama, and MLC-LLM were tested against the same Qwen3 0.6B
embedding family for persistent long-term retrieval. The durable
archive lives under
tests_result/backend-smoke/2026-05-10-qwen3-embedding-0.6b/;
transient runner output remains separate.
The times below are end-to-end scenario wall-clock times. They include chat generation, embedding calls, persistence/retrieval, and agent-loop overhead; they are not embedding-only latency.
| Backend | Chat model | Embedding | Personal | Cross-session | Project | GitHub |
|---|---|---|---|---|---|---|
| OMLX | Qwen3-4B | Qwen3-Embedding-0.6B | 92% / 193s | 100% / 164s | 65% / 298s | 73% / 385s |
| Ollama | qwen3:4b | qwen3-embedding:0.6b | 96% / 180s | 100% / 120s | 94% / 208s | 97% / 298s |
| MLC-LLM | qwen3-4b | Qwen3-Embedding-0.6B-q0f16-MLC | 100% / 94s | 100% / 76s | 94% / 138s | 88% / 156s |
MLC used a real compiled model artifact with no symlink or temporary alias. OMLX is recorded through the local OpenAI-compatible wrapper until it is wired into the shared smoke backend registry.
Runtime storage, reflection bounds, and smoke verification
The current runtime baseline moved from "can run" to "can be verified". The main changes tightened persistence, bounded reflection context, validated onboarding, and documented the minimum smoke path.
- SQLite history and long-term memory now use async `aiosqlite` persistence with lazy connection setup and write serialization.
- Working memory keeps only the latest compressed reflection; oversized reflection state forces `decompose`.
- Onboarding probes configured chat and embedding providers before saving config.
- Executor boundary tests cover absolute paths and symlink escapes outside the workspace.
- `Chat` moved into the internal agent namespace.
- Quickstart now documents direct CLI, gateway, session reuse, no-embedding, and backend smoke commands.
- Pytest-discoverable local backend smoke checks were added under `tests/smoke/`.
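The persistence change in the first bullet combines two ideas: the connection is opened only on first use, and writes pass through a single lock. The sketch below illustrates that pattern with the standard library only, using `sqlite3` plus `asyncio.to_thread` as a stand-in for `aiosqlite` so it runs without extra dependencies. The `HistoryStore` class, its schema, and its method names are hypothetical and do not describe the project's actual code.

```python
import asyncio
import sqlite3
from typing import Optional


class HistoryStore:
    """Hypothetical store: lazy connection setup and serialized access."""

    def __init__(self, path: str = ":memory:") -> None:
        self._path = path
        self._conn: Optional[sqlite3.Connection] = None
        # One lock guards every use of the connection, so writes never overlap.
        self._lock = asyncio.Lock()

    def _open(self) -> sqlite3.Connection:
        # check_same_thread=False because worker threads share the connection;
        # the lock above is what keeps that safe.
        conn = sqlite3.connect(self._path, check_same_thread=False)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS history ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session TEXT NOT NULL, content TEXT NOT NULL)"
        )
        conn.commit()
        return conn

    async def _connection(self) -> sqlite3.Connection:
        # Lazy setup: call only while holding the lock.
        if self._conn is None:
            self._conn = await asyncio.to_thread(self._open)
        return self._conn

    async def append(self, session: str, content: str) -> int:
        async with self._lock:
            conn = await self._connection()

            def _write() -> int:
                cur = conn.execute(
                    "INSERT INTO history (session, content) VALUES (?, ?)",
                    (session, content),
                )
                conn.commit()
                return cur.lastrowid

            return await asyncio.to_thread(_write)

    async def count(self, session: str) -> int:
        async with self._lock:
            conn = await self._connection()

            def _read() -> int:
                row = conn.execute(
                    "SELECT COUNT(*) FROM history WHERE session = ?", (session,)
                ).fetchone()
                return row[0]

            return await asyncio.to_thread(_read)

    async def close(self) -> None:
        async with self._lock:
            if self._conn is not None:
                await asyncio.to_thread(self._conn.close)
                self._conn = None


async def demo() -> int:
    store = HistoryStore()
    # Twenty concurrent appends are funneled through the lock one at a time.
    await asyncio.gather(*(store.append("s1", f"turn {i}") for i in range(20)))
    total = await store.count("s1")
    await store.close()
    return total
```

Running `asyncio.run(demo())` exercises the concurrent-append path. In this sketch reads take the same lock as writes, which is stricter than write-only serialization but keeps the shared connection safe.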
Online core loop foundation
2026-03-29 - issue #3
Track 1 established the first usable online agent loop: retrieval, tool resolution, planning, execution, memory writeback, and response delivery through direct CLI and gateway surfaces.
- Backend registry with local adapters for Ollama, MLC, vLLM, and SGLang, plus cloud adapters for OpenAI, Anthropic, Gemini, and NVIDIA.
- Split `chat_backend` and `embedding_backend` configuration.
- Attempt-based agent loop with structured reflection and utility-gated continuation.
- Ollama native chat provider with thinking/answer split.
- Data-driven smoke benchmark reports for backend comparison.
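The attempt-based loop with utility-gated continuation can be sketched as follows. This is an illustration under assumptions, not the Track 1 code: the `Reflection` fields, the `run_attempts` function, the attempt limit, and the utility threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Reflection:
    """Structured reflection after one attempt (hypothetical fields)."""
    succeeded: bool
    utility: float  # estimated value of another attempt, in [0, 1]
    note: str


def run_attempts(
    attempt: Callable[[int, List[Reflection]], object],
    reflect: Callable[[int, object], Reflection],
    max_attempts: int = 3,
    min_utility: float = 0.5,
) -> Tuple[object, List[Reflection]]:
    reflections: List[Reflection] = []
    result: object = None
    for n in range(1, max_attempts + 1):
        # Each attempt sees the reflections from earlier attempts.
        result = attempt(n, reflections)
        reflection = reflect(n, result)
        reflections.append(reflection)
        if reflection.succeeded:
            break
        # Utility gate: stop when another attempt is not expected to pay off.
        if reflection.utility < min_utility:
            break
    return result, reflections
```

The loop ends for one of three reasons: the attempt succeeded, the estimated utility of retrying fell below the threshold, or the attempt budget ran out.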
Initial architecture split
2026-03-25
- Split the runtime into `melo/` and infrastructure into `support/`.
- Introduced provider contracts to reduce coupling between core runtime and backend implementations.
- Established memory and sleep package boundaries for later personalization work.
- Expanded early regression coverage around the initial architecture.
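A provider contract of the kind mentioned above can be expressed in Python with `typing.Protocol`, so the core runtime depends on a method shape instead of a concrete backend class. The protocol names, method names, and the fake backend below are hypothetical and only illustrate the decoupling idea.

```python
from typing import List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class ChatProvider(Protocol):
    def complete(self, prompt: str) -> str: ...


@runtime_checkable
class EmbeddingProvider(Protocol):
    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class FakeEmbedding:
    """Deterministic stand-in backend; no model is loaded."""

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def embed_query(provider: EmbeddingProvider, text: str) -> List[float]:
    # Core code sees only the contract, never a specific backend.
    return provider.embed([text])[0]
```

Any object with a matching `embed` method satisfies `EmbeddingProvider` without inheriting from it, which is what lets backends be swapped without touching the core runtime.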
Current roadmap posture
| Area | Status | Next useful work |
|---|---|---|
| Online core loop | verified baseline | Continue gateway lifecycle coverage and keep backend smoke evidence current. |
| Memory system | active design | Implement working-memory stages, Hippo's five-action controller, and metadata-typed long/profile memory. |
| Deployment shell | planned | Define product shell responsibilities outside the core runtime. |
| Sleep pipeline | planned | Write the design issue before implementing training/evaluation. |
| Quality / release | ongoing | Keep docs, tests, smoke, and issue hygiene aligned with each track. |
Track 2 deliverables
| Deliverable | Status |
|---|---|
| Document memory env vars, file layout, retention semantics, and reset workflow. | done in v1 |
| Add multi-session SQLite stress coverage. | done in v1 |
| Run and record real-backend long-term retrieval smoke for OMLX, Ollama, and MLC-LLM. | archived |
| Freeze and document the `PersonalizedMemory` selected-sample schema for v1. | done in v1 |
| Implement working-memory `fresh` / `rehearsed` stages. | next |
| Implement Hippo five-action routing and metadata-typed long/profile memory. | next |