localmelo updates

Development changelog and progress tracker

[feat][refactor][bench] Task 1: Online Core Loop

2026-03-29 — localmelo/localmelo#3

A. Backend registry and infrastructure refactor

  • Pluggable backend registry with local (Ollama, MLC-LLM, vLLM, SGLang) and cloud (OpenAI, Anthropic, Gemini, Nvidia) adapters
  • Config supports split chat_backend / embedding_backend (e.g. cloud chat + local embedding)
  • Removed legacy support/serving/ and support/models/ modules
  • Tests reorganized from flat layout to domain-grouped subdirectories
  • Removed legacy raw-string Agent constructor
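The registry described above can be sketched as follows. This is a minimal illustration of the pattern, not the actual localmelo API: `BackendRegistry`, `register`, `resolve`, and the placeholder adapters are all assumed names.

```python
# Sketch of a pluggable backend registry with a split chat/embedding config.
# All names here are illustrative assumptions, not localmelo's real interface.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Config:
    chat_backend: str = "openai"       # e.g. cloud chat ...
    embedding_backend: str = "ollama"  # ... paired with a local embedding backend


class BackendRegistry:
    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        self._factories[name] = factory

    def resolve(self, name: str) -> object:
        try:
            return self._factories[name]()
        except KeyError:
            raise ValueError(f"unknown backend: {name!r}") from None


registry = BackendRegistry()
# Stand-in factories; real ones would build Ollama/OpenAI/etc. adapters.
registry.register("openai", lambda: "OpenAIAdapter()")
registry.register("ollama", lambda: "OllamaAdapter()")

cfg = Config()
chat = registry.resolve(cfg.chat_backend)        # cloud chat backend
embed = registry.resolve(cfg.embedding_backend)  # local embedding backend
```

Keeping factories (rather than instances) in the registry lets adapters stay lazy: a backend's client is only constructed when the config actually selects it.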

B. Attempt-based agent loop with structured reflection

  • Replaced flat step loop with attempt-based outer loop (MAX_ATTEMPTS=5, STEPS_PER_ATTEMPT=10, capped by MAX_AGENT_STEPS=30)
  • Extracted _run_attempt() for clean single-responsibility separation
  • Renamed ShortTerm to WorkingMemory; reflection entries persist across attempts
  • Structured ReflectionEntry and ReflectionDecision with active-learning fields
  • Utility-based continuation gate: utility = info_gain * feasibility * novelty - cost - repeat_risk
  • Stuck detection: the same tool call or error repeated three times triggers early attempt termination
  • Reflection context injected into retrieval, planning, and reflect calls
  • Reflections promoted to long-term memory only at terminal task state
  • Hardened reflection parsing with strict type coercion and [0,1] float clamping
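The utility gate and the hardened parsing above can be sketched together. This is a minimal illustration under stated assumptions: `clamp01`, `should_continue`, and the zero continuation threshold are hypothetical names and defaults, not the project's actual code; only the utility formula comes from the changelog.

```python
# Sketch: utility-based continuation gate plus strict [0,1] clamping of
# reflection fields. Names and threshold are illustrative assumptions.
from dataclasses import dataclass


def clamp01(value: object) -> float:
    """Coerce to float and clamp into [0, 1]; non-numeric input falls back to 0."""
    try:
        x = float(value)  # accepts int, float, and numeric strings
    except (TypeError, ValueError):
        return 0.0
    return max(0.0, min(1.0, x))


@dataclass
class ReflectionDecision:
    info_gain: float
    feasibility: float
    novelty: float
    cost: float
    repeat_risk: float

    @property
    def utility(self) -> float:
        # utility = info_gain * feasibility * novelty - cost - repeat_risk
        return (self.info_gain * self.feasibility * self.novelty
                - self.cost - self.repeat_risk)


def should_continue(decision: ReflectionDecision, threshold: float = 0.0) -> bool:
    # Start another attempt only while expected utility stays above threshold.
    return decision.utility > threshold


# Raw LLM output is untrusted: strings and out-of-range floats get clamped.
raw = {"info_gain": "0.9", "feasibility": 1.4, "novelty": 0.5,
       "cost": 0.1, "repeat_risk": "n/a"}
decision = ReflectionDecision(**{k: clamp01(v) for k, v in raw.items()})
```

With these inputs, feasibility 1.4 clamps to 1.0 and the unparseable repeat_risk falls to 0.0, so utility is 0.9 * 1.0 * 0.5 - 0.1 - 0.0 = 0.35 and the loop continues.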

C. Ollama native provider, usage normalization, and smoke benchmark

  • Ollama native /api/chat provider with think: true
  • Thinking/answer split for both Ollama native and MLC <think> tags
  • Provider-boundary usage normalization with total_tokens backfill
  • Data-driven smoke benchmark with per-scenario metrics, normalized tokenizer comparison, multi-backend reports
  • --report-only mode for regenerating reports from existing JSON
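Two of the mechanics above lend themselves to small sketches: splitting a response into thinking and answer parts around `<think>` tags, and backfilling `total_tokens` at the provider boundary. `split_thinking` and `normalize_usage` are illustrative helpers assumed for this sketch, not localmelo's actual functions.

```python
# Sketch: <think>-tag thinking/answer split and usage normalization with
# total_tokens backfill. Helper names are illustrative assumptions.
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is empty when no tag is present."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return match.group(1).strip(), answer


def normalize_usage(usage: dict) -> dict:
    """Normalize provider usage dicts; backfill total_tokens when omitted."""
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    total = usage.get("total_tokens")
    if total is None:  # some providers omit the total; reconstruct it
        total = prompt + completion
    return {"prompt_tokens": prompt, "completion_tokens": completion,
            "total_tokens": int(total)}


thinking, answer = split_thinking("<think>check units first</think>42 km")
usage = normalize_usage({"prompt_tokens": 10, "completion_tokens": 5})
```

Normalizing at the provider boundary means downstream metrics code can assume all three token fields are always present, regardless of backend.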

Status

Component                                      Status
Backend registry (local + cloud)               Done
Split chat/embedding config                    Done
Attempt-based agent loop                       Done
Structured reflection + utility gate           Done
Working memory with reflection entries         Done
Stuck detection                                Done
Ollama native chat provider                    Done
Smoke benchmark framework                      Done
637 tests passing (ruff, black, mypy, pytest)  Done
Task decomposition (decompose action)          Not yet
Sleep-mode pipeline                            Not yet
Long-term memory promotion policies            Not yet
Utility threshold empirical tuning             Not yet

[refactor] Initial Architecture

2026-03-25
  • Split runtime and infrastructure into melo/ and support/
  • Introduced provider contracts to reduce coupling
  • Established memory and sleep-mode package boundaries
  • Added sleep module as foundation for continuous personalization
  • Cleaned up local serving paths
  • Expanded regression coverage
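One common way to express such provider contracts in Python is a structural `typing.Protocol`, which decouples runtime code from concrete backends without inheritance. `ChatProvider`, `EchoProvider`, and `run_turn` below are illustrative assumptions, not the project's actual interface.

```python
# Sketch of a provider contract via structural typing. Runtime code depends
# only on the Protocol; any object with a matching chat() method satisfies it.
from typing import Protocol


class ChatProvider(Protocol):
    def chat(self, messages: list[dict]) -> str: ...


class EchoProvider:
    """Trivial structural implementation; note there is no inheritance."""

    def chat(self, messages: list[dict]) -> str:
        return messages[-1]["content"]


def run_turn(provider: ChatProvider, prompt: str) -> str:
    # The runtime never imports a concrete backend, only the contract.
    return provider.chat([{"role": "user", "content": prompt}])
```

Because the check is structural, infrastructure adapters in support/ can satisfy the contract without importing anything from the runtime package, keeping the dependency arrow pointing one way.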