The current system launches a dedicated VM per tenant that bundles the frontend service, agent process, toolchain, and persistent storage. This monolithic pattern causes:
Resource fragmentation: each VM reserves a large amount of idle resources, and utilization is often below 15%.
Startup delay: new users wait 5–15 minutes for VM provisioning.
High cost: free users occupy a full VM, at about $8/month per user.
⚠️ Warning
If the current architecture continues, every 1,000 new free users require 8 additional VMs, raising annual cost by about $76,800, and scalability stays limited by the VM quota.
Inspired by Anthropic's Brain/Hands/Session pattern, we split the system into three independent layers:
🧠 Brain
Central stateless scheduler responsible for task planning, routing, and orchestration. It scales horizontally and stores no state.
🖐️ Hands
Ephemeral sandbox execution environment. Each task starts one lightweight container, which is destroyed when the task completes.
💾 Session
Persistent shared storage layer that holds user data, conversation history, and tool configuration.
FIG 1: Three-layer decoupled architecture diagram
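The division of responsibilities among the three layers can be sketched in a few lines of Python. This is only an illustrative model, not the proposed implementation: the class names `SessionStore`, `HandsSandbox`, and `Brain` and all of their methods are hypothetical, the Session layer is stood in for by an in-memory dictionary, and the sandbox is a plain object rather than a real container.

```python
import uuid


class SessionStore:
    """Session layer: the only component that keeps state (in-memory stand-in)."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def append(self, user_id: str, entry: str) -> None:
        self._history.setdefault(user_id, []).append(entry)

    def history(self, user_id: str) -> list[str]:
        return list(self._history.get(user_id, []))


class HandsSandbox:
    """Hands layer: one short-lived sandbox per task (a real one would be a container)."""

    def __init__(self) -> None:
        self.sandbox_id = uuid.uuid4().hex
        self.alive = False

    def __enter__(self) -> "HandsSandbox":
        self.alive = True
        return self

    def __exit__(self, *exc_info) -> None:
        self.alive = False  # destroyed as soon as the task finishes

    def run(self, task: str) -> str:
        if not self.alive:
            raise RuntimeError("sandbox already destroyed")
        return f"done: {task}"


class Brain:
    """Brain layer: keeps no per-user state, only a handle to the Session store."""

    def __init__(self, session: SessionStore) -> None:
        self.session = session

    def handle(self, user_id: str, task: str) -> str:
        with HandsSandbox() as sandbox:  # fresh sandbox for every task
            result = sandbox.run(task)
        self.session.append(user_id, result)  # state lives only in Session
        return result
```

Because `Brain` holds nothing but a reference to the store, any number of `Brain` instances can serve the same user interchangeably, which is what makes horizontal scaling possible in this model.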
3. Architecture Comparison
| Dimension | Old (Monolithic VM) | New (Three-Layer) |
|---|---|---|
| Utilization | < 15% | > 70% |
| Onboarding | 5–15 min | ~30 s |
| Free user cost | ~$8/month | ~$0.01/month |
| Scaling | Vertical + adding VMs | Horizontal scaling of Brain/Hands |
FIG 2: Onboarding time & cost comparison
4. Product Tiers: Tier 1 / Tier 2
| Capability | Tier 1 (Free) | Tier 2 (Pro) |
|---|---|---|
| Brain orchestration | Shared queue | Dedicated instance |
| Hands sandbox | Up to 5 concurrent | Unlimited concurrency |
| Session storage | 100 MB | 10 GB |
| Cost/month | ~$0.01 | ~$15 |
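One way to make the tier limits enforceable is to encode them as configuration that the Brain consults before starting a sandbox. The following is a minimal sketch under that assumption. `TierLimits`, `TIERS`, and `can_start_sandbox` are hypothetical names, and 10 GB is taken as 10 × 1024 MB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    brain: str                                # how Brain capacity is allocated
    max_concurrent_sandboxes: Optional[int]   # None means unlimited
    session_storage_mb: int
    monthly_cost_usd: float


# Values copied from the tier table; the dictionary keys are illustrative.
TIERS = {
    "tier1": TierLimits("shared queue", 5, 100, 0.01),
    "tier2": TierLimits("dedicated instance", None, 10 * 1024, 15.0),
}


def can_start_sandbox(tier: str, running: int) -> bool:
    """Return True if a user on `tier` with `running` sandboxes may start another."""
    limit = TIERS[tier].max_concurrent_sandboxes
    return limit is None or running < limit
```

A Tier 1 user with four running sandboxes may start a fifth but not a sixth, while a Tier 2 user is never refused by this check.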
5. Migration Path (P0–P4)
| Phase | Deliverable | Timeline |
|---|---|---|
| P0 | Session storage extraction + data migration | 2 weeks |
| P1 | Brain stateless scheduler + task queue | 3 weeks |
| P2 | Hands sandbox execution (containerized) | 4 weeks |
| P3 | Tier 1 users migrated to new architecture | 2 weeks |
| P4 | Tier 2 migration + decommission old VMs | 3 weeks |
📖 Reference
The migration strategy follows the Strangler Fig Pattern for incremental refactoring, which keeps the old system available throughout the migration.
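In a Strangler Fig migration, a routing layer decides per request whether the old or the new system serves it. The sketch below illustrates that idea for the P0–P4 plan. It assumes, as a simplification, that a whole tier switches at once when its phase completes; `Phase`, `CUTOVER_PHASE`, `PHASE_WEEKS`, and `backend_for` are hypothetical names.

```python
from enum import IntEnum


class Phase(IntEnum):
    P0 = 0
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


# Duration of each phase in weeks, as listed in the migration table.
PHASE_WEEKS = {Phase.P0: 2, Phase.P1: 3, Phase.P2: 4, Phase.P3: 2, Phase.P4: 3}

# Phase whose completion moves each tier onto the new architecture.
CUTOVER_PHASE = {"tier1": Phase.P3, "tier2": Phase.P4}


def backend_for(tier: str, completed: Phase) -> str:
    """Pick the stack serving `tier` when `completed` is the last finished phase."""
    return "three-layer" if completed >= CUTOVER_PHASE[tier] else "legacy-vm"


total_weeks = sum(PHASE_WEEKS.values())  # 2 + 3 + 4 + 2 + 3 = 14
```

Until P3 completes, every user is still served by the legacy VMs, so the old system stays fully in service while P0 through P2 are built alongside it.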
6. Detailed Cost Comparison
| Item | Old | New |
|---|---|---|
| 1000 free users | $8,000/month | $10/month |
| 100 pro users | $3,000/month | $1,500/month |
| Infra ops | 2 full-time SREs | 0.5 full-time SRE |
FIG 3: Cost comparison bar chart
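The monthly figures in the cost table follow from the per-user prices quoted earlier, as the short calculation below shows. It works in integer cents to avoid floating-point rounding. The old pro figure is taken as the stated $3,000 total, which works out to $30 per pro user. SRE staffing is left out because the proposal gives no dollar figure for it. The function name `monthly_total_usd` is illustrative.

```python
def monthly_total_usd(users: int, cents_per_user: int) -> float:
    """Total monthly cost in USD for `users` paying `cents_per_user` each."""
    return users * cents_per_user / 100


old_free = monthly_total_usd(1000, 800)   # ~$8/month per free user on a full VM
new_free = monthly_total_usd(1000, 1)     # ~$0.01/month per Tier 1 user
old_pro = 3000.0                          # stated total for 100 pro users
new_pro = monthly_total_usd(100, 1500)    # ~$15/month per Tier 2 user

# Combined monthly saving across both user groups, excluding staffing.
monthly_saving = (old_free + old_pro) - (new_free + new_pro)
```

Under these numbers the free tier accounts for $7,990 of the $9,490 monthly saving, so most of the benefit comes from no longer giving free users a full VM.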
7. Risks & Mitigations
Data consistency: the Session layer uses distributed transactions with eventual-consistency guarantees.
Sandbox security: Hands containers are isolated with seccomp and resource quotas.
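As an illustration of the sandbox mitigation, the sketch below builds a container launch command that applies a seccomp profile and resource quotas. It assumes Docker as the container runtime, which the proposal does not specify. The quota values, image name, and profile path are placeholders, and `sandbox_command` is a hypothetical helper that only assembles the argument list without running anything.

```python
def sandbox_command(image: str, task_cmd: list[str], seccomp_profile: str) -> list[str]:
    """Build a `docker run` argv for one ephemeral, resource-limited sandbox."""
    return [
        "docker", "run",
        "--rm",                                          # remove the container when the task exits
        "--memory", "256m",                              # memory quota (placeholder value)
        "--cpus", "0.5",                                 # CPU quota (placeholder value)
        "--pids-limit", "64",                            # cap on the number of processes
        "--security-opt", f"seccomp={seccomp_profile}",  # restrict allowed syscalls
        image,
        *task_cmd,
    ]


# Example with placeholder image and profile path.
cmd = sandbox_command(
    "hands-runner:latest", ["python", "task.py"], "/etc/hands/seccomp.json"
)
```

The resulting list can be passed to `subprocess.run` by whatever component launches Hands sandboxes. Keeping it as a list rather than a shell string avoids quoting problems with task arguments.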
The three-layer decoupled architecture transforms Crewlet from a monolithic VM model into an elastic, low-cost, highly scalable AI agent platform. P0 starts next week, and the full migration is estimated to take 14 weeks.
✅ Next Actions
1. Review this proposal · 2. Assign the P0 team · 3. Start Session layer extraction