Deploy
Library inference needs no GPU. Library training needs a GPU once per skill. There are six deployment targets; pick the one that matches your scenario.
1. vast.ai (cheapest)
$0.08–$0.15 / hr · Cost-sensitive benchmarks, ad-hoc GPU rentals
./packaging/scripts/vast_run.sh tests
./packaging/scripts/vast_run.sh bench3
./packaging/scripts/vast_run.sh humaneval Qwen/Qwen3.5-1.5B lib.json
Our verification run on April 18, 2026 cost $0.16 in total.
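As a rough sanity check, the $0.16 total can be converted into rental time. The sketch below assumes the whole amount was billed at the hourly range quoted for this option, which the run report does not confirm; `rental_hours` is a hypothetical helper, not part of the repository.

```python
# Rough estimate only: assumes the $0.16 was billed entirely at the
# $0.08-$0.15 / hr range quoted above (an assumption, not a logged fact).

def rental_hours(total_cost_usd: float, hourly_rate_usd: float) -> float:
    """Hours of rental that a given total buys at a flat hourly rate."""
    return total_cost_usd / hourly_rate_usd

TOTAL_USD = 0.16
RATE_LOW_USD, RATE_HIGH_USD = 0.08, 0.15

shortest = rental_hours(TOTAL_USD, RATE_HIGH_USD)  # most expensive rate
longest = rental_hours(TOTAL_USD, RATE_LOW_USD)    # cheapest rate

print(f"between {shortest:.2f} h and {longest:.2f} h of GPU time")
```

Under that assumption the run corresponds to roughly one to two GPU-hours.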
2. Modal (zero ops)
$0.60–$3.75 / hr · CI jobs, serverless-style fire-and-forget
pip install modal && modal setup
modal run packaging/modal_run.py::run_bench3
modal run packaging/modal_run.py::run_humaneval --model X --library Y
3. RunPod (spot pricing)
$0.20–$1.60 / hr · Long training sweeps, enterprise support
Provision a pod, SSH in, run the standard scripts:
git clone <mirror>/nCPU
cd nCPU
pip install torch transformers datasets pytest
pytest tests/self_optimizing/ -q
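To compare the three rented-GPU options (1 to 3) for a given job length, the quoted hourly ranges can be multiplied out. This is a minimal sketch: the rates are copied from this page, while `cost_range` and the 40-hour sweep are hypothetical and only illustrate the arithmetic.

```python
# Hourly price ranges (USD) as quoted on this page for options 1-3.
GPU_RATES_USD_PER_HOUR = {
    "vast.ai": (0.08, 0.15),
    "Modal": (0.60, 3.75),
    "RunPod": (0.20, 1.60),
}

def cost_range(hours: float) -> dict:
    """Return {provider: (low_usd, high_usd)} for a job of the given length."""
    return {
        name: (round(low * hours, 2), round(high * hours, 2))
        for name, (low, high) in GPU_RATES_USD_PER_HOUR.items()
    }

# Hypothetical example: a 40-hour training sweep.
for name, (low, high) in cost_range(40).items():
    print(f"{name}: ${low:.2f} to ${high:.2f}")
```

Actual bills depend on the GPU model and availability at rental time, so treat the output as a bracket, not a quote.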
4. Local Apple Silicon
$0 · Day-to-day development, MPS profiling
python3 -m pytest tests/self_optimizing/ -q
python3 -m demos.npcot_scale_practicality
python3 -m benchmarks.benchmark_npcot_library --device mps
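Before passing `--device mps`, it can help to confirm that PyTorch actually sees the Metal backend. The sketch below is an assumption about how one might check this; `pick_device` is a hypothetical helper, and how the benchmark itself validates its `--device` argument is not shown here.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; only CPU-side tooling is usable.
        return "cpu"
    # torch.backends.mps is absent on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(pick_device())
```

If this prints `cpu` on an Apple Silicon machine, the installed PyTorch build most likely lacks MPS support.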
5. Serverless (library inference only)
$0 (cold) → tiny · Production API, GPU-free autoscaling
The 475 KB standalone binary ships as a Lambda custom runtime. Cold start ~1 ms, warm consult ~4 ns.
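For context, a Lambda custom runtime is a process that polls the Lambda Runtime API for the next event and posts the result back. The sketch below illustrates that loop in Python purely for readability. It is not the shipped binary's code, and `handle` is a hypothetical placeholder for the library consult.

```python
import json
import urllib.request

API_VERSION = "2018-06-01"  # version prefix used by the Lambda Runtime API

def next_invocation_url(runtime_api: str) -> str:
    """URL the runtime polls (GET) to receive the next event."""
    return f"http://{runtime_api}/{API_VERSION}/runtime/invocation/next"

def response_url(runtime_api: str, request_id: str) -> str:
    """URL the runtime posts the result for one invocation to."""
    return f"http://{runtime_api}/{API_VERSION}/runtime/invocation/{request_id}/response"

def handle(event: dict) -> dict:
    # Hypothetical placeholder: a real runtime would consult the library here.
    return {"echo": event}

def serve_one(runtime_api: str) -> None:
    """Fetch one event, handle it, and post the result back."""
    with urllib.request.urlopen(next_invocation_url(runtime_api)) as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())
    body = json.dumps(handle(event)).encode("utf-8")
    req = urllib.request.Request(
        response_url(runtime_api, request_id), data=body, method="POST"
    )
    urllib.request.urlopen(req).close()

# Inside Lambda, the host:port comes from the AWS_LAMBDA_RUNTIME_API
# environment variable, and serve_one would be called in an endless loop.
```

Because the loop involves no interpreter start-up or GPU initialisation, a small native binary in this role can keep cold starts short.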
6. Browser (WASM)
$0 client-side · Private-by-default inference, offline tools
import init, { NpcotRuntime } from './npcot_wasm.js'
await init()
const lib = await fetch('/library.json').then(r => r.text())
const rt = new NpcotRuntime(lib)
rt.consult(hidden, array, length)
The 130 KB WASM binary loads faster than most page analytics scripts.
Decision matrix
| Scenario | Pick |
|---|---|
| Day-to-day Mac dev | Option 4 local |
| One-off benchmark, tight budget | Option 1 vast.ai |
| CI / automated GPU validation | Option 2 Modal |
| Long training sweep | Option 3 RunPod |
| Production library-inference API | Option 5 serverless |
| Ship to end users' browsers | Option 6 WASM |
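The matrix above can also be encoded as a lookup, for example to select a target from a script. This is a minimal sketch: the scenario strings and picks are copied from the table, while `DECISION_MATRIX` and `pick_target` are hypothetical names that do not exist in the repository.

```python
# Scenario -> pick, copied from the decision matrix (keys lowercased).
DECISION_MATRIX = {
    "day-to-day mac dev": "Option 4 local",
    "one-off benchmark, tight budget": "Option 1 vast.ai",
    "ci / automated gpu validation": "Option 2 Modal",
    "long training sweep": "Option 3 RunPod",
    "production library-inference api": "Option 5 serverless",
    "ship to end users' browsers": "Option 6 WASM",
}

def pick_target(scenario: str) -> str:
    """Return the recommended option for a scenario named in the matrix."""
    key = scenario.strip().lower()
    if key not in DECISION_MATRIX:
        known = "; ".join(DECISION_MATRIX)
        raise KeyError(f"unknown scenario {scenario!r}; expected one of: {known}")
    return DECISION_MATRIX[key]

print(pick_target("Long training sweep"))
```

Unknown scenarios raise an error instead of falling back to a default, since a silent default could send a job to a paid GPU target by accident.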