Deploy

Library inference needs no GPU; library training needs a GPU once per skill. There are six deployment targets: pick the one that matches your scenario.

1. vast.ai (cheapest)

$0.08–$0.15 / hr · Cost-sensitive benchmarks, ad-hoc GPU rentals
./packaging/scripts/vast_run.sh tests
./packaging/scripts/vast_run.sh bench3
./packaging/scripts/vast_run.sh humaneval Qwen/Qwen3.5-1.5B lib.json

Our April 18, 2026 verification run cost $0.16 in total.
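As a rough sanity check on that figure, the sketch below works out how many GPU-hours a given total would buy at the listed hourly range. It assumes the run was billed purely at an hourly rate inside that range, with no storage or bandwidth charges; the source does not say how the $0.16 breaks down, and `implied_hours` is a hypothetical helper, not part of the repository.

```python
def implied_hours(total_usd: float, rate_low: float, rate_high: float) -> tuple[float, float]:
    """Return (min_hours, max_hours) that a total cost buys within an hourly rate range."""
    if not (0 < rate_low <= rate_high):
        raise ValueError("expected 0 < rate_low <= rate_high")
    # The highest rate gives the fewest hours; the lowest rate gives the most.
    return total_usd / rate_high, total_usd / rate_low


# Assumption: pure hourly billing at the vast.ai range quoted above.
lo, hi = implied_hours(0.16, 0.08, 0.15)
print(f"{lo:.2f} to {hi:.2f} GPU-hours")
```

Under those assumptions, $0.16 corresponds to roughly one to two GPU-hours.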

2. Modal (zero ops)

$0.60–$3.75 / hr · CI jobs, serverless-style fire-and-forget
pip install modal && modal setup
modal run packaging/modal_run.py::run_bench3
modal run packaging/modal_run.py::run_humaneval --model X --library Y

3. RunPod (spot pricing)

$0.20–$1.60 / hr · Long training sweeps, enterprise support

Provision a pod, SSH in, run the standard scripts:

git clone <mirror>/nCPU
cd nCPU
pip install torch transformers datasets pytest
pytest tests/self_optimizing/ -q

4. Local Apple Silicon

$0 · Day-to-day development, MPS profiling
python3 -m pytest tests/self_optimizing/ -q
python3 -m demos.npcot_scale_practicality
python3 -m benchmarks.benchmark_npcot_library --device mps
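Before passing `--device mps`, it can help to confirm that PyTorch actually sees the MPS backend. The following is a minimal sketch: `pick_device` is a hypothetical helper, not part of the repository, and only `mps` is confirmed by the command above as a value the benchmark accepts.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what the local PyTorch install supports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to probe.
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps attribute, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(device)
```

If this prints `cpu` on an Apple Silicon machine, the installed PyTorch build most likely lacks MPS support.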

5. Serverless (library inference only)

$0 (cold) → tiny · Production API, GPU-free autoscaling

The 475 KB standalone binary ships as a Lambda custom runtime. Cold start ~1 ms, warm consult ~4 ns.

6. Browser (WASM)

$0 client-side · Private-by-default inference, offline tools
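// Top-level await only works inside an ES module (<script type="module">).
import init, { NpcotRuntime } from './npcot_wasm.js'
await init()
const lib = await fetch('/library.json').then(r => r.text())  // library passed as a JSON string
const rt = new NpcotRuntime(lib)
rt.consult(hidden, array, length)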

The 130 KB WASM binary loads faster than most page analytics scripts.

Decision matrix

Scenario · Pick
Day-to-day Mac dev · Option 4 (local)
One-off benchmark, tight budget · Option 1 (vast.ai)
CI / automated GPU validation · Option 2 (Modal)
Long training sweep · Option 3 (RunPod)
Production library-inference API · Option 5 (serverless)
Ship to end users' browsers · Option 6 (WASM)
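
The matrix above can also be encoded as a small lookup, for example to select a target from a script. This is only an illustrative sketch: the scenario keys and the `pick_target` helper are hypothetical names, while the option numbers and targets are taken from the matrix.

```python
# Hypothetical scenario keys; the option numbers and targets follow the decision matrix.
DEPLOY_TARGETS = {
    "mac-dev": "Option 4 local",
    "one-off-benchmark": "Option 1 vast.ai",
    "ci-validation": "Option 2 Modal",
    "long-training-sweep": "Option 3 RunPod",
    "inference-api": "Option 5 serverless",
    "browser": "Option 6 WASM",
}


def pick_target(scenario: str) -> str:
    """Look up the recommended deployment option for a scenario key."""
    try:
        return DEPLOY_TARGETS[scenario]
    except KeyError:
        known = ", ".join(sorted(DEPLOY_TARGETS))
        raise ValueError(f"unknown scenario {scenario!r}; expected one of: {known}") from None


print(pick_target("ci-validation"))  # prints "Option 2 Modal"
```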