❯ /deploy — six targets

Pick where it runs.

Library inference needs no GPU. Library training needs a GPU once per skill. Of the six deployment targets below, pick the one that matches your scenario.

vast.ai

cheapest

$0.08–$0.15 / hr · Cost-sensitive benchmarks, ad-hoc GPU rentals

./packaging/scripts/vast_run.sh tests
./packaging/scripts/vast_run.sh bench3
./packaging/scripts/vast_run.sh humaneval Qwen/Qwen3.5-1.5B lib.json

The April 18, 2026 verification run cost $0.16 in total.
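To put the hourly ranges on the target cards side by side, here is a rough cost sketch. It assumes one instance billed at a flat hourly rate with no storage, bandwidth, or minimum-billing charges, which real providers may add. The names `HOURLY_USD`, `cost_range`, and `hours_for_budget` are illustrative and not part of the project.

```python
# Hourly price ranges in USD, copied from the target cards on this page.
# Serverless is left out because it is not billed per GPU-hour.
HOURLY_USD = {
    "vast.ai": (0.08, 0.15),
    "Modal": (0.60, 3.75),
    "RunPod": (0.20, 1.60),
    "Local Apple Silicon": (0.0, 0.0),
}


def cost_range(target: str, hours: float) -> tuple[float, float]:
    """Return the (low, high) cost in USD for running `hours` on `target`."""
    low, high = HOURLY_USD[target]
    return (low * hours, high * hours)


def hours_for_budget(target: str, budget_usd: float) -> tuple[float, float]:
    """Return the (fewest, most) hours that `budget_usd` buys on `target`."""
    low, high = HOURLY_USD[target]
    if high == 0:
        raise ValueError(f"{target} is free, so a budget does not limit it")
    return (budget_usd / high, budget_usd / low)


if __name__ == "__main__":
    for name in HOURLY_USD:
        lo, hi = cost_range(name, hours=2)
        print(f"{name:22s} 2 h costs ${lo:.2f} to ${hi:.2f}")
```

Under these assumptions, `hours_for_budget("vast.ai", 0.16)` gives roughly one to two hours, so a $0.16 bill fits a short single-instance session at the listed rates. The source does not state how long the verification run actually took.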

Modal

zero ops

$0.60–$3.75 / hr · CI jobs, serverless-style fire-and-forget

pip install modal && modal setup
modal run packaging/modal_run.py::run_bench3
modal run packaging/modal_run.py::run_humaneval --model X --library Y

RunPod

spot pricing

$0.20–$1.60 / hr · Long training sweeps, enterprise support

Provision a pod, SSH in, run the standard scripts:

git clone <mirror>/nCPU
cd nCPU
pip install torch transformers datasets pytest
pytest tests/self_optimizing/ -q

Local Apple Silicon

free

$0 · Day-to-day development, MPS profiling

python3 -m pytest tests/self_optimizing/ -q
python3 -m demos.npcot_scale_practicality
python3 -m benchmarks.benchmark_npcot_library --device mps
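The benchmark above takes a `--device mps` flag. As a minimal sketch of how a script might resolve such a flag safely, the helper below falls back to CPU when the requested backend is unavailable. `pick_device` is a hypothetical name, not the project's actual code, and the fallback policy is an assumption.

```python
def pick_device(requested: str = "auto") -> str:
    """Resolve a device name, falling back to "cpu" when a backend is missing.

    Hypothetical helper for illustration; the project's own device handling
    may differ.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator backend to choose from.
        return "cpu"

    # torch.backends.mps exists in PyTorch 1.12+; guard for older versions.
    mps_ok = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
    cuda_ok = torch.cuda.is_available()

    if requested == "mps":
        return "mps" if mps_ok else "cpu"
    if requested == "cuda":
        return "cuda" if cuda_ok else "cpu"
    if requested == "auto":
        # Prefer Apple's Metal backend on a Mac, then CUDA, then CPU.
        if mps_ok:
            return "mps"
        if cuda_ok:
            return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(pick_device("mps"))
```

Falling back silently keeps the same command usable on machines without MPS. A stricter script could raise an error instead, so that a profiling run never reports CPU numbers by accident.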

Serverless

library inference only

$0 (cold) → tiny · Production API, GPU-free autoscaling

The 475 KB standalone binary ships as a Lambda custom runtime. Cold start ~1 ms, warm consult ~4 ns.

Browser (WASM)

client-side

$0 · Private-by-default inference, offline tools

import init, { NpcotRuntime } from './npcot_wasm.js'
await init()
const lib = await fetch('/library.json').then(r => r.text())
const rt = new NpcotRuntime(lib)
rt.consult(hidden, array, length)

The 130 KB WASM binary loads faster than most page analytics scripts. It powers the live demo on this site.

Decision matrix

Scenario	Pick
Day-to-day Mac dev	Local Apple Silicon
One-off benchmark, tight budget	vast.ai
CI / automated GPU validation	Modal
Long training sweep	RunPod
Production library-inference API	Serverless
Ship to end users' browsers	Browser (WASM)
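The matrix can also be written as a small lookup, for example to choose a target from a script. This is only a sketch: the scenario strings are copied from the table, while `DECISION_MATRIX` and `pick_target` are hypothetical names that do not come from the project.

```python
# Scenario -> target, mirroring the decision matrix on this page.
DECISION_MATRIX = {
    "Day-to-day Mac dev": "Local Apple Silicon",
    "One-off benchmark, tight budget": "vast.ai",
    "CI / automated GPU validation": "Modal",
    "Long training sweep": "RunPod",
    "Production library-inference API": "Serverless",
    "Ship to end users' browsers": "Browser (WASM)",
}


def pick_target(scenario: str) -> str:
    """Return the deployment target recommended for `scenario`."""
    try:
        return DECISION_MATRIX[scenario]
    except KeyError:
        known = "; ".join(DECISION_MATRIX)
        raise ValueError(f"Unknown scenario {scenario!r}. Known: {known}") from None


if __name__ == "__main__":
    print(pick_target("Long training sweep"))  # prints "RunPod"
```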