Para

ParaBun

The Para Runtime — one of three pieces of the Para ecosystem (Lib + Lang + Runtime). A fork of Bun that bundles Para Lib, parses Para Lang natively, and adds native modules that don't exist outside the runtime: parabun:gpu (libcuda + Metal FFI), parabun:camera (V4L2), parabun:audio (ALSA + codecs + DSP), parabun:image (codecs + resize / blur / sharpen), parabun:gpio / parabun:i2c / parabun:spi (Linux peripheral I/O), and parabun:llm (Llama 3, Qwen2, BERT, Whisper, OpenAI-compatible HTTP).

Native modules are linked into the binary — no node-gyp, no CUDA toolkit install, no per-platform native addon distribution. Plain .ts / .js files behave the same as upstream Bun.

Target hardware: Linux SBCs (Raspberry Pi 5, Jetson Orin), NUCs, and other devices running a real OS. Microcontrollers are out of scope, since JavaScriptCore alone exceeds an MCU's flash budget.

Para Lib — same packages, cross-runtime, zero native deps. Para Lang — the .pts syntax, compiled to JS by any bundler.

$ curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install.sh | bash

Linux and macOS. Windows build is in progress. parabun self-update refreshes an existing install along with the VS Code extension.

$ curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install-extension.sh | bash

Installs the VS Code extension into any of code, cursor, or kiro found on $PATH. The extension provides the .pts / .pjs TextMate grammar and an LSP with hover, go-to-definition, purity diagnostics, memo hints, and operator documentation.

Module index

Grouped by what each module needs from the runtime. The first three groups are part of ParaBun. The fourth is bundled here for convenience and also lives as cross-runtime npm packages — see Para Lib.

GPU compute (FFI to libcuda / Metal)
AI & inference (composes the above)

Runtime modules

parabun:gpu

Metal on macOS, CUDA on Linux and Windows, CPU fallback on hosts without a GPU. A matrix passed to gpu.hold() stays resident across matVec calls, so only the input vector crosses the host↔device boundary per call. Pure Float32Array → Float32Array functions are runtime-compiled to PTX (via NVRTC) or MSL (via newLibraryWithSource:) when the body fits a supported shape: arithmetic, ternary, Math.*.

typescript
import gpu from "parabun:gpu";

const M = 1024, K = 768;
const mat = gpu.alloc(M * K, "f32");
for (let i = 0; i < mat.length; i++) mat[i] = Math.random();

const held = gpu.hold(mat);                          // uploaded once
const queries = [
  new Float32Array(K).fill(0.1),
  new Float32Array(K).fill(0.2),
];
for (const q of queries) {
  const scores = gpu.matVec(held, q, M, K);        // matrix already resident; only q is uploaded
  console.log("top score:", Math.max(...scores));
}
gpu.release(held);

Beyond matVec / simdMap, parabun:gpu ships conv2D, scan, reduce, argMin / argMax, histogram, variance, and median / quantile. CUDA device kernels back all of these (bitonic-sort path for quantile, two-stage Hillis-Steele for scan, atomic-privatized for histogram); CPU paths run when CUDA is unavailable. Metal mirror is the next batch.
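
For readers unfamiliar with the Hillis-Steele pattern named above, the following is a minimal CPU sketch of the prefix sum it computes. It is illustrative only: the name `hillisSteeleScan` is hypothetical, the scan is assumed to be inclusive, and this is not ParaBun's kernel, which runs on the device and, per the text, adds a second stage to combine blocks.

```typescript
// CPU reference for a Hillis-Steele inclusive prefix sum.
// Each pass adds the element `stride` positions to the left; the stride
// doubles every pass, so the loop runs ceil(log2(n)) times.
function hillisSteeleScan(input: Float32Array): Float32Array {
  let src = Float32Array.from(input);
  let dst = new Float32Array(input.length);
  for (let stride = 1; stride < input.length; stride *= 2) {
    for (let i = 0; i < input.length; i++) {
      // On a GPU every i would be one thread; the double buffer stands in
      // for the barrier between passes.
      dst[i] = i >= stride ? src[i] + src[i - stride] : src[i];
    }
    [src, dst] = [dst, src];
  }
  return src;
}

const scanned = hillisSteeleScan(new Float32Array([1, 2, 3, 4, 5]));
console.log(Array.from(scanned)); // [1, 3, 6, 10, 15]
```

The pattern does more total additions than a sequential loop, but every pass is fully parallel, which is why it suits a per-block leaf stage.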

parabun:image

A Sharp-class image module baked into the runtime — JPEG / PNG / WebP decode and encode (libjpeg-turbo, libpng, libwebp + libsharpyuv vendored statically), bilinear and Lanczos resize, separable Gaussian blur, unsharp-mask sharpen, Sobel edge-detect, 90 / 180 / 270 rotate, flip, crop, brightness / contrast / saturation adjust, threshold, invert, grayscale, per-channel histogram, and Porter-Duff source-over alpha compositing. No npm install sharp, no Node-ABI-versioned binary distribution.

typescript
import image from "parabun:image";

const bytes = await Bun.file("photo.jpg").bytes();
const img = image.decode(bytes);
const small = image.resize(img, { width: 800, height: 600, kernel: "lanczos" });
const sharp = image.sharpen(small, { amount: 1.5 });
const webp = image.encode(sharp, { format: "webp", quality: 85 });
await Bun.write("photo.webp", webp);

parabun:audio

A from-scratch audio toolkit: WAV / MP3 decode, Opus encode and decode (libopus 1.6.1), rnnoise-based denoiser, FFT, RBJ Audio EQ Cookbook biquads (lowpass / highpass / bandpass / notch), resample, STFT spectrogram, mel spectrogram (Whisper-mode included for STT pipelines), voice-activity detection, AGC, peak / RMS / windowed envelope, mix, normalize, interleave / deinterleave, and PCM type conversion. Heavy codecs (libopus, minimp3, rnnoise) ship statically.
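
As a reference for what an RBJ cookbook biquad does, here is a self-contained low-pass sketch using the published cookbook formulas. The names `lowpassCoeffs` and `applyBiquad` are hypothetical and are not the parabun:audio API; the sketch only shows the math such a filter is built on.

```typescript
interface BiquadCoeffs { b0: number; b1: number; b2: number; a1: number; a2: number }

// RBJ cookbook low-pass coefficients, normalized so that a0 = 1.
function lowpassCoeffs(sampleRate: number, cutoffHz: number, q = Math.SQRT1_2): BiquadCoeffs {
  const w0 = (2 * Math.PI * cutoffHz) / sampleRate;
  const cosW0 = Math.cos(w0);
  const alpha = Math.sin(w0) / (2 * q);
  const a0 = 1 + alpha;
  return {
    b0: (1 - cosW0) / 2 / a0,
    b1: (1 - cosW0) / a0,
    b2: (1 - cosW0) / 2 / a0,
    a1: (-2 * cosW0) / a0,
    a2: (1 - alpha) / a0,
  };
}

// Direct Form I:
// y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
function applyBiquad(samples: Float32Array, c: BiquadCoeffs): Float32Array {
  const out = new Float32Array(samples.length);
  let x1 = 0, x2 = 0, y1 = 0, y2 = 0;
  for (let n = 0; n < samples.length; n++) {
    const x0 = samples[n];
    const y0 = c.b0 * x0 + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
    out[n] = y0;
    x2 = x1; x1 = x0;
    y2 = y1; y1 = y0;
  }
  return out;
}

const coeffs = lowpassCoeffs(48000, 1000);
const dc = applyBiquad(new Float32Array(4800).fill(1), coeffs);
console.log(dc[dc.length - 1]); // a low-pass has unity gain at DC, so this settles near 1
```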

OS audio I/O is wired on Linux: audio.devices() enumerates ALSA capture and playback devices, audio.capture({ device, sampleRate, channels }) returns a stream whose .frames() async-iterator yields Float32Array PCM straight from snd_pcm_readi, and audio.play({ ... }).write(samples) pushes PCM through snd_pcm_writei. The capture stream exposes reactive peakLevel and active Signals — RMS rate-limited to 10 Hz so a level meter is one effect() away. CoreAudio + WASAPI mount on the same surface in follow-ups.

typescript
import audio from "parabun:audio";
import rtp from "para:rtp";

await using mic = await audio.capture({ sampleRate: 48000, channels: 1 });
const enc = new audio.OpusEncoder({ sampleRate: 48000, channels: 1, application: "voip" });
const den = new audio.Denoiser();
const agc = new audio.Gain({ targetLevel: 0.1 });

const ssrc = 0x12345678;
let sequence = 0, timestamp = 0;

for await (const frame of mic.frames()) {
  den.process(frame.samples);                                       // suppress noise (in place)
  agc.process(frame.samples);                                       // normalize loudness
  const opus = enc.encode(frame.samples);
  const packet = rtp.pack({ payloadType: 111, sequence: sequence++, timestamp, ssrc, payload: opus });
  // send `packet` over your transport (UDP, WebRTC, …)
  timestamp += frame.samples.length;
}
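
The loop above delegates framing to para:rtp, whose wire output is not shown in this document. As a rough sketch of what one packet carries, assuming the standard fixed 12-byte header from RFC 3550 (version 2, no padding, no extension, no CSRC list), a packer could look like this. `packRtp` and `RtpFields` are hypothetical names, not the para:rtp API.

```typescript
interface RtpFields {
  payloadType: number; // 7 bits
  sequence: number;    // 16 bits, wraps
  timestamp: number;   // 32 bits, wraps
  ssrc: number;        // 32 bits
  payload: Uint8Array;
  marker?: boolean;
}

// Fixed 12-byte RTP header followed by the payload bytes.
function packRtp(f: RtpFields): Uint8Array {
  const packet = new Uint8Array(12 + f.payload.length);
  const view = new DataView(packet.buffer);
  view.setUint8(0, 0x80);                                   // V=2, P=0, X=0, CC=0
  view.setUint8(1, (f.marker ? 0x80 : 0) | (f.payloadType & 0x7f));
  view.setUint16(2, f.sequence & 0xffff);                   // DataView writes big-endian by default
  view.setUint32(4, f.timestamp >>> 0);
  view.setUint32(8, f.ssrc >>> 0);
  packet.set(f.payload, 12);
  return packet;
}

const rtpPacket = packRtp({
  payloadType: 111,
  sequence: 0,
  timestamp: 0,
  ssrc: 0x12345678,
  payload: new Uint8Array([1, 2, 3]),
});
console.log(rtpPacket.length); // 15
```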

parabun:camera

V4L2 capture on Linux. camera.devices() reads /sys/class/video4linux/ and runs VIDIOC_QUERYCAP on each to filter to actual capture devices. camera.formats(path) enumerates the supported (format, width, height, fps) tuples. camera.open(...) mmaps the kernel ring buffer and starts streaming, and cam.frames() is an async iterator of frames. AVFoundation (macOS) and Media Foundation (Windows) backends are planned on the same JS surface.

parabun:video

Scaffold only — the JS surface is in place (video.probe, video.decode, video.encode, video.decodeAll, with codec / container / acceleration options) but the native side hasn't been wired yet. The plan is libavcodec on desktop, V4L2 M2M on Pi 5, NVDEC/NVENC on Jetson, all behind the same JS API.

parabun:gpio

Linux gpiochip uAPI v2 over /dev/gpiochip* (the modern chardev interface, not legacy sysfs). gpio.open(chip, line, opts) returns a Line with read() / write() / async-iterable edge events. Validated end-to-end on a Raspberry Pi 5. Pairs with para:signals: line.value is a reactive Signal that updates on read and on edge events.

parabun:i2c

i2c-dev wrapper with full SMBus quick / byte / word / block transactions plus a raw readWriteRaw() for chips that need 16-bit register addressing or arbitrary frame layouts. i2c.scan(bus) probes addresses, i2c.open(bus, addr) hands back a typed device handle. Works on any Linux board with /dev/i2c-*.

parabun:spi

spidev wrapper with both half-duplex (read / write) and full-duplex (transfer) plus multi-segment (transferSegments) for mode / speed / CS-hold changes mid-burst. Per-device mode / bitsPerWord / maxSpeedHz / lsbFirst are sticky and apply to every transfer until changed.

parabun:llm

An in-tree native inference stack covering three model classes: Llama / Qwen2 chat + completion (LLM), BERT-family sentence embedders (Encoder), and Whisper STT (WhisperModel). Weights mmap off disk; residual stream and KV cache live on-device. Per-token traffic across PCIe is a 4-byte argmax. Q4_K and Q6_K matVec kernels use a 1-warp-per-row, 4-warps-per-block layout; QKV and Gate+Up projections are byte-concatenated at load time and dispatched as one matVec per layer.
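
To illustrate why concatenating the Q, K and V weights at load time allows a single matVec per layer, here is a small CPU sketch in plain f32. It assumes row-major matrices that share a column count; `matVecRef` and `concatRows` are hypothetical helpers, and the real kernels operate on quantized Q4_K / Q6_K blocks on the device rather than on f32 arrays.

```typescript
// y = W x for a row-major rows x cols matrix.
function matVecRef(w: Float32Array, x: Float32Array, rows: number, cols: number): Float32Array {
  const y = new Float32Array(rows);
  for (let r = 0; r < rows; r++) {
    let acc = 0;
    for (let c = 0; c < cols; c++) acc += w[r * cols + c] * x[c];
    y[r] = acc;
  }
  return y;
}

// Stack row-major matrices with the same column count into one taller matrix.
function concatRows(...mats: Float32Array[]): Float32Array {
  const out = new Float32Array(mats.reduce((n, m) => n + m.length, 0));
  let offset = 0;
  for (const m of mats) {
    out.set(m, offset);
    offset += m.length;
  }
  return out;
}

const wq = new Float32Array([1, 0, 0, 0, 1, 0]); // 2 x 3
const wk = new Float32Array([0, 0, 1]);          // 1 x 3
const wv = new Float32Array([1, 1, 1]);          // 1 x 3
const fused = concatRows(wq, wk, wv);            // 4 x 3, built once at load time

const x = new Float32Array([2, 3, 4]);
const qkv = matVecRef(fused, x, 4, 3);           // one pass instead of three
const q = qkv.subarray(0, 2);
const k = qkv.subarray(2, 3);
const v = qkv.subarray(3, 4);
console.log(Array.from(q), Array.from(k), Array.from(v)); // q = [2, 3], k = [4], v = [9]
```

Because a matrix-vector product treats each output row independently, stacking the matrices changes nothing numerically; it only trades three kernel launches for one.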

typescript
import llm from "parabun:llm";

using m = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");

for await (const piece of m.chat([
  { role: "system", content: "You are helpful and concise." },
  { role: "user", content: "What is the capital of France?" },
])) {
  process.stdout.write(piece);
}

| Llama-3.2-1B Q4_K_M · RTX 4070 Ti | parabun | ollama |
| --- | --- | --- |
| greedy decode (device-only) | 340 tok/s | ~350 tok/s |
| greedy decode (logits DtoH) | 275 tok/s | |
| prompt prefill | 295 tok/s | |

Numbers are within run-to-run noise of ollama on this model and hardware. Chat templates for Llama-3, ChatML, and Mistral-Instruct are detected from the GGUF's tokenizer.chat_template. Only the CUDA backend is wired in this module today; Metal kernels are pending.

llm.serve({ engine, modelId, port }) exposes any model (or anything else implementing .chat() / .generate() / .embed()) over an OpenAI-compatible HTTP API. Routes: GET /v1/models, POST /v1/chat/completions (sync and SSE streaming), POST /v1/completions, POST /v1/embeddings. Optional bearer auth and a FIFO concurrency gate (default 1). Default port is 11434, matching ollama's, so OpenAI clients that auto-discover a local ollama work unchanged.
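
The FIFO concurrency gate mentioned above can be pictured as a small semaphore that hands its slot to the oldest waiter. This is a minimal sketch under that assumption; `FifoGate` is a hypothetical class, not the code inside llm.serve.

```typescript
// At most `limit` tasks run at once; the rest wait in arrival order.
class FifoGate {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit = 1) {
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Park until a finishing task hands its slot over.
      await new Promise<void>(resolve => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next();      // slot passes directly to the oldest waiter
      else this.active--;    // nobody waiting: free the slot
    }
  }
}

const gate = new FifoGate(1);
const order: string[] = [];
const job = (label: string) =>
  gate.run(async () => {
    order.push(`start ${label}`);
    await new Promise<void>(resolve => setTimeout(resolve, 10));
    order.push(`end ${label}`);
  });

Promise.all([job("a"), job("b"), job("c")]).then(() => console.log(order.join(", ")));
```

Handing the slot over instead of decrementing and re-incrementing keeps a late arrival from jumping ahead of requests that are already queued.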

WhisperModel loads whisper.cpp ggml-*.bin files (F32 / F16 / Q4_0 / Q5_0 / Q5_1 / Q8_0) and runs encoder-decoder STT — KV cache, chunked long-audio, beam search, language detection across all 99 Whisper languages. CUDA-accelerated end-to-end (encoder im2col conv + matmuls + per-head batched attention; decoder per-token matVecs + LM head). On an RTX 4070 Ti, an 11 s JFK clip transcribes in 1.6 s with tiny.en — about 6.9× real-time.

Both LLM and WhisperModel instances expose reactive para:signals Signals: m.busy (refcounted, flips while a chat / generate / embed / transcribe call is in flight) and m.device ("cuda" | "metal" | "cpu", stable for the life of the instance). One effect() reads them straight into UI state — no manual notify.

parabun:vision

vision.frames(stream, { decodeMjpg? }) takes a frame iterator from parabun:camera (or any source yielding the same shape) and yields packed-RGBA8 frames. yuyv, nv12, and rgb24 are converted inline; mjpeg requires the caller to pass image.decode from parabun:image (cross-builtin imports between bun: modules aren't supported, so dependencies are passed in at the call site). vision.detectMotion adds a downsampled-luma frame-diff estimator with temporal smoothing, plus opt-in per-frame region segmentation: pass { regions: { minPixels } } and each MotionFrame carries a regions array of bounding boxes (4-connected, two-pass union-find on the mask, scaled back to source-frame coordinates) so callers can route on where motion is, not just whether.
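
The region segmentation described above (4-connected, two-pass union-find on the mask) can be sketched in a few dozen lines. This version is an illustration, not the parabun:vision implementation: `maskRegions` and `RegionBox` are hypothetical names, the mask is assumed to be one byte per pixel with non-zero meaning motion, and the scaling back to source-frame coordinates is left out.

```typescript
interface RegionBox { x: number; y: number; width: number; height: number; pixels: number }

function maskRegions(mask: Uint8Array, width: number, height: number, minPixels = 1): RegionBox[] {
  const labels = new Int32Array(width * height); // 0 = background
  const parent: number[] = [0];                  // union-find forest; slot 0 unused
  const find = (a: number): number => {
    while (parent[a] !== a) {
      parent[a] = parent[parent[a]];             // path halving
      a = parent[a];
    }
    return a;
  };

  // Pass 1: provisional labels from the left and upper neighbours.
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = y * width + x;
      if (!mask[i]) continue;
      const left = x > 0 ? labels[i - 1] : 0;
      const up = y > 0 ? labels[i - width] : 0;
      if (!left && !up) {
        parent.push(parent.length);              // new label is its own root
        labels[i] = parent.length - 1;
      } else if (left && up) {
        const a = find(left), b = find(up);
        if (a !== b) parent[Math.max(a, b)] = Math.min(a, b); // record equivalence
        labels[i] = Math.min(a, b);
      } else {
        labels[i] = left || up;
      }
    }
  }

  // Pass 2: resolve each label to its root and grow that root's bounding box.
  const boxes = new Map<number, { x0: number; y0: number; x1: number; y1: number; pixels: number }>();
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const label = labels[y * width + x];
      if (!label) continue;
      const root = find(label);
      const b = boxes.get(root);
      if (!b) {
        boxes.set(root, { x0: x, y0: y, x1: x, y1: y, pixels: 1 });
      } else {
        b.x0 = Math.min(b.x0, x);
        b.x1 = Math.max(b.x1, x);
        b.y1 = Math.max(b.y1, y);
        b.pixels++;
      }
    }
  }
  return Array.from(boxes.values())
    .filter(b => b.pixels >= minPixels)
    .map(b => ({ x: b.x0, y: b.y0, width: b.x1 - b.x0 + 1, height: b.y1 - b.y0 + 1, pixels: b.pixels }));
}
```

The `minPixels` filter plays the same role as the `regions: { minPixels }` option in the text: small specks of noise never become a region.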

vision.recognize(frame, { engine: "tesseract", language: "eng" }) drives Tesseract through libtesseract.so.5 FFI (system-installed — apt install libtesseract-dev tesseract-ocr-eng / brew install tesseract), iterates word-level results, and returns one Detection per word with confidence in [0, 1] and bbox in source-frame pixels. The "easyocr" engine is reserved.

vision.onnx(modelPath) returns an ONNX session bound to a system-installed libonnxruntime — the C ABI doesn't export flat symbols, so the FFI dlopens OrtGetApiBase(), walks the resulting OrtApi struct of ~150 function pointers at version-pinned offsets, and rebinds the ones we need via linkSymbols. session.run({ name: { data, shape } }) takes Float32Array tensors, runs inference, and returns a Map of output tensors. Install: brew install onnxruntime / apt install libonnxruntime-dev, or download a release tarball and point PARABUN_ONNX_LIB at the .so.

vision.detect(frame, { engine: "yolo", model }) drives YOLOv8/YOLOv11 ONNX models on top of vision.onnx: bilinear-letterbox preprocess to 640×640 CHW float32 in [0, 1], one inference pass, channel-major decode of the [1, 4+nc, anchors] output, greedy IoU NMS (same-class), and coordinate unmapping back to source-frame pixels. Returns Detection[] with COCO labels by default; pass opts.classes for custom-trained models. Session is LRU-cached per model path so repeat calls skip the dominant ONNX load cost. SSD and RT-DETR engines are reserved.
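
The greedy same-class IoU NMS step in that pipeline is the part most often reimplemented, so a minimal sketch may help. The `Det` shape, the `iou` and `nms` names, and the 0.45 default threshold are assumptions for illustration; the source does not state the threshold vision.detect uses.

```typescript
interface Det { x: number; y: number; width: number; height: number; score: number; classId: number }

// Intersection over union of two axis-aligned boxes.
function iou(a: Det, b: Det): number {
  const ix = Math.max(0, Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x));
  const iy = Math.max(0, Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y));
  const inter = ix * iy;
  const union = a.width * a.height + b.width * b.height - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: walk detections from highest score down and keep one only if
// no already-kept detection of the same class overlaps it too much.
function nms(dets: Det[], iouThreshold = 0.45): Det[] {
  const sorted = [...dets].sort((a, b) => b.score - a.score);
  const kept: Det[] = [];
  for (const d of sorted) {
    const suppressed = kept.some(k => k.classId === d.classId && iou(k, d) > iouThreshold);
    if (!suppressed) kept.push(d);
  }
  return kept;
}

const survivors = nms([
  { x: 0, y: 0, width: 10, height: 10, score: 0.9, classId: 0 },
  { x: 1, y: 1, width: 10, height: 10, score: 0.8, classId: 0 }, // overlaps the first, same class
  { x: 1, y: 1, width: 10, height: 10, score: 0.7, classId: 1 }, // same place, different class
]);
console.log(survivors.map(d => d.score)); // [0.9, 0.7]
```

Same-class gating is what lets a person and the bicycle they are riding both survive even though their boxes overlap heavily.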

vision.track() instantiates a stateful tracker that turns per-frame Detection[] into stable Track[] across the stream — each contiguous occurrence of the same object keeps the same monotonic id so callers can draw consistent labels, accumulate trajectories, and route effects per-object instead of per-frame. SORT-style greedy IoU matching with same-class gating; tracks coast for maxFramesMissed frames to ride out brief occlusions without yo-yoing ids; trajectoryLength retains the last N bboxes per track when drawing trails.
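
A stripped-down version of SORT-style greedy IoU matching shows how ids stay stable and how coasting works. Everything here is a sketch under assumptions: `GreedyIouTracker`, `TrackBox`, and the 0.3 threshold are hypothetical, there is no motion model, and trajectory retention is omitted. Only the `maxFramesMissed` idea is taken from the text.

```typescript
interface TrackBox { x: number; y: number; width: number; height: number; classId: number }
interface TrackState extends TrackBox { id: number; missed: number }

function boxIou(a: TrackBox, b: TrackBox): number {
  const ix = Math.max(0, Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x));
  const iy = Math.max(0, Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y));
  const inter = ix * iy;
  const union = a.width * a.height + b.width * b.height - inter;
  return union > 0 ? inter / union : 0;
}

class GreedyIouTracker {
  private nextId = 1;
  private tracks: TrackState[] = [];
  private readonly iouThreshold: number;
  private readonly maxFramesMissed: number;

  constructor(iouThreshold = 0.3, maxFramesMissed = 5) {
    this.iouThreshold = iouThreshold;
    this.maxFramesMissed = maxFramesMissed;
  }

  update(dets: TrackBox[]): TrackState[] {
    // Score every same-class (track, detection) pair above the threshold.
    const pairs: Array<{ t: number; d: number; score: number }> = [];
    this.tracks.forEach((track, t) => dets.forEach((det, d) => {
      if (track.classId !== det.classId) return;
      const score = boxIou(track, det);
      if (score >= this.iouThreshold) pairs.push({ t, d, score });
    }));
    // Greedy assignment: best overlap first, each track and detection used once.
    pairs.sort((a, b) => b.score - a.score);
    const usedT = new Set<number>();
    const usedD = new Set<number>();
    for (const p of pairs) {
      if (usedT.has(p.t) || usedD.has(p.d)) continue;
      usedT.add(p.t);
      usedD.add(p.d);
      Object.assign(this.tracks[p.t], dets[p.d], { missed: 0 });
    }
    // Unmatched tracks coast; drop them once they exceed maxFramesMissed.
    this.tracks.forEach((track, t) => { if (!usedT.has(t)) track.missed++; });
    this.tracks = this.tracks.filter(track => track.missed <= this.maxFramesMissed);
    // Unmatched detections start new tracks with fresh monotonic ids.
    dets.forEach((det, d) => {
      if (!usedD.has(d)) this.tracks.push({ ...det, id: this.nextId++, missed: 0 });
    });
    return this.tracks.map(track => ({ ...track }));
  }
}
```

Feeding it one detection list per frame returns the live tracks, including those currently coasting (`missed > 0`), so a caller can decide whether to draw them.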

parabun:speech

speech.listen(stream, { sampleRate }) takes an audio chunk iterator (parabun:audio's capture stream, a file reader, anything yielding { samples }) and yields one utterance per detected speech burst. The classifier is RMS-against-an-adaptive-noise-floor, with pre-roll to catch word onsets, hangover to seal on silence, and a minimum-length filter to drop clicks and breath sounds.
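
The classifier described above can be approximated in a short function. This is a sketch under stated assumptions, not the speech.listen implementation: `segmentSpeech` and `VadOptions` are hypothetical, the noise floor is seeded from the first frame (so the stream is assumed to open on background noise), spans are reported as frame indices, and pre-roll is left out for brevity.

```typescript
interface VadOptions {
  ratio: number;          // a frame is "loud" when rms > noiseFloor * ratio
  hangoverFrames: number; // quiet frames tolerated before an utterance is sealed
  minFrames: number;      // utterances shorter than this are dropped
  floorAlpha: number;     // smoothing factor for the noise-floor estimate
}

function rms(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return frame.length ? Math.sqrt(sum / frame.length) : 0;
}

// Returns [start, end) frame-index spans, one per detected speech burst.
function segmentSpeech(frames: Float32Array[], opts: VadOptions): Array<[number, number]> {
  const spans: Array<[number, number]> = [];
  if (frames.length === 0) return spans;
  let noiseFloor = Math.max(rms(frames[0]), 1e-4); // assumption: stream opens on background noise
  let start = -1;
  let quiet = 0;
  const seal = (end: number) => {
    if (end - start >= opts.minFrames) spans.push([start, end]); // minimum-length filter
    start = -1;
    quiet = 0;
  };
  frames.forEach((frame, i) => {
    const level = rms(frame);
    if (level > noiseFloor * opts.ratio) {
      if (start < 0) start = i;
      quiet = 0;
      return;
    }
    // Only quiet frames feed the floor, so speech does not drag it upward.
    noiseFloor += opts.floorAlpha * (level - noiseFloor);
    if (start >= 0 && ++quiet > opts.hangoverFrames) seal(i - quiet + 1);
  });
  if (start >= 0) seal(frames.length - quiet);
  return spans;
}
```

The hangover keeps a short pause inside a sentence from splitting it into two utterances, while the minimum-length filter discards isolated loud frames such as clicks.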

speech.transcribe(utt, { engine: "whisper", model }) dispatches to the WhisperModel in parabun:llm, with a per-process model cache so the weights aren't reloaded between calls. speech.speak(text, { engine: "piper", model }) drives the Piper voice synthesizer (subprocess in v1, libpiper FFI v2 tracked) and returns f32 mono PCM at the voice's native sample rate, ready to hand straight to audio.play().write(). The listen stream also exposes reactive active / noiseFloor / lastUtterance signals.

parabun:assistant

The 3-line case. Composes parabun:audio (mic + speaker), parabun:speech (VAD + STT + TTS), and parabun:llm (Llama / Qwen2 inference) into a complete on-device voice loop: await using bot = await assistant.create({ llm, stt, tts, system });, then await bot.run(). Mic captures, VAD gates, Whisper transcribes, the LLM generates, Piper synthesizes, ALSA plays — fully local, no cloud round-trip. bot.turns() exposes the loop as an async iterator; bot.ask(text) skips STT for text-only turns; bot.say(text) pushes a proactive utterance.

Reactive surface: bot.state ("idle" | "listening" | "thinking" | "speaking"), bot.history, bot.lastTurn, and bot.interrupted are all para:signals Signals — wire them straight into UI without polling. Persistent memory is one option away: pass memory: "/path/to/memory.sqlite" and the conversation transcript replays into history on every create. Power users keep their seat — bot.llm exposes the underlying model, so anything reachable directly via parabun:llm / parabun:speech / parabun:audio is reachable through bot too.

Example: LangChain VectorStore

ParaBunVectorStore extends VectorStore from @langchain/core and implements the addVectors and similaritySearchVectorWithScore methods, so call sites that accept any VectorStore work against it without changes. The shared setup below feeds both snippets:

setup (shared)
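import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

const emb = new OpenAIEmbeddings({ modelName: "text-embedding-3-small" });
const texts = ["hello world", "good morning", "see you later"];
// addVectors expects Document objects, not raw strings.
const docs = texts.map(pageContent => new Document({ pageContent }));
const vectors = await emb.embedDocuments(texts);
const q = await emb.embedQuery("greetings");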
before
import { MemoryVectorStore }
  from "langchain/vectorstores/memory";

const store = new MemoryVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store
  .similaritySearchVectorWithScore(q, 10);
after
import { ParaBunVectorStore }
  from "./parabun-store.pjs";

const store = new ParaBunVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store
  .similaritySearchVectorWithScore(q, 10);

| 100k × 384 f32, top-10 | add_ms | score_ms | vs LangChain |
| --- | --- | --- | --- |
| LangChain MemoryVectorStore | 4.0 | 48.2 | 1.00× |
| ParaBunVectorStore | 82.7 | 15.9 | 2.83× |

add_ms is higher because rows are packed into a single Float32Array backed by a SharedArrayBuffer (SAB) and normalized in place: one-time O(N·D) work amortized across subsequent queries. Top-K indices and scores match LangChain's to four decimal places.
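
The trade-off in that paragraph (pay once at insert time, then score with plain dot products) can be sketched as follows. `packNormalized` and `topK` are hypothetical helpers, not the contents of parabun-store.pjs, and the sketch uses an ordinary ArrayBuffer and a full sort where the real store presumably does something faster.

```typescript
// Pack rows into one contiguous Float32Array, L2-normalizing each row as it
// is written, so cosine similarity later reduces to a dot product.
function packNormalized(rows: number[][], dim: number): Float32Array {
  const packed = new Float32Array(rows.length * dim);
  rows.forEach((row, r) => {
    let norm = 0;
    for (let d = 0; d < dim; d++) norm += row[d] * row[d];
    norm = Math.sqrt(norm) || 1; // all-zero rows are left as zeros
    for (let d = 0; d < dim; d++) packed[r * dim + d] = row[d] / norm;
  });
  return packed;
}

// Score every row against the query and return the k best matches.
function topK(packed: Float32Array, dim: number, query: number[], k: number): Array<{ index: number; score: number }> {
  const qNorm = Math.sqrt(query.reduce((s, v) => s + v * v, 0)) || 1;
  const count = packed.length / dim;
  const scored: Array<{ index: number; score: number }> = [];
  for (let r = 0; r < count; r++) {
    let dot = 0;
    for (let d = 0; d < dim; d++) dot += packed[r * dim + d] * query[d];
    scored.push({ index: r, score: dot / qNorm });
  }
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}

const packedRows = packNormalized([[3, 0], [0, 2], [1, 1]], 2);
console.log(topK(packedRows, 2, [2, 0], 2).map(h => h.index)); // [0, 2]
```

Because every stored row already has unit length, the per-query work is one multiply-add per element with no square roots, which is where the lower score_ms would come from.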

Composition examples

Programs that wire multiple parabun:* modules together in one process. No postMessage, no child processes, no N-API bindings.

Voice → LLM → tool dispatch

Mic captures, Whisper transcribes, the LLM picks a tool under a JSON schema (decoding is grammar-constrained, so the output is schema-valid by construction and needs no parse retries), the runtime dispatches it, and Piper speaks the reply. The dispatch table is plain JS below; para:mcp (now shipped, with stdio + WebSocket transports) lets you swap the table for a Model Context Protocol client without changing this control flow. effect() over reactive signals (bot.state here; mic.peakLevel / chat.busy / wsp.busy in the hand-rolled version under What's underneath) drives a status line: no polling, no observers, no event emitters.

import assistant from "parabun:assistant";
import { effect } from "para:signals";

await using bot = await assistant.create({
  llm: "./Llama-3.2-1B-Instruct-Q4_K_M.gguf",
  stt: "./ggml-tiny.en.bin",
  tts: "./en_US-lessac-medium.onnx",
  system: "You are a helpful home assistant. Keep replies short.",
  tools: {
    setLight:  ({ room, on, brightness }) => console.log(`light ${room} ${on ? "on" : "off"} @ ${brightness}`),
    playMusic: ({ track }) => console.log(`play ${track}`),
  },
});

// One reactive line redraws a live status line in place.
effect(() => process.stdout.write(`\r${bot.state.get()}`));

await bot.run();   // VAD → STT → LLM (with grammar-constrained tool calls) → TTS → speaker

parabun:assistant composes parabun:audio + parabun:speech + parabun:llm + para:signals internally. bot.state is a Signal that cycles through "idle" | "listening" | "thinking" | "speaking"; the tools dispatch table works the same as a para:mcp client. What's underneath shows the same loop hand-rolled: same five modules, no facade.

Webcam motion → reactive assistant

Module composition: parabun:camera + parabun:vision + parabun:assistant over para:signals. vision.detectMotion emits frame-by-frame scores; the rising-edge handler fires once on each false→true transition of the predicate — no derived() wrapper, no state machine, no debounce timer, no wasPresent flag. The ParaBun tab uses a when EXPR { … } block; the TypeScript tab uses the equivalent signals.when() call.

import camera    from "parabun:camera";
import vision    from "parabun:vision";
import assistant from "parabun:assistant";
import signals   from "para:signals";

await using cam = await camera.open("/dev/video0", { format: "yuyv", width: 640, height: 480 });
await using bot = await assistant.create({
  llm: "/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf",
  tts: "/models/en_US-lessac-medium.onnx",
  system: "You are a friendly home assistant. Keep replies short.",
});

// motion.score / motion.detected auto-fill as frames flow.
const motion = vision.detectMotion(vision.frames(cam.frames()), { sensitivity: 0.04 }).run();

// Greet whenever motion appears AND the bot is idle. The predicate is
// auto-tracked — every signal it reads becomes a dep, no derived() wrapper.
signals.when(
  () => motion.detected.get() && bot.state.get() === "idle",
  () => bot.say("Welcome back!"),
);

Barge-in is implemented inside parabun:assistant: a rising edge on vad.active drops the queued TTS via spk.stop() and stamps turn.interrupted. bot.interrupt() does the same programmatically.
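
For comparison, this is the bookkeeping that a rising-edge primitive such as the signals.when() call earlier in this section saves the caller from writing. It is a plain-TypeScript sketch with a hypothetical `onRisingEdge` helper; it does not use para:signals and says nothing about how that module tracks dependencies.

```typescript
// Wrap a callback so it fires only on a false -> true transition.
function onRisingEdge(callback: () => void): (value: boolean) => void {
  let previous = false;
  return (value: boolean) => {
    if (value && !previous) callback();
    previous = value;
  };
}

let greetings = 0;
const feed = onRisingEdge(() => { greetings++; });
for (const present of [false, true, true, false, true]) feed(present);
console.log(greetings); // 2
```

The callback runs once per transition, not once per frame in which the predicate is true, which is the behavior the motion example relies on to avoid greeting on every frame.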

What's underneath

The five-module loop parabun:assistant wraps, hand-written. Useful for a non-default flow — custom VAD threshold, separate transcribe + chat sessions, custom JSON dispatch. Longer than the facade.

import audio  from "parabun:audio";
import speech from "parabun:speech";
import llm    from "parabun:llm";
import { effect } from "para:signals";

const tools: Record<string, (args: any) => any> = {
  setLight:  ({ room, on, brightness }) => console.log(`light ${room} ${on ? "on" : "off"} @ ${brightness}`),
  playMusic: ({ track }) => console.log(`play ${track}`),
  reply:     ({ text }) => text,
};
const ToolSchema = { /* JSON schema with oneOf for each tool */ };

await using mic  = await audio.capture({ sampleRate: 16000, channels: 1 });
using       wsp  = await llm.WhisperModel.load("./ggml-tiny.en.bin");
using       chat = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");

effect(() => {
  process.stdout.write(`\rmic ${mic.peakLevel.get().toFixed(3)}  llm ${chat.busy.get() ? "🤔" : ""}  whisper ${wsp.busy.get() ? "🎙️" : ""}`);
});

for await (const utt of speech.listen(mic.frames(), { sampleRate: 16000 })) {
  const heard = await wsp.transcribe(utt.samples, { language: "en" });
  const { tool, args } = await chat.chatJSON([{ role: "user", content: heard }], { schema: ToolSchema, maxTokens: 80 });
  const result = await tools[tool](args);
  if (typeof result === "string") {
    await speech.say(result, { engine: "piper", model: "./en_US-lessac-medium.onnx" });
  }
}

chat.chatJSON({ schema }) drains the streamed grammar-constrained chat and parses the result in one call; speech.say(text) wraps speak() + audio.play() + spk.write() with a process-wide cached PlaybackStream keyed on (sampleRate, channels). Both ergonomic shortcuts shipped alongside the assistant facade.

Roadmap

Each module is gated by a compile-time feature flag. The configurator generates a bun build --compile invocation with only the flags you select.

| Status | Module | What it does |
| --- | --- | --- |
| shipped | parabun:image | JPEG / PNG / WebP decode + encode, resize (bilinear / Lanczos), blur / sharpen / edge-detect, rotate / flip / crop, adjust / threshold / invert / grayscale, histogram, alpha composite. |
| shipped | parabun:audio | WAV / MP3 / Opus codecs, RBJ biquads, FFT, resample, spectrogram, VAD, denoiser (rnnoise), AGC, mix / normalize / envelope, planar ⇄ frame-major + i16 ⇄ f32 PCM helpers. |
| shipped | parabun:gpu primitives | conv2D, scan, reduce, argMin / argMax, histogram, median / quantile. CPU correctness paths today; CUDA / Metal hooks slot in via the existing dispatch. |
| shipped | parabun:camera | V4L2 capture on Linux: devices(), formats(path), open(...) with an async-iterator frames() over kernel-mmapped buffers. AVFoundation + Media Foundation follow on the same surface. |
| shipped | OS audio I/O | Live ALSA capture + playback for parabun:audio. devices() / capture(...) / play(...) with Float32 PCM streams, S16_LE on the wire. CoreAudio + WASAPI follow. |
| shipped | parabun:gpio / parabun:i2c / parabun:spi | Linux peripheral I/O: uAPI v2 GPIO chardev (with edge events), i2c-dev SMBus, spidev half + full duplex with multi-segment transfers. Validated end-to-end on Raspberry Pi 5. |
| shipped | parabun:gpu device kernels | CUDA: reduce (sum / min / max), atomic-privatized histogram, recursive multi-level scan (Hillis-Steele leaf, no element-count cap), argMin / argMax, variance, conv2D, bitonic-sort median / quantile. Metal mirror complete: same set ported to MSL, with simdgroup tree-reduce (simd_sum / simd_shuffle_xor), 1024-bin atomic-privatized histogram, recursive Hillis-Steele scan, two-pass variance, bitonic-step quantile. On-device validation pending mac-mini run; Linux probe stays inert with CPU fallback. |
| shipped | parabun:vision | Frame stream, frame-diff motion detection (vision.frames, vision.detectMotion), per-frame connected-components region segmentation, OCR via vision.recognize (Tesseract / libtesseract.so.5 FFI), generic ONNX inference via vision.onnx(modelPath).run({…}) (libonnxruntime FFI; struct-walked OrtApi), object detection via vision.detect (YOLOv8/YOLOv11: bilinear letterbox preprocess, channel-major decode, IoU NMS, COCO 80-class default), and multi-frame tracking via vision.track() (SORT-style greedy IoU matching with same-class gating + maxFramesMissed coasting → stable per-object ids across the stream). SSD / RT-DETR engines stubbed for follow-up. |
| shipped | parabun:speech | VAD-gated speech.listen (with reactive active / noiseFloor / lastUtterance signals), Whisper STT (speech.transcribe, dispatching to parabun:llm's WhisperModel: encoder-decoder, KV cache, beam search, language detection, CUDA-accelerated end-to-end), and Piper TTS (speech.speak, subprocess in v1; libpiper FFI v2 tracked). |
| shipped | parabun:assistant | Three-line voice-assistant facade composing parabun:audio + parabun:speech + parabun:llm + para:mcp. bot.run / turns / ask / say / interrupt + reactive state / history / lastTurn / interrupted / toolsActive signals + sqlite-backed persistent memory + tool dispatch (inline + MCP) + VAD-driven barge-in + wake word (wakeWord: "hey jetson") + cron-driven scheduled prompts + RAG (knowledge: { dir, encoder }). Vision (VLM) turns deferred to follow-up. |
| in progress | parabun:video | JS surface scaffolded; libavcodec / V4L2 M2M / NVDEC native binding lands with hardware bring-up. Decode + encode + container muxing. |
| planned | parabun:image AVIF | AVIF decode/encode (libavif + AOM / dav1d vendor add). Rounds out the codec coverage matrix. |

HTTP handlers, JSON parsing, and ordinary application code use the same code paths as upstream Bun. The added modules don't change behavior or performance there.