Para

ParaBun

The Para Runtime — one of three pieces of the Para ecosystem (Lib + Lang + Runtime). A fork of Bun that bundles Para Lib, parses Para Lang natively, and adds native modules that don't exist outside the runtime: parabun:gpu (libcuda + Metal FFI), parabun:camera (V4L2), parabun:audio (ALSA + codecs + DSP), parabun:image (codecs + resize / blur / sharpen), parabun:gpio / parabun:i2c / parabun:spi (Linux peripheral I/O), and parabun:llm (Llama 3, Qwen2, BERT, Whisper, OpenAI-compatible HTTP).

Native modules are linked into the binary — no node-gyp, no CUDA toolkit install, no per-platform native addon distribution. Plain .ts / .js files behave the same as upstream Bun.

ParaBun targets Linux SBCs (Raspberry Pi 5, Jetson Orin), NUCs, and other devices running a real OS. Not microcontrollers — JavaScriptCore alone exceeds an MCU's flash budget.

Just want the libraries? Para Lib works on any JS runtime — install only what you use. Just want the syntax? Para Lang compiles to JS that imports those libraries.

$ curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install.sh | bash

Linux and macOS. Windows build is in progress. parabun self-update refreshes an existing install along with the VS Code extension.

$ curl -fsSL https://raw.githubusercontent.com/airgap/parabun/main/install-extension.sh | bash

Installs the VS Code extension into any of code, cursor, or kiro found on $PATH. The extension provides the .pts / .pjs TextMate grammar and an LSP with hover, go-to-definition, purity diagnostics, memo hints, and operator documentation.

Module index

Grouped by what each module needs from the runtime. The first three groups are part of ParaBun. The fourth is bundled here for convenience and also lives as cross-runtime npm packages — see Para Lib.

GPU compute (FFI to libcuda / Metal)
AI & inference (composes the above)

Runtime modules

parabun:gpu

Metal on macOS, CUDA on Linux and Windows, CPU fallback on hosts without a GPU. A matrix passed to gpu.hold() stays resident across matVec calls, so only the input vector crosses the host↔device boundary per call. Pure Float32Array → Float32Array functions are runtime-compiled to PTX (via NVRTC) or MSL (via newLibraryWithSource:) when the body fits a supported shape: arithmetic, ternary, Math.*.

typescript
import gpu from "parabun:gpu";

const M = 1024, K = 768;
const mat = gpu.alloc(M * K, "f32");
for (let i = 0; i < mat.length; i++) mat[i] = Math.random();

const held = gpu.hold(mat);                          // uploaded once
const queries = [
  new Float32Array(K).fill(0.1),
  new Float32Array(K).fill(0.2),
];
for (const q of queries) {
  const scores = gpu.matVec(held, q, M, K);        // no copy
  console.log("top score:", Math.max(...scores));
}
gpu.release(held);

Beyond matVec / simdMap, parabun:gpu ships conv2D, scan, reduce, argMin / argMax, histogram, and median / quantile — CPU correctness paths today, with optional CUDA / Metal hooks on the same dispatch surface for follow-up device kernels.
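
For illustration, one way the extra primitives and the runtime-compiled map might be invoked; the per-element simdMap callback and the option objects on reduce / histogram / quantile are assumptions, not the documented surface:

typescript
import gpu from "parabun:gpu";

const samples = gpu.alloc(1_000_000, "f32");
for (let i = 0; i < samples.length; i++) samples[i] = Math.random();

// Pure body built from arithmetic, ternary, Math.* — eligible for PTX / MSL compilation.
// Per-element callback form is assumed.
const shaped = gpu.simdMap(samples, (x) => x * x + Math.sin(x));

// Secondary primitives — CPU correctness paths today; option names here are assumed.
const total = gpu.reduce(shaped, { op: "sum" });
const hist  = gpu.histogram(shaped, { bins: 64 });
const p95   = gpu.quantile(shaped, 0.95);
console.log({ total, firstBin: hist[0], p95 });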

parabun:image

A Sharp-class image module baked into the runtime — JPEG / PNG / WebP decode and encode (libjpeg-turbo, libpng, libwebp + libsharpyuv vendored statically), bilinear and Lanczos resize, separable Gaussian blur, unsharp-mask sharpen, Sobel edge-detect, 90 / 180 / 270 rotate, flip, crop, brightness / contrast / saturation adjust, threshold, invert, grayscale, per-channel histogram, and Porter-Duff source-over alpha compositing. No npm install sharp, no Node-ABI-versioned binary distribution.

typescript
import image from "parabun:image";

const bytes = await Bun.file("photo.jpg").bytes();
const img = image.decode(bytes);
const small = image.resize(img, { width: 800, height: 600, kernel: "lanczos" });
const sharp = image.sharpen(small, { amount: 1.5 });
const webp = image.encode(sharp, { format: "webp", quality: 85 });
await Bun.write("photo.webp", webp);

parabun:audio

A from-scratch audio toolkit: WAV / MP3 decode, Opus encode and decode (libopus 1.6.1), rnnoise-based denoiser, FFT, RBJ Audio EQ Cookbook biquads (lowpass / highpass / bandpass / notch), resample, STFT spectrogram, mel spectrogram (Whisper-mode included for STT pipelines), voice-activity detection, AGC, peak / RMS / windowed envelope, mix, normalize, interleave / deinterleave, and PCM type conversion. Heavy codecs (libopus, minimp3, rnnoise) ship statically.

OS audio I/O is wired on Linux: audio.devices() enumerates ALSA capture and playback devices, audio.capture({ device, sampleRate, channels }) returns a stream whose .frames() async-iterator yields Float32Array PCM straight from snd_pcm_readi, and audio.play({ ... }).write(samples) pushes PCM through snd_pcm_writei. The capture stream exposes reactive peakLevel and active Signals — RMS rate-limited to 10 Hz so a level meter is one effect() away. CoreAudio + WASAPI mount on the same surface in follow-ups.

typescript
import audio from "parabun:audio";
import rtp from "para:rtp";

await using mic = await audio.capture({ sampleRate: 48000, channels: 1 });
const enc = new audio.OpusEncoder({ sampleRate: 48000, channels: 1, application: "voip" });
const den = new audio.Denoiser();
const agc = new audio.Gain({ targetLevel: 0.1 });

const ssrc = 0x12345678;
let sequence = 0, timestamp = 0;

for await (const frame of mic.frames()) {
  den.process(frame.samples);                                       // suppress noise (in place)
  agc.process(frame.samples);                                       // normalize loudness
  const opus = enc.encode(frame.samples);
  const packet = rtp.pack({ payloadType: 111, sequence: sequence++, timestamp, ssrc, payload: opus });
  // send `packet` over your transport (UDP, WebRTC, …)
  timestamp += frame.samples.length;
}
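
The playback half of the same surface, plus the reactive level meter described above: a minimal monitor loop that echoes the mic to the speaker. It follows the prose's synchronous audio.play({ ... }).write() chaining, which may need an await in practice:

typescript
import audio from "parabun:audio";
import { effect } from "para:signals";

// Capture from the default ALSA device and play it straight back.
await using mic = await audio.capture({ sampleRate: 48000, channels: 1 });
const out = audio.play({ sampleRate: 48000, channels: 1 });

// peakLevel is a reactive Signal (RMS, rate-limited to 10 Hz) — one effect() is a level meter.
effect(() => process.stdout.write(`\rlevel ${mic.peakLevel.get().toFixed(3)}`));

for await (const frame of mic.frames()) {
  out.write(frame.samples);          // Float32 PCM pushed through snd_pcm_writei
}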

parabun:camera

V4L2 capture on Linux. camera.devices() reads /sys/class/video4linux/ and runs VIDIOC_QUERYCAP on each to filter to actual capture devices. camera.formats(path) enumerates the supported (format, width, height, fps) tuples. camera.open(...) mmaps the kernel ring buffer and starts streaming, and cam.frames() is an async iterator of frames. AVFoundation (macOS) and Media Foundation (Windows) backends are planned on the same JS surface.
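
A minimal capture sketch. devices(), formats(path), open(...), and frames() are the documented surface; treating devices() entries as paths usable with formats() is an assumption:

typescript
import camera from "parabun:camera";

// Enumerate real capture devices and their (format, width, height, fps) tuples.
for (const dev of camera.devices()) {
  console.log(dev, camera.formats(dev));
}

await using cam = await camera.open("/dev/video0", { format: "yuyv", width: 640, height: 480 });

let n = 0;
for await (const frame of cam.frames()) {   // kernel-mmapped V4L2 ring buffer underneath
  if (++n === 30) break;                    // grab roughly one second at 30 fps, then stop
}
console.log(`captured ${n} frames`);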

parabun:video

Scaffold only — the JS surface is in place (video.probe, video.decode, video.encode, video.decodeAll, with codec / container / acceleration options) but the native side hasn't been wired yet. The plan is libavcodec on desktop, V4L2 M2M on Pi 5, NVDEC/NVENC on Jetson, all behind the same JS API.

parabun:gpio

Linux gpiochip uAPI v2 over /dev/gpiochip* — the modern chardev interface, not legacy sysfs. gpio.open(chip, line, opts) returns a Line with read() / write() / async-iterable edge events. Validated end-to-end on a Raspberry Pi 5. Pairs with para:signals: line.value is a reactive Signal that updates on read and on edge events.
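
A sketch of the chardev surface; the chip-path argument form, the option names, and the edge-event shape are assumptions, while read() / write() and the line.value Signal are as documented:

typescript
import gpio from "parabun:gpio";
import { effect } from "para:signals";

// Drive an LED on line 17 (chip path form and option names assumed).
const led = gpio.open("/dev/gpiochip0", 17, { direction: "out" });
led.write(1);

// Watch a button on line 27; line.value is the documented reactive Signal.
const button = gpio.open("/dev/gpiochip0", 27, { direction: "in", edges: "both" });
effect(() => console.log("level:", button.value.get()));

// Edge events as an async iterable (assumed here to be the Line itself).
for await (const edge of button) {
  led.write(edge.value ?? 0);               // mirror the button onto the LED
}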

parabun:i2c

i2c-dev wrapper with full SMBus quick / byte / word / block transactions plus a raw readWriteRaw() for chips that need 16-bit register addressing or arbitrary frame layouts. i2c.scan(bus) probes addresses, i2c.open(bus, addr) hands back a typed device handle. Works on any Linux board with /dev/i2c-*.
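
For example, probing a bus and reading a sensor register. scan(), open(), and readWriteRaw() are documented above; the SMBus method name on the device handle (readByteData) follows standard SMBus naming and is an assumption:

typescript
import i2c from "parabun:i2c";

// List responding addresses on /dev/i2c-1 (bus given as a number here).
console.log(i2c.scan(1).map((a) => "0x" + a.toString(16)));

// Read the WHO_AM_I register (0x75) of an IMU at 0x68 — SMBus "read byte data".
const dev = i2c.open(1, 0x68);
console.log("WHO_AM_I =", dev.readByteData(0x75).toString(16));

// Chips needing 16-bit register addressing go through the raw path (argument layout assumed).
const raw = dev.readWriteRaw(new Uint8Array([0x12, 0x34]), 4);   // write 2 bytes, read 4
console.log(raw);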

parabun:spi

spidev wrapper with both half-duplex (read / write) and full-duplex (transfer) plus multi-segment (transferSegments) for mode / speed / CS-hold changes mid-burst. Per-device mode / bitsPerWord / maxSpeedHz / lsbFirst are sticky and apply to every transfer until changed.
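
A sketch assuming an open() entry point like the i2c module's; the segment option names in transferSegments are also assumptions:

typescript
import spi from "parabun:spi";

// Sticky per-device settings apply to every subsequent transfer until changed.
const dev = spi.open("/dev/spidev0.0", { mode: 0, bitsPerWord: 8, maxSpeedHz: 1_000_000 });

// Full-duplex: clock out a JEDEC ID command (0x9f) and read the reply in one burst.
const rx = dev.transfer(new Uint8Array([0x9f, 0x00, 0x00, 0x00]));
console.log("JEDEC ID:", rx.slice(1));

// Multi-segment burst: raise the clock mid-transfer while holding chip select
// (segment field names assumed).
dev.transferSegments([
  { tx: new Uint8Array([0x0b, 0x00, 0x00, 0x00]), speedHz: 8_000_000, csChange: false },
  { rx: 256 },
]);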

parabun:llm

An in-tree native inference stack covering three model classes: Llama / Qwen2 chat + completion (LLM), BERT-family sentence embedders (Encoder), and Whisper STT (WhisperModel). Weights mmap off disk; residual stream and KV cache live on-device. Per-token traffic across PCIe is a 4-byte argmax. Q4_K and Q6_K matVec kernels use a 1-warp-per-row, 4-warps-per-block layout; QKV and Gate+Up projections are byte-concatenated at load time and dispatched as one matVec per layer.

typescript
import llm from "parabun:llm";

using m = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");

for await (const piece of m.chat([
  { role: "system", content: "You are helpful and concise." },
  { role: "user", content: "What is the capital of France?" },
])) {
  process.stdout.write(piece);
}

Llama-3.2-1B Q4_K_M · RTX 4070 Ti     parabun      ollama
greedy decode (device-only)           340 tok/s    ~350 tok/s
greedy decode (logits DtoH)           275 tok/s
prompt prefill                        295 tok/s

Numbers are within run-to-run noise of ollama on this model and hardware. Chat templates for Llama-3, ChatML, and Mistral-Instruct are detected from the GGUF's tokenizer.chat_template. Only the CUDA backend is wired in this module today; Metal kernels are pending.

llm.serve({ engine, modelId, port }) exposes any model (or anything else implementing .chat() / .generate() / .embed()) over an OpenAI-compatible HTTP API. Routes: GET /v1/models, POST /v1/chat/completions (sync and SSE streaming), POST /v1/completions, POST /v1/embeddings. Optional bearer auth and a FIFO concurrency gate (default 1). Default port is 11434, matching ollama's, so OpenAI clients that auto-discover a local ollama work unchanged.
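
Serving a loaded model and hitting it with a plain fetch. serve()'s fields and the routes are as documented; the model id string is illustrative:

typescript
import llm from "parabun:llm";

const m = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");
llm.serve({ engine: m, modelId: "llama-3.2-1b-instruct", port: 11434 });

// Any OpenAI-compatible client works; so does a bare fetch.
const res = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama-3.2-1b-instruct",
    messages: [{ role: "user", content: "Say hi in five words." }],
  }),
});
console.log((await res.json()).choices[0].message.content);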

WhisperModel loads whisper.cpp ggml-*.bin files (F32 / F16 / Q4_0 / Q5_0 / Q5_1 / Q8_0) and runs encoder-decoder STT — KV cache, chunked long-audio, beam search, language detection across all 99 Whisper languages. CUDA-accelerated end-to-end (encoder im2col conv + matmuls + per-head batched attention; decoder per-token matVecs + LM head). On an RTX 4070 Ti, an 11 s JFK clip transcribes in 1.6 s with tiny.en — about 6.9× real-time.
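
A transcription sketch. WhisperModel.load and transcribe(samples, { language }) appear elsewhere in this README; the WAV-decode call from parabun:audio (audio.decode) and its return shape are assumed names:

typescript
import llm from "parabun:llm";
import audio from "parabun:audio";

using wsp = await llm.WhisperModel.load("./ggml-tiny.en.bin");

// Decode a WAV to Float32 PCM — entry point and return shape assumed;
// the clip is assumed to already be 16 kHz mono (resample otherwise).
const wav = audio.decode(await Bun.file("jfk.wav").bytes());
const text = await wsp.transcribe(wav.samples, { language: "en" });
console.log(text);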

Both LLM and WhisperModel instances expose reactive para:signals Signals: m.busy (refcounted, flips while a chat / generate / embed / transcribe call is in flight) and m.device ("cuda" | "metal" | "cpu", stable for the life of the instance). Wire a busy spinner or a backend badge with a one-liner effect().

parabun:vision

vision.frames(stream, { decodeMjpg? }) takes a frame iterator from parabun:camera (or any source yielding the same shape) and yields packed-RGBA8 frames. yuyv, nv12, and rgb24 are converted inline; mjpeg requires the caller to pass image.decode from parabun:image (cross-builtin imports between bun: modules aren't supported, so dependencies are passed in at the call site). vision.detectMotion adds a downsampled-luma frame-diff estimator with temporal smoothing.
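
For an MJPEG camera the decoder is handed in at the call site, per the cross-builtin note above; the frame field names read in the loop are assumptions:

typescript
import camera from "parabun:camera";
import vision from "parabun:vision";
import image  from "parabun:image";

await using cam = await camera.open("/dev/video0", { format: "mjpeg", width: 1280, height: 720 });

// parabun:vision can't import parabun:image itself, so pass the decoder in.
for await (const rgba of vision.frames(cam.frames(), { decodeMjpg: image.decode })) {
  console.log(rgba.width, rgba.height);   // packed-RGBA8 frame; field names assumed
  break;
}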

vision.detect (YOLO / SSD / RT-DETR) and vision.recognize (Tesseract / EasyOCR) are typed but throw — both need an ONNX runtime vendored before they can do anything. The interfaces are there so callers can write against them now and have them work later.

parabun:speech

speech.listen(stream, { sampleRate }) takes an audio chunk iterator (parabun:audio's capture stream, a file reader, anything yielding { samples }) and yields one utterance per detected speech burst. The classifier is RMS-against-an-adaptive-noise-floor, with pre-roll to catch word onsets, hangover to seal on silence, and a minimum-length filter to drop clicks and breath sounds.

speech.transcribe(utt, { engine: "whisper", model }) dispatches to the WhisperModel in parabun:llm, with a per-process model cache so the weights aren't reloaded between calls. speech.speak(text, { engine: "piper", model }) drives the Piper voice synthesizer (subprocess in v1, libpiper FFI v2 tracked) and returns f32 mono PCM at the voice's native sample rate, ready to hand straight to audio.play().write(). The listen stream also exposes reactive active / noiseFloor / lastUtterance signals.
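
Tying speak() to playback. The return value is the PCM array as described; the voice's native sample rate isn't queried in this sketch, so the 22.05 kHz figure for this Piper voice is an assumption:

typescript
import speech from "parabun:speech";
import audio  from "parabun:audio";

// Piper returns f32 mono PCM at the voice's native rate (22.05 kHz assumed for lessac-medium).
const pcm = await speech.speak("The kettle has boiled.", {
  engine: "piper",
  model: "./en_US-lessac-medium.onnx",
});

audio.play({ sampleRate: 22050, channels: 1 }).write(pcm);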

parabun:assistant

The 3-line case. Composes parabun:audio (mic + speaker), parabun:speech (VAD + STT + TTS), and parabun:llm (Llama / Qwen2 inference) into a complete on-device voice loop: await using bot = await assistant.create({ llm, stt, tts, system });, then await bot.run(). Mic captures, VAD gates, Whisper transcribes, the LLM generates, Piper synthesizes, ALSA plays — fully local, no cloud round-trip. bot.turns() exposes the loop as an async iterator; bot.ask(text) skips STT for text-only turns; bot.say(text) pushes a proactive utterance.

Reactive surface: bot.state ("idle" | "listening" | "thinking" | "speaking"), bot.history, bot.lastTurn, and bot.interrupted are all para:signals Signals — wire them straight into UI without polling. Persistent memory is one option away: pass memory: "/path/to/memory.sqlite" and the conversation transcript replays into history on every create. Power users keep their seat — bot.llm exposes the underlying model, so anything reachable directly via parabun:llm / parabun:speech / parabun:audio is reachable through bot too.

Example: LangChain VectorStore

ParaBunVectorStore extends VectorStore from @langchain/core and implements the addVectors and similaritySearchVectorWithScore methods, so call sites that accept any VectorStore work against it without changes. The shared setup below feeds both snippets:

setup (shared)
import { OpenAIEmbeddings } from "@langchain/openai";

const emb = new OpenAIEmbeddings({ modelName: "text-embedding-3-small" });
const docs = ["hello world", "good morning", "see you later"];
const vectors = await emb.embedDocuments(docs);
const q = await emb.embedQuery("greetings");
before
import { MemoryVectorStore }
  from "langchain/vectorstores/memory";

const store = new MemoryVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store
  .similaritySearchVectorWithScore(q, 10);
after
import { ParaBunVectorStore }
  from "./parabun-store.pjs";

const store = new ParaBunVectorStore(emb);
await store.addVectors(vectors, docs);
const hits = await store
  .similaritySearchVectorWithScore(q, 10);

100k × 384 f32, top-10         add_ms    score_ms    vs LangChain
LangChain MemoryVectorStore    4.0       48.2        1.00×
ParaBunVectorStore             82.7      15.9        2.83×

add_ms is higher because rows are packed into a single SAB Float32Array and normalized in place — one-time O(N·D) work amortized across subsequent queries. Top-K indices and scores match LangChain's to four decimal places.
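
The parabun-store.pjs file isn't reproduced above; below is one way it could be written (shown as plain TypeScript). The sketch normalizes rows once, keeps the packed matrix resident with gpu.hold(), and scores each query with a single gpu.matVec(); the LangChain plumbing follows @langchain/core's VectorStore base class, and anything beyond the documented gpu calls is an assumption, not the shipped implementation:

parabun-store (sketch)
import gpu from "parabun:gpu";
import { VectorStore } from "@langchain/core/vectorstores";
import { Document } from "@langchain/core/documents";
import type { EmbeddingsInterface } from "@langchain/core/embeddings";

export class ParaBunVectorStore extends VectorStore {
  private dim = 0;
  private rows: Float32Array[] = [];
  private docs: Document[] = [];
  private held: any = null;                     // gpu.hold() handle, rebuilt lazily

  constructor(embeddings: EmbeddingsInterface, config: Record<string, unknown> = {}) {
    super(embeddings, config);
  }

  _vectorstoreType(): string {
    return "parabun";
  }

  async addDocuments(documents: Document[]): Promise<void> {
    const vectors = await this.embeddings.embedDocuments(documents.map((d) => d.pageContent));
    await this.addVectors(vectors, documents);
  }

  async addVectors(vectors: number[][], documents: Document[]): Promise<void> {
    for (let i = 0; i < vectors.length; i++) {
      const v = Float32Array.from(vectors[i]);
      const norm = Math.hypot(...v) || 1;       // normalize once so matVec gives cosine similarity
      for (let j = 0; j < v.length; j++) v[j] /= norm;
      this.dim = v.length;
      this.rows.push(v);
      this.docs.push(documents[i]);
    }
    this.held = null;                           // matrix changed; re-upload on next query
  }

  async similaritySearchVectorWithScore(query: number[], k: number): Promise<[Document, number][]> {
    if (!this.held) {
      const mat = gpu.alloc(this.rows.length * this.dim, "f32");
      this.rows.forEach((row, r) => {
        for (let j = 0; j < this.dim; j++) mat[r * this.dim + j] = row[j];
      });
      this.held = gpu.hold(mat);                // uploaded once, resident across queries
    }
    const q = Float32Array.from(query);
    const qNorm = Math.hypot(...q) || 1;
    for (let j = 0; j < q.length; j++) q[j] /= qNorm;
    const scores = gpu.matVec(this.held, q, this.rows.length, this.dim);
    return [...scores.keys()]
      .sort((a, b) => scores[b] - scores[a])
      .slice(0, k)
      .map((i) => [this.docs[i], scores[i]]);
  }
}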

Composition examples

Where signals and module composition pay off — wiring multiple bun:* modules into one program without postMessage, child processes, or N-API bindings.

Voice → LLM → tool dispatch

Mic captures, Whisper transcribes, the LLM picks a tool under a JSON schema (mathematically guaranteed-valid output, no parse retries), the runtime dispatches it, Piper speaks the reply. The dispatch table is plain JS below; para:mcp (now shipped — stdio + WebSocket transports) lets you swap the table for a Model Context Protocol client without changing this control flow. effect() over mic.peakLevel / chat.busy / wsp.busy drives a status line — no polling, no observers, no event emitters.

import assistant from "parabun:assistant";
import { effect } from "para:signals";

await using bot = await assistant.create({
  llm: "./Llama-3.2-1B-Instruct-Q4_K_M.gguf",
  stt: "./ggml-tiny.en.bin",
  tts: "./en_US-lessac-medium.onnx",
  system: "You are a helpful home assistant. Keep replies short.",
  tools: {
    setLight:  ({ room, on, brightness }) => console.log(`light ${room} ${on ? "on" : "off"} @ ${brightness}`),
    playMusic: ({ track }) => console.log(`play ${track}`),
  },
});

// One reactive line redraws a live status line in place.
effect(() => process.stdout.write(`\r${bot.state.get()}`));

await bot.run();   // VAD → STT → LLM (with grammar-constrained tool calls) → TTS → speaker

parabun:assistant composes parabun:audio + parabun:speech + parabun:llm + para:signals internally. bot.state is a Signal that cycles through "idle" | "listening" | "thinking" | "speaking"; the tools dispatch table works the same as a para:mcp client. What's underneath shows the same loop hand-rolled — same five modules, no facade.

Webcam motion → reactive assistant

Module composition: parabun:camera + parabun:vision + parabun:assistant over para:signals. vision.detectMotion emits frame-by-frame scores; the rising-edge handler fires once on each false→true transition of the predicate — no derived() wrapper, no state machine, no debounce timer, no wasPresent flag. In Para Lang this is a when EXPR { … } block; the TypeScript version below uses the equivalent signals.when() call.

import camera    from "parabun:camera";
import vision    from "parabun:vision";
import assistant from "parabun:assistant";
import signals   from "para:signals";

await using cam = await camera.open("/dev/video0", { format: "yuyv", width: 640, height: 480 });
await using bot = await assistant.create({
  llm: "/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf",
  tts: "/models/en_US-lessac-medium.onnx",
  system: "You are a friendly home assistant. Keep replies short.",
});

// motion.score / motion.detected auto-fill as frames flow.
const motion = vision.detectMotion(vision.frames(cam.frames()), { sensitivity: 0.04 }).run();

// Greet whenever motion appears AND the bot is idle. The predicate is
// auto-tracked — every signal it reads becomes a dep, no derived() wrapper.
signals.when(
  () => motion.detected.get() && bot.state.get() === "idle",
  () => bot.say("Welcome back!"),
);

Four modules, three signals, zero glue code. Barge-in is built into parabun:assistant now (rising edge on vad.active drops the queued TTS via spk.stop() and stamps turn.interrupted); programmatic cancel is bot.interrupt().

What's underneath

parabun:assistant isn't magic — it stitches the same five modules a user could call directly. The version below is the loop the home-assistant facade performs for you. Useful when you want a non-default flow (custom VAD threshold, separate transcribe + chat sessions, your own JSON dispatch). Routine smart-home or IoT cases should still pick the facade; this one's longer.

import audio  from "parabun:audio";
import speech from "parabun:speech";
import llm    from "parabun:llm";
import { effect } from "para:signals";

const tools: Record<string, (args: any) => any> = {
  setLight:  ({ room, on, brightness }) => console.log(`light ${room} ${on ? "on" : "off"} @ ${brightness}`),
  playMusic: ({ track }) => console.log(`play ${track}`),
  reply:     ({ text }) => text,
};
const ToolSchema = { /* JSON schema with oneOf for each tool */ };

await using mic  = await audio.capture({ sampleRate: 16000, channels: 1 });
using       wsp  = await llm.WhisperModel.load("./ggml-tiny.en.bin");
using       chat = await llm.LLM.load("./Llama-3.2-1B-Instruct-Q4_K_M.gguf");

effect(() => {
  process.stdout.write(`\rmic ${mic.peakLevel.get().toFixed(3)}  llm ${chat.busy.get() ? "🤔" : ""}  whisper ${wsp.busy.get() ? "🎙️" : ""}`);
});

for await (const utt of speech.listen(mic.frames(), { sampleRate: 16000 })) {
  const heard = wsp.transcribe(utt.samples, { language: "en" });
  const { tool, args } = await chat.chatJSON([{ role: "user", content: heard }], { schema: ToolSchema, maxTokens: 80 });
  const result = await tools[tool](args);
  if (typeof result === "string") {
    await speech.say(result, { engine: "piper", model: "./en_US-lessac-medium.onnx" });
  }
}

chat.chatJSON({ schema }) drains the streamed grammar-constrained chat and parses the result in one call; speech.say(text) wraps speak() + audio.play() + spk.write() with a process-wide cached PlaybackStream keyed on (sampleRate, channels). Both ergonomic shortcuts shipped alongside the assistant facade.

Roadmap

Each module is gated by a compile-time feature flag. The configurator generates a bun build --compile invocation with only the flags you select.

Status       Module                                       What it does
shipped      parabun:image                                JPEG / PNG / WebP decode + encode, resize (bilinear / Lanczos), blur / sharpen / edge-detect, rotate / flip / crop, adjust / threshold / invert / grayscale, histogram, alpha composite.
shipped      parabun:audio                                WAV / MP3 / Opus codecs, RBJ biquads, FFT, resample, spectrogram, VAD, denoiser (rnnoise), AGC, mix / normalize / envelope, planar ⇄ frame-major + i16 ⇄ f32 PCM helpers.
shipped      parabun:gpu primitives                       conv2D, scan, reduce, argMin / argMax, histogram, median / quantile. CPU correctness paths today; CUDA / Metal hooks slot in via the existing dispatch.
shipped      parabun:camera                               V4L2 capture on Linux — devices(), formats(path), open(...) with an async-iterator frames() over kernel-mmapped buffers. AVFoundation + Media Foundation follow on the same surface.
shipped      OS audio I/O                                 Live ALSA capture + playback for parabun:audio. devices() / capture(...) / play(...) with Float32 PCM streams, S16_LE on the wire. CoreAudio + WASAPI follow.
shipped      parabun:gpio / parabun:i2c / parabun:spi     Linux peripheral I/O — uAPI v2 GPIO chardev (with edge events), i2c-dev SMBus, spidev half + full duplex with multi-segment transfers. Validated end-to-end on Raspberry Pi 5.
partial      parabun:gpu device kernels                   CUDA reduce (sum / min / max) + atomic-privatized histogram shipped. Scan, Metal mirror, and the rest of the secondary primitives still on CPU until wired.
partial      parabun:vision                               Frame stream + frame-diff motion detection ship today (vision.frames, vision.detectMotion). Detector (detect) and OCR (recognize) engines stub until ONNX runtime is vendored.
shipped      parabun:speech                               VAD-gated speech.listen (with reactive active / noiseFloor / lastUtterance signals), Whisper STT (speech.transcribe, dispatching to parabun:llm's WhisperModel — encoder-decoder, KV cache, beam search, language detection, CUDA-accelerated end-to-end), and Piper TTS (speech.speak — subprocess in v1; libpiper FFI v2 tracked).
shipped      parabun:assistant                            Three-line voice-assistant facade composing parabun:audio + parabun:speech + parabun:llm + para:mcp. bot.run / turns / ask / say / interrupt + reactive state / history / lastTurn / interrupted / toolsActive signals + sqlite-backed persistent memory + tool dispatch (inline + MCP) + VAD-driven barge-in + wake word (wakeWord: "hey jetson") + cron-driven scheduled prompts + RAG (knowledge: { dir, encoder }). Vision (VLM) turns deferred to follow-up.
in progress  parabun:video                                JS surface scaffolded; libavcodec / V4L2 M2M / NVDEC native binding lands with hardware bring-up. Decode + encode + container muxing.
planned      parabun:image AVIF                           AVIF decode/encode (libavif + AOM / dav1d vendor add). Rounds out the codec coverage matrix.

HTTP handlers, JSON parsing, and ordinary application code use the same code paths as upstream Bun. The added modules don't change behavior or performance there.