
para:csv

```ts
import csv from "para:csv";
```

A single export, parseCsv(input, opts?), that returns an async iterable of rows. The parser is a state machine over UTF-8 bytes; it never materializes the full file in memory, regardless of size.

input can be:

  • Bun.BunFile (recommended for files on disk).
  • ReadableStream<Uint8Array> or AsyncIterable<Uint8Array> (for fetched content, pipes, sockets).
  • Uint8Array or string (for in-memory).
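Any of these shapes is easy to produce by hand. For example, an in-memory string can be wrapped as a `ReadableStream<Uint8Array>` to exercise the streaming path — a standalone sketch (`toByteStream` is a hypothetical helper, not part of para:csv):

```ts
// Hypothetical helper: expose an in-memory string as the same
// ReadableStream<Uint8Array> shape a fetch body or socket would provide.
function toByteStream(text: string, chunkSize = 64): ReadableStream<Uint8Array> {
  const bytes = new TextEncoder().encode(text);
  let offset = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (offset >= bytes.length) {
        controller.close();
        return;
      }
      controller.enqueue(bytes.subarray(offset, offset + chunkSize));
      offset += chunkSize;
    },
  });
}
```

The result can be handed to parseCsv like any other stream input.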
```ts
import csv from "para:csv";

for await (const row of csv.parseCsv(Bun.file("data.csv"), { header: true })) {
  process(row.id, row.name, row.score);
}
```
| Option | Default | Description |
| --- | --- | --- |
| `header` | `false` | When `true`, the first row is the column names; subsequent rows are emitted as objects keyed by column. When `false`, rows are `string[]`. |
| `delimiter` | `","` | Single-character cell separator. |
| `quote` | `"\""` | Single-character quote that wraps cells with embedded delimiters / newlines. |
| `escape` | same as `quote` | RFC 4180 doubles the quote (`""`) to escape; some dialects use `\"`. |
| `comment` | none | If set, lines starting with this character are skipped. |
| `inferTypes` | `true` (with `header`) | Per-cell type inference: numeric → `number`, `true` / `false` → `boolean`, empty / `null` → `null`. Plain strings pass through. |
| `parallel` | `false` | See below. |

Without header, every row is an array of strings; no inference runs, which keeps the fast path simple.
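With header on, the per-cell rules from the inferTypes row above amount to a few string checks. A minimal sketch (hypothetical inferCell, not the library's internal code):

```ts
// Per-cell inference: numeric → number, "true"/"false" → boolean,
// empty or "null" → null; everything else passes through as a string.
function inferCell(raw: string): string | number | boolean | null {
  if (raw === "" || raw === "null") return null;
  if (raw === "true") return true;
  if (raw === "false") return false;
  const n = Number(raw);
  // Guard against Number("  ") === 0: whitespace-only cells stay strings.
  return raw.trim() !== "" && !Number.isNaN(n) ? n : raw;
}
```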

parallel: true chunks the input across para:parallel’s worker pool, which is only safe when the input has no quoted cells (a newline inside a quoted cell would break the byte-boundary chunking). It runs the parse off the main thread.

```ts
for await (const row of csv.parseCsv(Bun.file("data.csv"), { header: true, parallel: true })) {
  // row processed off the main thread
}
```
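The byte-boundary heuristic behind this flag can be sketched as: pick roughly even split points, then advance each to the next newline so every worker starts on a row boundary. That is only safe with no quoted cells, since a quoted cell may contain a literal newline (hypothetical splitOffsets, not the library’s code):

```ts
const NL = 0x0a; // "\n"

// Split a CSV byte buffer into up to `parts` ranges that each start on a
// row boundary. Valid only when the input has no quoted cells; a newline
// inside a quoted cell would be mistaken for a row boundary.
function splitOffsets(bytes: Uint8Array, parts: number): number[] {
  const offsets = [0];
  for (let i = 1; i < parts; i++) {
    let pos = Math.floor((bytes.length * i) / parts);
    while (pos < bytes.length && bytes[pos] !== NL) pos++; // scan to newline
    if (pos < bytes.length) offsets.push(pos + 1); // start just after it
  }
  return offsets;
}
```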

This is not a per-file speedup. The serial state machine is already memory-bandwidth-bound, and the parallel path’s materialize-and-fork overhead grows with input size. Sweep on a 16-core x86 release build:

| Fixture | Serial (med) | Parallel (med) | Speedup |
| --- | --- | --- | --- |
| 5 MB · 128k rows | 152 ms | 129 ms | 1.18× |
| 50 MB · 1.25M rows | 1446 ms | 1528 ms | 0.95× |
| 200 MB · 4.92M rows | 5892 ms | 6363 ms | 0.93× |

Use parallel: true to keep the event loop responsive while parsing (parsing N files concurrently does scale across cores), not because you expect bigger files to go faster. bench/parabun-csv-parallel/ reproduces the numbers.
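The concurrent-files pattern is plain Promise.all fan-out: start one parse per file and drain the streams together. A sketch with a stand-in producer (in real code the producer would be csv.parseCsv(Bun.file(path), { header: true, parallel: true })):

```ts
// Stand-in for a per-file row stream; swap in
// csv.parseCsv(Bun.file(path), { header: true, parallel: true }) in real code.
async function* rowsOf(path: string): AsyncGenerator<string[]> {
  yield [path, "0"];
  yield [path, "1"];
}

// Drain one file's stream and count its rows.
async function consume(path: string): Promise<number> {
  let count = 0;
  for await (const _row of rowsOf(path)) count++;
  return count;
}

// Each file's parse is driven concurrently; with parallel: true the
// heavy lifting also stays off the main thread.
const counts = await Promise.all(["a.csv", "b.csv", "c.csv"].map(consume));
```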

para:csv rows pair naturally with para:arrow’s fromRows:

```ts
import csv from "para:csv";
import arrow from "para:arrow";

const rows: any[] = [];
for await (const row of csv.parseCsv(Bun.file("data.csv"), { header: true })) rows.push(row);

const tbl = arrow.fromRows(rows);
arrow.mean(tbl.column("score"));
```

For very large CSVs, batch the bridge: call arrow.fromRows once per N rows instead of materializing every row first.
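Batching the bridge can be sketched as a generic chunker over the row stream (hypothetical batches helper, not part of either module):

```ts
// Group an async stream of rows into arrays of at most `size` rows, so
// each batch can go through arrow.fromRows without holding the whole
// file in memory.
async function* batches<T>(rows: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const row of rows) {
    batch.push(row);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // trailing partial batch
}
```

Each batch is converted and can be dropped before the next one is read.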

  • Multi-byte delimiters / quotes aren’t supported. RFC 4180 specifies single-byte for both.
  • Parallel mode requires that the input contain no quoted cells (otherwise byte-boundary chunking can split a quoted region).
  • Type inference is per-cell; there is no whole-column type promotion. If column score has mostly numbers and one "N/A", you get a mix of number and string; coerce on your end if that’s a problem.
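If the mixed score column is a problem, a post-pass that pins the column to number | null is enough (hypothetical coerceColumn; one possible policy, not the library’s behavior):

```ts
// One possible policy: keep numbers, map sentinel strings like "N/A"
// (and anything else non-numeric) to null, so the column stays number | null.
function coerceColumn(rows: Record<string, unknown>[], key: string): (number | null)[] {
  return rows.map((row) => {
    const v = row[key];
    return typeof v === "number" ? v : null;
  });
}
```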