A stream in Node.js is an interface for working with streaming data. Many developers find streams confusing the first time they meet them. The mental model is unusual. Instead of waiting for a whole file or HTTP response, you process bytes as they arrive. This article walks through the four stream types, when to use each one, how backpressure works, and a complete example that filters and transforms a large JSON dataset.
What is a Stream?
The Node.js documentation defines a stream as an abstract interface for working with streaming data. Streams let you read or write data in chunks rather than loading everything into memory at once. Two practical wins follow from this. Memory stays flat regardless of input size. Processing also starts before the source has finished producing data.
For the full API surface, see the official Node.js stream documentation.
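To make the "memory stays flat" point concrete, here is a minimal sketch that totals the size of a stream one chunk at a time. The helper name countBytes and the file path in the usage comment are assumptions for illustration, not part of the loan-data example.

```javascript
// Sum the byte length of a readable stream without ever holding
// more than one chunk in memory at a time.
async function countBytes(readable) {
  let total = 0;
  // Readable streams are async iterables, so each chunk can be awaited in turn.
  for await (const chunk of readable) {
    total += Buffer.byteLength(chunk);
  }
  return total;
}

// Usage (hypothetical path):
// const fs = require('fs');
// countBytes(fs.createReadStream('./big.log')).then((n) => console.log(n));
```

Because each chunk is discarded after it is counted, the same function should behave the same way for a small file and a very large one.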
The Four Stream Types
Node.js ships four stream types. Knowing which one you are working with is half the battle.
- Readable: sources you read from. Examples include fs.createReadStream(), an HTTP request body on the server, and process.stdin.
- Writable: destinations you write to. Examples include fs.createWriteStream(), the HTTP response object, and process.stdout.
- Duplex: both readable and writable on independent channels. A TCP socket (net.Socket) is the canonical example.
- Transform: a duplex stream whose output is derived from its input. zlib.createGzip() and crypto.createCipheriv() are transforms.
A readable stream operates in one of two modes. In flowing mode, data is pushed to listeners through the 'data' event. In paused mode, you pull chunks with read(). Most modern code avoids manual mode switching and relies on pipe() or pipeline() instead.
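The two modes can be sketched side by side. This is an illustrative example only: the helper names readFlowing and readPaused are made up here, and Readable.from() stands in for a real source such as a file.

```javascript
const { Readable } = require('stream');

// Flowing mode: attaching a 'data' listener starts the flow,
// and chunks are pushed to the listener as they become available.
function readFlowing(source) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    source.on('data', (chunk) => chunks.push(chunk));
    source.on('end', () => resolve(chunks));
    source.on('error', reject);
  });
}

// Paused mode: wait for 'readable', then pull chunks with read()
// until it returns null (nothing more buffered right now).
function readPaused(source) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    source.on('readable', () => {
      let chunk;
      while ((chunk = source.read()) !== null) {
        chunks.push(chunk);
      }
    });
    source.on('end', () => resolve(chunks));
    source.on('error', reject);
  });
}

readFlowing(Readable.from(['a', 'b', 'c'])).then((chunks) => console.log('flowing:', chunks));
readPaused(Readable.from(['a', 'b', 'c'])).then((chunks) => console.log('paused:', chunks));
```

Both helpers collect the same chunks; the difference is who drives the reading, the stream or the consumer.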
When to Reach for Streams
Streams pay off when any of the following is true:
- The input is large enough that buffering it would strain memory.
- The downstream consumer can start work before the upstream producer is done.
- The data arrives over time, not all at once.
Common cases include reading a multi-gigabyte log file, serving a video to a client, gzipping an HTTP response, piping a database cursor into a CSV writer, and parsing a CSV upload row by row.
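As a rough sketch of the log-file case, the built-in readline module can consume any readable stream line by line. The function name countMatchingLines and the path in the usage comment are assumptions for illustration.

```javascript
const readline = require('readline');

// Count the lines of a stream that contain a given substring,
// reading one line at a time instead of loading the whole input.
async function countMatchingLines(input, needle) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  let count = 0;
  for await (const line of lines) {
    if (line.includes(needle)) count += 1;
  }
  return count;
}

// Usage (hypothetical path):
// const fs = require('fs');
// countMatchingLines(fs.createReadStream('./app.log'), 'ERROR').then(console.log);
```

readline reassembles lines that happen to be split across chunk boundaries, so the caller never has to track partial lines.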
A Worked Example: Filter and Transform
The next sections build a small pipeline that does four things:
- Reads a large JSON-lines file.
- Filters records where home_ownership equals RENT.
- Adds a risk_profile field computed from loan_amount.
- Writes the result to a new file.
The dataset is the Loan Data for Dummy Bank file. I converted it to newline-delimited JSON so each chunk represents one record.
Start with the file streams:
const fs = require('fs');
const path = require('path');

// READ_FILE_PATH and WRITE_FILE_PATH are placeholders for your input and output paths.
const readStream = fs.createReadStream(path.resolve(READ_FILE_PATH));
const writeStream = fs.createWriteStream(path.resolve(WRITE_FILE_PATH));
Build the filter as a Transform stream. The condition keeps only rented homes:
const { Transform } = require('stream');

// Keep only records for rented homes.
const filterCondition = (elem) => elem && elem.home_ownership === 'RENT';

const filterData = (fn, options = {}) =>
  new Transform({
    objectMode: true,
    ...options,
    transform(chunk, encoding, callback) {
      let obj;
      try {
        obj = JSON.parse(chunk.toString());
      } catch (e) {
        // Skip lines that are not valid JSON (for example, a trailing empty line).
        return callback();
      }
      const keep = fn(obj);
      // Passing undefined emits nothing, which drops the record.
      callback(null, keep ? chunk : undefined);
    },
  });
A second Transform attaches the risk profile:
// Tag each record as high or low risk and serialize it again.
const riskProfile = (elem) => {
  elem.risk_profile = elem.loan_amount > 5000 ? 'high' : 'low';
  return JSON.stringify(elem);
};

const transformData = (fn, options = {}) =>
  new Transform({
    objectMode: true,
    ...options,
    transform(chunk, encoding, callback) {
      try {
        const out = fn(JSON.parse(chunk));
        // Re-add the newline so the output stays newline-delimited JSON.
        callback(null, `${out}\n`);
      } catch (e) {
        callback(e);
      }
    },
  });
Hook everything together with pipeline(). The split module breaks the file into lines so each chunk arrives as one JSON record:
const { pipeline } = require('stream/promises');
const split = require('split');

async function run() {
  try {
    await pipeline(
      readStream,
      split(),
      filterData(filterCondition),
      transformData(riskProfile),
      writeStream
    );
    console.log('Pipeline succeeded.');
  } catch (err) {
    console.error('Pipeline failed.', err);
  }
}

run();
The promise-based pipeline from stream/promises was added in Node.js 15. It forwards errors, destroys downstream streams on failure, and returns a promise you can await. The older callback form still works but reads poorly in larger codebases.
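For comparison, the callback form mentioned above might look like the sketch below. It is not part of the loan-data code: the function name gzipStrings and the in-memory source and sink are assumptions chosen to keep the example self-contained.

```javascript
const { pipeline, Readable, Writable } = require('stream');
const zlib = require('zlib');

// Gzip a list of strings using the callback form of pipeline().
// The last argument is called once, with an error or with nothing.
function gzipStrings(strings, done) {
  const compressed = [];
  pipeline(
    Readable.from(strings),
    zlib.createGzip(),
    new Writable({
      write(chunk, encoding, callback) {
        compressed.push(chunk); // collect the gzipped output in memory
        callback();
      },
    }),
    (err) => {
      if (err) return done(err);
      done(null, Buffer.concat(compressed));
    }
  );
}

gzipStrings(['hello ', 'world'], (err, gz) => {
  if (err) throw err;
  console.log(`compressed to ${gz.length} bytes`);
});
```

The behavior matches the promise form; the difference is that completion and failure both arrive through a single callback instead of await and try/catch.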
Backpressure: The One Thing Beginners Miss
Backpressure is what happens when a writable cannot keep up with a readable. Without handling it, unwritten chunks pile up in memory until the process runs out of heap. When writable.write() returns false, the internal buffer has reached its highWaterMark and the producer should pause. When the writable emits 'drain', the producer can resume.
Both pipe() and pipeline() handle this for you. Hand-rolled stream code is where backpressure bugs live. For a deeper dive, see backpressure optimization in streams.
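If you do have to write to a stream by hand, the write()/'drain' handshake described above can be sketched as follows. The names writeAll and slowSink are hypothetical, and the tiny highWaterMark is chosen only to make the pause easy to observe.

```javascript
const { once } = require('events');
const { Writable } = require('stream');

// Write every item, pausing whenever write() reports a full buffer.
// Returns how many times the producer had to wait for 'drain'.
async function writeAll(writable, items) {
  let waits = 0;
  for (const item of items) {
    if (!writable.write(item)) {
      waits += 1;
      await once(writable, 'drain'); // resume only after the buffer empties
    }
  }
  writable.end();
  await once(writable, 'finish');
  return waits;
}

// A deliberately slow sink with a 2-byte buffer (for demonstration only).
function slowSink(received) {
  return new Writable({
    highWaterMark: 2,
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback); // simulate slow I/O
    },
  });
}

const received = [];
writeAll(slowSink(received), ['a', 'b', 'c', 'd']).then((waits) => {
  console.log(`wrote ${received.length} chunks, waited for 'drain' ${waits} time(s)`);
});
```

Ignoring the return value of write() would still produce the same output here, but the unwritten chunks would accumulate in the writable's buffer, which is the memory growth described above.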
Error Handling
Three patterns are worth knowing.
First, an 'error' event on one stream in a .pipe() chain does not propagate. An error on stream A will not destroy stream B, so memory and file descriptors can leak.
Second, stream.pipeline() fixes this by destroying every stream in the chain when one of them errors. This is the main reason to prefer pipeline() over pipe().
Third, always wrap JSON.parse or user code inside the transform function in a try/catch. An uncaught throw inside the transform function can crash the process. Calling callback(err) instead puts the stream into an error state cleanly and surfaces the error through pipeline().
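The second and third patterns can be seen together in a small sketch. The function name demoPipelineError and the sample input are made up for illustration; the point is that a single callback(err) rejects the pipeline and leaves the other streams destroyed.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { pipeline } = require('stream/promises');

async function demoPipelineError() {
  // The second chunk is deliberately invalid JSON.
  const source = Readable.from(['{"ok":1}', 'not json', '{"ok":2}']);

  const parse = new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
      let record;
      try {
        record = JSON.parse(chunk);
      } catch (err) {
        return callback(err); // report the failure instead of throwing
      }
      callback(null, record);
    },
  });

  const sink = new Writable({
    objectMode: true,
    write(chunk, encoding, callback) {
      callback();
    },
  });

  try {
    await pipeline(source, parse, sink);
    return null;
  } catch (err) {
    // By the time the promise rejects, pipeline() has cleaned up the chain.
    return {
      errorName: err.name,
      sourceDestroyed: source.destroyed,
      sinkDestroyed: sink.destroyed,
    };
  }
}

demoPipelineError().then((result) => console.log(result));
```

With a plain .pipe() chain, the same parse error would leave the source and sink open unless each one had its own 'error' handler and cleanup.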
Streams vs Buffers
A buffer holds the entire payload in memory before you touch it. A stream lets you touch each chunk as it arrives. For a 10 MB file, the difference is invisible. For a 10 GB file or a never-ending socket, only streams work.
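A small sketch of the same task done both ways, assuming you want a SHA-256 digest of a file. The function names hashBuffered and hashStreamed and the path in the usage comment are hypothetical.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Buffered: the whole file sits in memory before hashing starts.
async function hashBuffered(filePath) {
  const data = await fs.promises.readFile(filePath);
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Streamed: each chunk is fed to the hash as it arrives, then discarded.
async function hashStreamed(filePath) {
  const hash = crypto.createHash('sha256');
  for await (const chunk of fs.createReadStream(filePath)) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Usage (hypothetical path):
// hashStreamed('./loans.jsonl').then((digest) => console.log(digest));
```

Both functions return the same digest for the same file. Only the buffered version needs memory proportional to the file size.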
Common Use Cases
Streams are the standard tool for:
- File processing: reading, writing, and transforming files of any size.
- HTTP request and response bodies: both req and res are streams.
- Compression and decompression with zlib.
- Encryption and decryption with crypto.
- Streaming database results, especially with MongoDB cursors or PostgreSQL COPY.
- Audio and video delivery, including HLS and DASH segments.
The complete code for the loan-data example is on GitHub.