Mendel: A Self-Hosted A/B Testing Framework

Mendel is a self-hosted A/B testing and feature flag library for Node.js apps that already use Mongoose. v1.0.2 shipped this week on npm under MIT.

Introduction

An early-stage startup faces a dilemma when both feature development and experimentation around those features need to move fast. The choice is whether to add another SaaS product for experimentation and take on the extra cost, or to spend engineering bandwidth building an in-house experimentation framework. That is when I started writing Mendel.

The same wall kept showing up on several Node.js projects:

Procurement did not want another vendor on the invoice.
Security did not want pseudonymous IDs leaving the VPC.
Latency-sensitive code paths could not afford a round-trip to evaluate a flag.
We already had MongoDB. We did not need another datastore.

Self-hosted GrowthBook is Postgres-first and broader in scope than what most teams need on day one. Unleash and Flagsmith are sized for organizations larger than the ones I usually work with. I wanted something narrower: a small, embeddable A/B testing and feature flag library for Node.js apps that already use Mongoose, with a usable admin UI, and no SaaS dependency.

What Is Included

Two rollout modes. Probabilistic A/B testing with weighted variants, and explicit feature flag enrollment with per-item allowlists.
Deterministic bucketing. The same (experiment, user) pair always resolves to the same variant across servers, regions, and client SDKs.
Targeting rules with operators eq, in, gt(e), contains, regex, combined with all or any semantics.
Layers and holdouts for running multiple experiments simultaneously without contaminating each other.
Prerequisites so one experiment can gate on another’s variant assignment.
Force-assign overrides for QA, demos, and customer escalations.
Variant payloads. Ship arbitrary JSON (copy, config, feature toggles) with each variant.
Audit and exposure hooks to stream events to your existing analytics pipeline.
A React admin UI so non-engineers can manage experiments without filing tickets.
Express integration with optional celebrate validation.
Docker Compose. docker compose up runs the whole stack.

All of it lives inside your VPC, in your existing Mongo cluster, behind your existing auth.
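
To make the targeting semantics from the list above concrete, here is a minimal sketch of how rules with all/any matching could be evaluated. The rule shape ({ attribute, op, values }) is borrowed from the Quickstart example; the operator strings, the matchesRule and matchesTargeting helpers, and the handling of missing attributes are assumptions for illustration, not Mendel's actual implementation.

```javascript
// Hypothetical operator table. Names follow the operator list above;
// the exact semantics are assumptions, not Mendel's code.
const OPS = new Map([
  ['eq', (actual, values) => actual === values[0]],
  ['in', (actual, values) => values.includes(actual)],
  ['gt', (actual, values) => actual > values[0]],
  ['gte', (actual, values) => actual >= values[0]],
  ['contains', (actual, values) =>
    (typeof actual === 'string' || Array.isArray(actual)) && actual.includes(values[0])],
  ['regex', (actual, values) =>
    typeof actual === 'string' && new RegExp(values[0]).test(actual)],
]);

function matchesRule(rule, attributes) {
  const actual = attributes[rule.attribute];
  const op = OPS.get(rule.op);
  // Unknown operators and missing attributes never match.
  if (!op || actual === undefined) return false;
  return op(actual, rule.values);
}

function matchesTargeting(targeting, attributes) {
  // No targeting block (or an empty rule list) means everyone qualifies.
  if (!targeting || targeting.rules.length === 0) return true;
  const check = (rule) => matchesRule(rule, attributes);
  return targeting.match === 'any'
    ? targeting.rules.some(check)
    : targeting.rules.every(check);
}

const targeting = {
  match: 'all',
  rules: [{ attribute: 'plan', op: 'in', values: ['pro', 'enterprise'] }],
};
console.log(matchesTargeting(targeting, { plan: 'enterprise', country: 'US' }));
```

Treating a missing attribute as a non-match is one reasonable default; a real evaluator might instead want an explicit "attribute is absent" operator.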

Deterministic Bucketing

This is the technical decision that ripples through everything else.

Naive variant assignment looks like this:

const variant = Math.random() < 0.5 ? 'control' : 'treatment';

It works exactly once. The same user hitting your service again gets a fresh roll. Persisting the assignment to a database means a hot-path Mongo read on every request, plus a synchronization problem between your servers, client SDKs, and edge cache.

The fix is well known. Hash the user ID into a uniform [0, 1) value, then partition that range by variant weights.

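// 32-bit FNV-1a over the string's UTF-16 code units.
// For ASCII IDs this matches byte-wise FNV-1a in other languages.
function fnv1a(input) {
  let hash = 2166136261; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // 32-bit unsigned
}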

function bucket(salt, itemId) {
  return fnv1a(`${salt}:${itemId}`) / 2 ** 32; // ∈ [0, 1)
}

That single function gives you three properties:

Statelessness. No database read to decide which variant a user is in.
Determinism. The same user lands in the same bucket across every process that runs the same hash.
Portability. Implement FNV-1a in any language and a mobile or edge client computes the same answer as the server, with no round-trip.
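
The second half of the fix, partitioning [0, 1) by variant weights, is not shown above. The sketch below is one way to do it: walk the cumulative weights and return the first variant whose slice contains the bucket value. The pickVariant helper is a hypothetical name introduced here, and the variant shape ({ key, weight }) is taken from the Quickstart; Mendel's internal code may differ.

```javascript
function fnv1a(input) {
  let hash = 2166136261;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0;
}

function bucket(salt, itemId) {
  return fnv1a(`${salt}:${itemId}`) / 2 ** 32;
}

// Hypothetical helper: each variant owns a slice of [0, 1) proportional
// to its weight; return the variant whose slice contains bucketValue.
function pickVariant(variants, bucketValue) {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  let upper = 0;
  for (const v of variants) {
    upper += v.weight / total;
    if (bucketValue < upper) return v.key;
  }
  // Floating-point rounding can leave `upper` slightly below 1,
  // so fall back to the last variant.
  return variants[variants.length - 1].key;
}

const variants = [
  { key: 'control', weight: 50 },
  { key: 'treatment', weight: 50 },
];

// Same inputs, same answer, on any process that runs the same hash.
const variant = pickVariant(variants, bucket('exp_new_checkout', 'USER_42'));
console.log(variant);
```

Normalizing by the total weight means the weights do not have to sum to 100.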

The salt defaults to the experiment name. You can override it so that two experiments with overlapping audiences do not share bucket space. Otherwise the same lucky users would always land in the treatment arm of every test, which is statistically nasty.
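
A small simulation makes the shared-salt problem visible. It is a sketch under stated assumptions: a 50/50 split where the upper half of the range is treatment, synthetic user IDs of the form user_N, and a hypothetical overlap helper. With one shared salt, every user treated in the first experiment is also treated in the second. With distinct salts, the overlap should fall well below that.

```javascript
function fnv1a(input) {
  let hash = 2166136261;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0;
}

function bucket(salt, itemId) {
  return fnv1a(`${salt}:${itemId}`) / 2 ** 32;
}

// Assumed 50/50 split: the upper half of [0, 1) is treatment.
function inTreatment(salt, userId) {
  return bucket(salt, userId) >= 0.5;
}

// Hypothetical helper: of the users treated under saltA, what fraction
// is also treated under saltB?
function overlap(saltA, saltB, userCount) {
  let treatedA = 0;
  let treatedBoth = 0;
  for (let i = 0; i < userCount; i++) {
    const userId = `user_${i}`;
    if (!inTreatment(saltA, userId)) continue;
    treatedA++;
    if (inTreatment(saltB, userId)) treatedBoth++;
  }
  return treatedA === 0 ? 0 : treatedBoth / treatedA;
}

const sharedSalt = overlap('shared_salt', 'shared_salt', 10000);
const distinctSalts = overlap('exp_new_checkout', 'exp_new_billing', 10000);
console.log({ sharedSalt, distinctSalts });
```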

Three Buckets Per Evaluation

Mendel computes three independent buckets:

${salt}:rollout decides if the user is eligible for the experiment at all. You can run an experiment at 10% rollout while still doing a fair 50/50 split within that 10%.
${salt}:${itemId} decides which variant, given eligibility.
${layerSalt}:holdout decides if the user is in the layer’s global holdout.

Keeping these independent matters. If eligibility and variant assignment shared a hash, your 50/50 treatment group would no longer be 50/50 once you filtered to the 10% rollout. Subtle bug, very real consequences in your metrics.
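
The failure mode is easy to reproduce. In the sketch below, assignShared reuses one bucket for both eligibility and variant choice, so every eligible user has a bucket below 0.1 and falls into the control half. assignIndependent draws a separate bucket for the rollout decision. Both function names, the 10% rollout constant, and the exact hash key layout (the user ID appended after the rollout suffix) are assumptions made for this illustration.

```javascript
function fnv1a(input) {
  let hash = 2166136261;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0;
}

function bucket(salt, itemId) {
  return fnv1a(`${salt}:${itemId}`) / 2 ** 32;
}

const ROLLOUT = 0.1; // 10% of users are eligible

// Buggy: one bucket decides both eligibility and variant.
function assignShared(salt, userId) {
  const b = bucket(salt, userId);
  if (b >= ROLLOUT) return null; // not in the rollout
  // b is below 0.1 here, so this branch always yields 'control'.
  return b < 0.5 ? 'control' : 'treatment';
}

// Independent buckets: eligibility uses its own hash key.
function assignIndependent(salt, userId) {
  if (bucket(`${salt}:rollout`, userId) >= ROLLOUT) return null;
  return bucket(salt, userId) < 0.5 ? 'control' : 'treatment';
}

// Count variant assignments over synthetic user IDs.
function tally(assign, userCount) {
  const counts = { control: 0, treatment: 0 };
  for (let i = 0; i < userCount; i++) {
    const v = assign('exp_new_checkout', `user_${i}`);
    if (v) counts[v]++;
  }
  return counts;
}

const sharedCounts = tally(assignShared, 20000);
const independentCounts = tally(assignIndependent, 20000);
console.log({ sharedCounts, independentCounts });
```

With the shared hash, the treatment count is zero by construction. With independent hashes, both arms receive users.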

Layers and Holdouts

When you run your second experiment in production, you discover that experiments interfere with each other. Two simultaneous tests on the same checkout flow contaminate each other’s metrics. You can no longer tell which variant of which experiment caused the lift.

The standard fix is layers. Group mutually exclusive experiments so a user lands in at most one of them. Mendel models this directly. A layer is a named bucket of experiments that share an audience namespace, plus an optional holdout. The holdout is a slice of users who see no experiment in the layer at all, which lets you measure the cumulative lift from everything you are running.

await service.createLayer({
  layer_name: 'checkout_layer',
  holdout_pct: 10,
}, { id: 'admin' });

await service.assignToLayer(layerId, [
  exp_new_checkout_id,
  exp_new_billing_id,
], { id: 'admin' });

Now exp_new_checkout and exp_new_billing never overlap on the same user, and 10% of traffic in the layer is held out as a control.
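
One way to picture the mechanics is a pure function over two precomputed buckets: one for the holdout decision and one that picks a single experiment slot. This is a hypothetical sketch, not Mendel's allocation logic. The resolveLayer name, the layer object shape, and the equal-width slots per experiment are all assumptions.

```javascript
// Hypothetical layer resolution. Both bucket arguments are values in
// [0, 1), e.g. produced by hashing the user ID with layer-specific keys.
function resolveLayer(layer, holdoutBucket, slotBucket) {
  // Users in the holdout slice see no experiment from this layer.
  if (holdoutBucket < layer.holdoutPct / 100) {
    return { holdout: true, experiment: null };
  }
  const count = layer.experiments.length;
  if (count === 0) return { holdout: false, experiment: null };
  // Equal-width slots: each user maps to exactly one experiment.
  const index = Math.min(Math.floor(slotBucket * count), count - 1);
  return { holdout: false, experiment: layer.experiments[index] };
}

const checkoutLayer = {
  holdoutPct: 10,
  experiments: ['exp_new_checkout', 'exp_new_billing'],
};

console.log(resolveLayer(checkoutLayer, 0.05, 0.9)); // held out
console.log(resolveLayer(checkoutLayer, 0.5, 0.2));  // first slot
console.log(resolveLayer(checkoutLayer, 0.5, 0.7));  // second slot
```

Because each user resolves to at most one experiment, metrics for the two checkout tests cannot be attributed to the same user.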

Quickstart

Install the package alongside Mongoose:

npm install mendel-framework mongoose

Create the framework, define an experiment, and evaluate it:

const mongoose = require('mongoose');
const { v4: uuid } = require('uuid');
const {
  createMendelFramework,
  ROLL_OUT_TYPE,
  TARGETING_OP,
} = require('mendel-framework');

// Run the rest inside an async function: CommonJS modules have no top-level await.
await mongoose.connect(process.env.MONGO_URI);

const { service, manager } = createMendelFramework(mongoose, {
  generateId: uuid,
  environment: 'prod',
  cache: { enabled: true, ttlMs: 5000, max: 1000 },
  onExposure: (e) => analytics.track(e.exp_name, { variant: e.variant_key }),
});

await service.createExperiment({
  exp_name: 'exp_new_checkout',
  exp_type: 'flag',
  roll_out_type: ROLL_OUT_TYPE.A_B_TESTING,
  roll_out_value: 80,
  variants: [
    { key: 'control',   weight: 50, payload: { ui: 'classic' } },
    { key: 'treatment', weight: 50, payload: { ui: 'streamlined' } },
  ],
  targeting: {
    match: 'all',
    rules: [{ attribute: 'plan', op: TARGETING_OP.IN, values: ['pro', 'enterprise'] }],
  },
  start_date: Date.now(),
  end_date: Date.now() + 30 * 24 * 60 * 60 * 1000,
}, { id: 'admin' });

const { variant, payload } = await service.evaluate(
  'exp_new_checkout',
  'USER_42',
  { plan: 'enterprise', country: 'US' },
);
// → { variant: 'treatment', payload: { ui: 'streamlined' }, reason: 'bucketed', ... }

To skip the manual setup, run docker compose up --build. The backend, admin UI, and Mongo come up on localhost:3100.

What Mendel Does Not Do

No Python, Go, or Java SDKs yet. Server-side evaluation is Node.js only. The hashing scheme is documented and trivial to port, but those SDKs are not written.
No managed cloud offering. If you do not want to run infrastructure, GrowthBook Cloud or LaunchDarkly are better fits.
No built-in statistical analysis engine. Mendel emits exposure events; you analyze them in your warehouse. For built-in stats, GrowthBook is more complete.
Not battle-tested at scale yet. v1.0.2 is fresh. The bucketing is deterministic and the hot path is cached, but it has not run under sustained production load.
