Blog

Notes on measurable code improvement.

Product updates, engineering notes, and practical guides for running AI-driven experiments on real codebases.

Introducing GradeX v0.1.0

GradeX is an open-source CLI for autonomous code optimization. It runs parallel AI agents in isolated git worktrees, scores every patch against a benchmark, applies correctness gates, and keeps only changes that prove measurable improvement.

Why GradeX exists

Most code changes are evaluated by intuition. A developer believes a refactor is faster, a model believes a patch is cleaner, or a team ships an optimization because it looks plausible. GradeX changes the loop: every candidate patch needs a measured score, a baseline to compare it against, and a correctness gate it must pass.

What v0.1.0 includes

  • CLI commands for install, discover, optimize, dashboard, stats, history, doctor, and upgrade.
  • Git worktree isolation so experiments never touch the main branch.
  • Benchmark and gate runners that score each patch and reject regressions.
  • AI provider support for Groq, Anthropic, OpenAI, and Ollama.
  • A live dashboard with score charts, logs, pass rate, and experiment breakdowns.
  • Production hardening: secret scrubbing, path safety, benchmark caching, rate limiting, and shutdown cleanup.
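
To make the "score each patch and reject regressions" idea concrete, here is a minimal sketch of an acceptance rule. It is not GradeX's actual implementation: the PatchResult structure, the should_keep function, and the convention that a higher score is better are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Outcome of evaluating one candidate patch (hypothetical structure)."""
    name: str
    score: float        # benchmark score; higher is better in this sketch
    gates_passed: bool  # True only if every correctness gate succeeded


def should_keep(result: PatchResult, baseline: float, min_gain: float = 0.0) -> bool:
    """Keep a patch only if it passes its gates and beats the baseline."""
    if not result.gates_passed:
        # A failed gate rejects the patch no matter how good the score looks.
        return False
    return result.score - baseline > min_gain


baseline_score = 100.0
candidates = [
    PatchResult("faster-but-broken", score=140.0, gates_passed=False),
    PatchResult("correct-but-slower", score=95.0, gates_passed=True),
    PatchResult("correct-and-faster", score=112.0, gates_passed=True),
]

# Only the patch that is both correct and better than the baseline survives.
kept = [c.name for c in candidates if should_keep(c, baseline_score)]
print(kept)
```

Checking the gate before the score is a deliberate choice in this sketch: a patch that breaks correctness is never compared on performance at all.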

How it works

Run gradex discover to identify a target and capture a baseline. Then run gradex optimize. GradeX spawns subagents, gives each a brief, evaluates their patches, and updates the dashboard as experiments complete.
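
The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how GradeX is built: threads stand in for subagents working in separate worktrees, each candidate is a plain function returning a score and a gate result, and the names run_experiments and experiments are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# In this sketch a candidate is a function returning (score, gates_passed).
Candidate = Callable[[], tuple[float, bool]]


def run_experiments(
    baseline: float,
    candidates: dict[str, Candidate],
    workers: int = 4,
) -> Optional[str]:
    """Evaluate candidates in parallel; return the best passing improvement, or None."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every candidate first so they run concurrently.
        futures = {name: pool.submit(fn) for name, fn in candidates.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    best_name, best_score = None, baseline
    for name, (score, gates_passed) in results.items():
        # A candidate wins only if it passes its gates and beats the best so far.
        if gates_passed and score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical experiments with made-up scores, for illustration only.
experiments = {
    "inline-hot-loop": lambda: (108.0, True),
    "unsafe-cache": lambda: (130.0, False),
    "batch-io": lambda: (115.0, True),
}

winner = run_experiments(baseline=100.0, candidates=experiments)
print(winner)
```

If no candidate both passes its gates and beats the baseline, the function returns None and nothing is kept, which mirrors the rule that only measurable improvements survive.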

Roadmap

Next, we are focusing on better multi-language discovery, hosted team dashboards, richer telemetry, and deeper integrations with coding agents.

Try GradeX free →

How to design a benchmark that agents cannot game

A practical guide to deterministic benchmarks, warmups, gate selection, and avoiding false wins.

Read the docs →

Why every AI code agent needs an experiment loop

Parallel exploration only matters when each change is measured, gated, and recorded.

See the product →