Autonomous AI Research


Autoresearch Ecosystem Index

A curated index of autonomous research loops, tools, and benchmarks built on the keep-or-revert primitive introduced by Andrej Karpathy.


webfuse-com/awesome-autoresearch

82+ Forks & Adaptations

6 Ecosystem Categories

100% Unattended Runs

Cycle 38: READING

01. READ_STATE: Read State (nanoGPT/train.py). Loss ledger loaded.

02. PROPOSE: Propose Change (Batch size: 64). Patch generated.

03. SANDBOX: Run Experiment (train.py --device=cuda). GPU 94% | 82°C.

04. MEASURE: Measure Metric (val_bpb: 1.556). Diff: -0.028.

05. LOG_OUTCOME: Log Outcome (results.tsv). Ledger synced.

06. KEEP_REVERT: Keep / Revert (Verdict: KEEP). Git committed.

Reference guide

Loop Reference & Community Outcomes

Quick reference on what autoresearch is, how the loop operates, and documented community results.

Core Concepts

Background

Autoresearch was introduced by Andrej Karpathy as a natural-language instruction document (program.md). A coding agent reads it, proposes one change, runs a time-bounded training session, and evaluates the validation bits-per-byte (val_bpb).
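The "time-bounded session, then read the metric" step can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the helper names `parse_val_bpb` and `run_time_bounded`, the log format `val_bpb: <number>`, and the choice to treat a timeout or crash as "no metric" are all assumptions made for the example.

```python
import re
import subprocess
import sys
from typing import List, Optional

# Assumed log format: a line containing "val_bpb: 1.556" somewhere in stdout.
VAL_BPB_PATTERN = re.compile(r"val_bpb:\s*([0-9]*\.?[0-9]+)")


def parse_val_bpb(output: str) -> Optional[float]:
    """Return the last val_bpb value found in the output, or None if absent."""
    matches = VAL_BPB_PATTERN.findall(output)
    return float(matches[-1]) if matches else None


def run_time_bounded(command: List[str], budget_seconds: float) -> Optional[float]:
    """Run one experiment; a timeout or non-zero exit yields no metric."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=budget_seconds
        )
    except subprocess.TimeoutExpired:
        return None
    if completed.returncode != 0:
        return None
    return parse_val_bpb(completed.stdout)


if __name__ == "__main__":
    # Stand-in for a real training command so the sketch runs anywhere.
    fake_training = [sys.executable, "-c", "print('step 10 val_bpb: 1.556')"]
    print(run_time_bounded(fake_training, budget_seconds=30))
```

Returning `None` instead of raising lets the caller treat a failed or overlong run the same way as a run that did not improve the metric.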

The Keep-or-Revert Primitive

Each cycle modifies exactly one file. If the measured metric improves, the change is committed. If not, the file is restored with a Git hard reset. Because every failed change is undone before the next cycle starts, regressions cannot compound over hundreds of cycles.
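A minimal Python sketch of the keep-or-revert decision, assuming a lower-is-better metric such as val_bpb. The function names are hypothetical, and the exact Git invocations (`git commit -am` and `git reset --hard HEAD`) are one plausible reading of "committed" and "hard reset", not a confirmed detail of any particular fork.

```python
import subprocess
from typing import List


def decide(best_so_far: float, candidate: float) -> str:
    """KEEP only on a strict improvement of a lower-is-better metric."""
    return "KEEP" if candidate < best_so_far else "REVERT"


def git_command(verdict: str, message: str) -> List[str]:
    """Map a verdict to the Git command that realises it."""
    if verdict == "KEEP":
        # Commit all tracked modifications with a descriptive message.
        return ["git", "commit", "-am", message]
    # Discard every uncommitted change and return to the last commit.
    return ["git", "reset", "--hard", "HEAD"]


def apply_verdict(verdict: str, message: str, repo_dir: str = ".") -> None:
    """Run the Git command inside the repository (has side effects)."""
    subprocess.run(git_command(verdict, message), cwd=repo_dir, check=True)
```

Requiring a strict improvement means a change that merely ties the current best is reverted, which keeps the commit history limited to measurable gains.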

Applicability

The loop transfers to any domain with a measurable scalar fitness function. Documented uses include ML training loss, GPU kernel MFU, software build times, trading strategy Sharpe ratios, ancient document ink detection, and static analysis metrics.
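Moving between domains mostly means declaring which direction counts as better for the scalar. The sketch below is illustrative only: the `Metric` class and the three example metric names are hypothetical, chosen to mirror the domains listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A scalar fitness measure and the direction that counts as better."""
    name: str
    higher_is_better: bool

    def improved(self, baseline: float, candidate: float) -> bool:
        """True only when the candidate strictly beats the baseline."""
        if self.higher_is_better:
            return candidate > baseline
        return candidate < baseline


# Hypothetical metric declarations for three of the domains above.
VAL_BPB = Metric("val_bpb", higher_is_better=False)
SHARPE = Metric("sharpe_ratio", higher_is_better=True)
BUILD_SECONDS = Metric("ci_build_seconds", higher_is_better=False)
```

With the direction stored alongside the name, the same keep-or-revert check can serve a loss that should fall and a ratio that should rise.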

Loop Steps

Read State: Reads baseline code and previous logs.

Select Change: Proposes code edits based on history.

Edit Target: Applies exactly one change to target file.

Run Experiment: Runs codebase under a sandbox time budget.

Read Metric: Parses evaluation output (e.g. val_bpb).

Keep/Revert: Commits on metric success; else reverts.

Log Outcome: Appends data to the results ledger.
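The steps above can be sketched as a single Python loop. This is a simplified illustration under stated assumptions: the agent, the experiment runner, and the Git operations are passed in as plain callables, the metric is lower-is-better, and the ledger is written as tab-separated rows. All names and the column layout are hypothetical.

```python
import csv
from typing import Callable, List, Optional, TextIO, Tuple

Row = Tuple[int, str, str, str]  # cycle, change, metric text, verdict


def run_loop(
    cycles: int,
    baseline: float,
    propose: Callable[[List[Row]], str],
    apply_and_measure: Callable[[str], Optional[float]],
    keep: Callable[[str], None],
    revert: Callable[[str], None],
    ledger: TextIO,
) -> float:
    """Run the loop for a fixed number of cycles and return the best metric."""
    writer = csv.writer(ledger, delimiter="\t", lineterminator="\n")
    history: List[Row] = []
    best = baseline
    for cycle in range(1, cycles + 1):
        change = propose(history)            # Read State + Select Change
        metric = apply_and_measure(change)   # Edit Target + Run + Read Metric
        if metric is not None and metric < best:
            keep(change)                     # Keep: e.g. commit the change
            best = metric
            verdict = "KEEP"
        else:
            revert(change)                   # Revert: e.g. hard reset
            verdict = "REVERT"
        metric_text = "" if metric is None else f"{metric:.3f}"
        row = (cycle, change, metric_text, verdict)
        writer.writerow(row)                 # Log Outcome
        history.append(row)
    return best
```

A usage example with made-up changes and scores (only the first pair echoes the batch-size cycle shown on this page; the rest are invented for illustration):

```python
import io

scripted = [("batch_size=64", 1.556), ("lr=1e-2", 1.700), ("dropout=0.5", None)]
results = dict(scripted)
ledger = io.StringIO()
best = run_loop(
    cycles=3,
    baseline=1.584,
    propose=lambda history: scripted[len(history)][0],
    apply_and_measure=lambda change: results[change],
    keep=lambda change: None,
    revert=lambda change: None,
    ledger=ledger,
)
print(best)
print(ledger.getvalue())
```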

Community Results

37 cycles

NanoGPT Training

Karpathy's overnight run completed 37 validation cycles.

65%

Shopify CI Builds

David Cortés achieved 65% faster CI builds; contributions from Tobi Lütke led to pi-autoresearch.

Pitch Prediction

Driveline optimized XGBoost models predicting pitch velocity from sensor data.

Ancient Ink Detection

Self-supervised multi-agent loops improved ink-detection generalisation across scrolls.

Case studies

Ecosystem Case Studies & Writeups

Deep dives, optimization reports, and technical guides from teams running the keep-or-revert loop in production.

Case Study: Shopify Engineering

Shopify CI Build Optimisation

David Cortés adapted autoresearch to optimise CI build times, achieving 65% faster builds. Tobi Lütke contributed multi-metric support and auto-commits, leading to the open-sourced pi-autoresearch (3,600+ stars).

Case Study: Nick Oak

Tennis XGBoost + Reward Hacking

An autoresearch-inspired loop for tennis match prediction, with an honest account of where the optimisation setup went wrong (reward hacking).

Case Study: Scroll Prize

Vesuvius Challenge Ink Detection Swarm

Multi-agent experimental loop applied to ancient-scroll ink detection, with a writeup on cross-scroll generalisation improvements.

Case Study: Para Giri

Earth System Model Optimisation

Hybrid workflow where an LLM proposes equation structures and a search process tunes parameters, extending autoresearch into scientific modelling.

Paper: arxiv.org

The Agentic Researcher

A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning. Cites autoresearch as the canonical example of automated ML experiment pipelines.

Case Study: SkyPilot

Scaling Autoresearch to GPU Clusters

Running autoresearch on H100/H200 clusters with cloud orchestration. Covers distributed experiment management and cost control.

Case Study: Addy Osmani

Self-Improving Coding Agents

Practical guide to setting up self-improving agent loops with Claude Code. Covers the key primitives and common failure modes.

Case Study: Ensue Dev

autoresearch@home: Distributed AI Research

The SETI@home model applied to autoresearch: volunteers contribute GPU time to collective model optimisation.

Case Study: MindStudio

Claude Code + AutoResearch for Self-Improving Skills

Building self-improving AI skills using Claude Code with autoresearch patterns. Step-by-step implementation guide.

Case Study: Particula

100 ML Experiments Overnight

Technical breakdown of the autoresearch loop with domain-agnostic fork applications and reproducible results.

Case Study: Aakash Gupta

PM's Guide to Autoresearch

Product manager's guide covering setup, community forks, and real-world applications of the autoresearch loop.

Case Study: Sid Saladi

Autoresearch 101 Builder's Playbook

Deep-dive on applying autoresearch patterns to prompts, agents, and workflows with concrete examples.

Case Study: Fortune

Fortune Feature

Business and industry context on why autoresearch matters for the future of autonomous AI agents.

Ecosystem Index: 82 implementations

Forks, adaptations, writeups, and benchmarks

Contribute via PR


kayba-ai

recursive-improve

Recursive self-improvement framework where agents capture execution traces, analyse failure patterns, and apply targeted fixes with keep-or-revert evaluation.

recursive-improve

auto-research

autoresearch

codex-autoresearch

Thoth

gemini-autoresearch

pi-autoresearch

autoresearch-claude-code

autocontext

ax

goal-md

lazy-developer

autoresearch-at-home

autoresearch-anything

autoresearch-everywhere

ADAS

self_improving_coding_agent

self-improving-agent

HGM

gepa

EvoSkill

autoevolve

ClawTeam

AI-Research-SKILLs

aideml

weco.ai

AutoResearchClaw

NanoResearch

ARK

Auto-claude-code-research-in-sleep

AutoSci

AutoResearch-SibylSystem

autoresearcher

agi

CORAL

AI-Scientist

AI-Scientist-v2

AiScientist

AI-Researcher

Auto-Research

AgentLaboratory

agentrxiv.github.io

ResearchAgent

MLR-Copilot

ML-Agent

LatteReview

LitLLM

agentlaboratory.github.io

openclaw-autoresearch

autoresearch-macos

autoresearch-mlx

autoresearch-win-rtx

n-autoresearch

autoresearch-webgpu

autoresearch-engram

karpathy/autoresearch#208

autoautoresearch

autoresearch-genealogy

autovoiceevals

atlas-gic