This series walks through the full experimentation cycle on a real open-source AI project. You’ll connect a codebase, set up the shared context that grounds future work, scope a new experiment from a recommendation, implement it, evaluate it against a pre-committed standard, and track the result back into your project’s history. By the end, you’ll have run a closed-loop experiment in Remyx and have a working pattern you can repeat on your own repos.
Documentation Index
Fetch the complete documentation index at: https://docs.remyx.ai/llms.txt
Use this file to discover all available pages before exploring further.
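An llms.txt index is typically a markdown-style file that lists page titles and URLs. As a rough sketch of how you might discover pages from it programmatically — note that `parse_llms_txt` and the sample content below are illustrative assumptions, not part of any Remyx API, and the real file at the URL above may use a different layout:

```python
import re

def parse_llms_txt(text: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from markdown-style links in an llms.txt index."""
    return re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", text)

# Hypothetical sample in the common llms.txt shape; fetch the real file
# from https://docs.remyx.ai/llms.txt to see the actual contents.
sample = """# Remyx Docs
- [Create your project](https://docs.remyx.ai/tutorials/create-project)
- [Run an evaluation](https://docs.remyx.ai/tutorials/run-evaluation)
"""

for title, url in parse_llms_txt(sample):
    print(title, "->", url)
```

From there you can fetch individual pages as you need them rather than crawling the whole site.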
What you’ll learn
The series has seven tutorials grouped into five phases. Each tutorial is short and focused on a single user-facing question.

| Phase | Tutorial | Time | Outcome |
|---|---|---|---|
| Setup | Create your project | ~3 min | A Remyx project linked to your repo, with experiment history extracted from merged PRs |
| Setup | Set up your discovery feed | ~3 min | A daily discovery digest ranked against your team’s actual work |
| Setup | Define how progress gets measured | ~10 min | A locked eval template and decision policy, so your team’s bar for shipping is written down before any variant runs |
| Discovery | Scope an experiment from a recommendation | ~5 min | A scoped experiment with hypothesis, target metric, and tags, ready to be implemented |
| Build | Implement an experiment | ~10 min | A PR that implements the technique, generated by Claude Code reading your project context via MCP |
| Decide | Run an evaluation | ~15 min | A variant scored against the locked eval template on Modal, with the decision logged |
| Reflect | Stay in the loop | ~5 min | A read on the project’s velocity, the signals emerging across directions, the rationale behind every decision, and the lineage between related experiments |
The example used throughout
The series follows along on remyxai/VQASynth, an open-source synthetic-data pipeline for spatial-reasoning vision-language models. It’s a public repo you can clone, fork, or just read alongside the tutorials.

You don’t need to use VQASynth. It’s a stand-in for whatever AI repo you actually care about. Wherever you see VQASynth mentioned, you can substitute your own repo URL. The flow works the same way on a retrieval stack, an LLM application, a multi-stage agent, a model-training pipeline, or anything in between. Using a single example throughout means each tutorial’s artifact carries forward to the next without re-establishing context.

Concepts you’ll meet along the way
A few terms recur across the series. Here’s what they mean upfront:
- Project. A workspace that scopes a set of related experiments. Carries shared context (history, eval template, decision policy, integrations) so individual experiments don’t have to re-establish it.
- Experiment. A single tracked change with a hypothesis, a target metric, and an outcome. Can be backfilled (auto-extracted from a merged PR) or created manually.
- Eval template. A saved configuration that says how to score a variant against a baseline.
- Decision policy. A saved set of rules that says when a variant counts as shipping, when it counts as rejecting, and when the team should iterate.
- Variant. A candidate alternative to the baseline that gets evaluated.
- Baseline. Whatever the variant gets compared against. The form depends on what your project produces (a model checkpoint, a dataset, a system version).
Prerequisites
Required for the full series:
- A Remyx account
- A connected GitHub integration (see Connectors)
- Claude Code installed (for the implementation tutorial)
- A Modal account, with billing enabled
- A Hugging Face token, set as a project secret
Start the series
Create your project
Connect a repo and watch Remyx extract structured experiment history from your merge log. About three minutes.
Or skip ahead
Set up your discovery feed
Daily discovery digests ranked against your team’s work
Define how progress gets measured
Lock in your eval template and decision policy
Scope an experiment from a recommendation
Turn a paper into a structured experiment
Implement an experiment
Generate a PR via the MCP integration
Run an evaluation
Score the variant on Modal, log the decision
Stay in the loop
Velocity, signal patterns, decision rationale, and lineage views