This series walks through the full experimentation cycle on a real open-source AI project. You’ll connect a codebase, set up the shared context that grounds future work, scope a new experiment from a recommendation, implement it, evaluate it against a pre-committed standard, and track the result back into your project’s history. By the end, you’ll have run a closed-loop experiment in Remyx and have a working pattern you can repeat on your own repos.
Documentation Index
Fetch the complete documentation index at: https://docs.remyx.ai/llms.txt
Use this file to discover all available pages before exploring further.
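An llms.txt index is typically a markdown-style file that lists page titles and URLs. As a rough sketch of how you might discover pages from it programmatically — note that `parse_llms_txt` and the sample content below are illustrative assumptions, not part of any Remyx API, and the real file at the URL above may use a different layout:

```python
import re

def parse_llms_txt(text: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from markdown-style links in an llms.txt index."""
    return re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", text)

# Hypothetical sample in the common llms.txt shape; fetch the real file
# from https://docs.remyx.ai/llms.txt to see the actual contents.
sample = """# Remyx Docs
- [Create your project](https://docs.remyx.ai/tutorials/create-project)
- [Run an evaluation](https://docs.remyx.ai/tutorials/run-evaluation)
"""

for title, url in parse_llms_txt(sample):
    print(title, "->", url)
```

From there you can fetch individual pages as you need them rather than crawling the whole site.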
What you’ll learn
The series has seven tutorials grouped into five phases. Each tutorial is short and focused on a single user-facing question.

| Phase | Tutorial | Time | Outcome |
|---|---|---|---|
| Setup | Create your project | ~3 min | A Remyx project linked to your repo, with experiment history extracted from merged PRs |
| Setup | Set up your discovery feed | ~3 min | A daily discovery digest ranked against your team’s actual work |
| Setup | Define how progress gets measured | ~10 min | A locked eval template and decision policy, so your team’s bar for shipping is written down before any variant runs |
| Discovery | Scope an experiment from a recommendation | ~5 min | A scoped experiment with hypothesis, target metric, and tags, ready to be implemented |
| Build | Implement an experiment | ~10 min | A PR that implements the technique, generated by Claude Code reading your project context via MCP |
| Decide | Run an evaluation | ~15 min | A variant scored against the locked eval template on Modal, with the decision logged |
| Reflect | Stay in the loop | ~5 min | A read on the project’s velocity, the signals emerging across directions, the rationale behind every decision, and the lineage between related experiments |
The example used throughout
The series follows along on remyxai/VQASynth, an open-source synthetic-data pipeline for spatial-reasoning vision-language models. It’s a public repo you can clone, fork, or just read alongside the tutorials.

You don’t need to use VQASynth. It’s a stand-in for whatever AI repo you actually care about. Wherever you see VQASynth mentioned, you can substitute your own repo URL. The flow works the same way on a retrieval stack, an LLM application, a multi-stage agent, a model-training pipeline, or anything in between. Using a single example throughout means each tutorial’s artifact carries forward to the next without re-establishing context.

Concepts you’ll meet along the way
A few terms recur across the series. Here’s what they mean upfront:
- Project. A workspace that scopes a set of related experiments. Carries shared context (history, eval template, decision policy, integrations) so individual experiments don’t have to re-establish it.
- Experiment. A single tracked change with a hypothesis, a target metric, and an outcome. Can be backfilled (auto-extracted from a merged PR) or created manually.
- Eval template. A saved configuration that says how to score a variant against a baseline.
- Decision policy. A saved set of rules that says when a variant counts as shipping, when it counts as rejecting, and when the team should iterate.
- Variant. A candidate alternative to the baseline that gets evaluated.
- Baseline. Whatever the variant gets compared against. The form depends on what your project produces (a model checkpoint, a dataset, a system version).
Prerequisites
Required for the full series:
- A Remyx account
- A connected GitHub integration (see Connectors)
- Claude Code installed (for the implementation tutorial)
- A Modal account, with billing enabled
- A Hugging Face token, set as a project secret
Start the series
Create your project
Connect a repo and watch Remyx extract structured experiment history from your merge log. About three minutes.
Or skip ahead
Set up your discovery feed
Daily discovery digests ranked against your team’s work
Define how progress gets measured
Lock in your eval template and decision policy
Scope an experiment from a recommendation
Turn a paper into a structured experiment
Implement an experiment
Generate a PR via the MCP integration
Run an evaluation
Score the variant on Modal, log the decision
Stay in the loop
Velocity, signal patterns, decision rationale, and lineage views