
Build phase · ~10 minutes

The experiment from Scope an experiment from a recommendation is scoped but unimplemented; there is no code yet. This tutorial connects Claude Code to your Remyx project over MCP and uses the experiment’s launch context to generate a PR that lands the variant in your repo.
Prerequisites

You’ve completed Scope an experiment from a recommendation, so you have a scoped experiment with a generated launch context. You also need Claude Code installed on your machine.

Get your Remyx API key

  1. Open engine.remyx.ai/account.
  2. Go to API Access.
  3. Create a new key (or copy an existing one).
Keep the key handy. You’ll paste it into Claude Code’s MCP config in the next step.

Connect Claude Code to Remyx

Add the Remyx MCP server to Claude Code’s settings. Two scopes are available: user scope, which applies to every session on your machine, and project scope, which is shared through the repo. Pick whichever fits your workflow.
For user scope, edit ~/.claude/settings.json:
{
  "mcpServers": {
    "remyx": {
      "type": "http",
      "url": "https://mcp.remyx.ai/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_REMYX_API_KEY"
      }
    }
  }
}
Available across every Claude Code session.
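For project scope, Claude Code also reads a `.mcp.json` file at the repo root using the same `mcpServers` shape, which you can commit so teammates pick up the server automatically. A sketch, assuming you export the key as a `REMYX_API_KEY` environment variable rather than committing it (Claude Code expands `${VAR}` references in `.mcp.json`):

```json
{
  "mcpServers": {
    "remyx": {
      "type": "http",
      "url": "https://mcp.remyx.ai/mcp",
      "headers": {
        "Authorization": "Bearer ${REMYX_API_KEY}"
      }
    }
  }
}
```

This keeps the credential out of version control while sharing the server definition with the repo.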

Verify the connection

Start Claude Code in your repo, then ask:
What Remyx tools do you have available?
The response should list the available MCP tools, including get_digest, get_experiments, create_experiment, update_experiment, get_experiment_context, log_decision, and set_project_context.

Set the project context

So that subsequent tool calls operate against the right Remyx project:
> Set my project context to the VQASynth project I just created.
  [Claude calls set_project_context]
Project-scoped tool calls (get_experiments, create_experiment, etc.) use this context automatically. Override on individual calls to switch projects mid-session.

Pull the experiment context

Get the launch context for the experiment scoped in the previous tutorial:
> Show me the experiment we scoped for the depth-estimator swap,
  with its full implementation context.
  [Claude calls get_experiment_context]
The response includes:
  • Resource metadata. Title, abstract excerpt, key methods, your annotations.
  • Docker environment. Reference to a pre-built environment if the resource ships one.
  • Target repo structure. File tree of the repo so the agent knows where things live.
  • Implementation plan. AI-generated plan referencing actual file paths in your repo.
This is what the agent uses to ground the implementation. The more you edited the implementation plan during scoping, the cleaner the result.

Generate the implementation

Ask the agent to implement the technique:
Implement the technique from this experiment in my repo,
following the implementation plan. Open a PR when done.
The agent will:
  1. Read the implementation plan.
  2. Navigate the repo structure.
  3. Generate the code changes.
  4. Commit on a new branch.
  5. Push and open a PR.
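The branch-and-commit portion of those steps can be sketched in plain git. This is illustrative only; the branch name, commit message, and `gh pr create` flags are assumptions, not what the agent literally runs:

```shell
set -e
# Work in a throwaway repo for illustration
cd "$(mktemp -d)"
git init -q
git -c user.name=agent -c user.email=agent@example.com \
    commit -q --allow-empty -m "init"

# Step 4: commit on a new branch
git checkout -q -b experiment/depth-estimator-swap
echo "# variant code goes here" > variant.py
git add variant.py
git -c user.name=agent -c user.email=agent@example.com \
    commit -q -m "Implement depth-estimator swap"
git log --oneline -1

# Step 5: push and open a PR (requires a remote and the GitHub CLI)
# git push -u origin experiment/depth-estimator-swap
# gh pr create --fill
```

When the agent runs this flow for real, the push and `gh pr create` steps are what produce the PR link you record on the experiment.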
For an automated end-to-end run that bundles the steps:
Run the implementation pipeline for the depth-estimator-swap experiment.
This calls run_experiment_implementation, which triggers the full clone → implement → push → PR sequence in one shot. Once the PR is open, link it to the experiment so status syncs and you can find it later:
Update the depth-estimator-swap experiment with the PR link
https://github.com/myorg/myrepo/pull/123.
The agent calls update_experiment to record the PR. If your repo has the GitHub integration connected (see Connectors), the PR’s status (open, merged, closed) syncs automatically via webhooks. You don’t need to update the experiment manually as the PR moves through review.

Review and iterate

The generated implementation is a starting point, not a finished product. Treat the PR like any other: review the diff, run local tests, adjust where the agent missed nuance, push commits. The experiment’s status updates as the PR moves, transitioning from draft to running once the variant is built and the eval can pick it up in the next tutorial.

Recap

You now have:
  • Claude Code connected to Remyx via MCP
  • A PR in your target repo that implements the experiment
  • The PR linked back to the Remyx experiment, with status syncing automatically
The variant is real code that will be evaluated next.

Full session example

End-to-end Claude Code session:
> Set my project context to the VQASynth project.
  [Claude calls set_project_context]

> Show me today's discovery digest.
  [Claude calls get_digest, lists recommendations]

> Create an experiment from the VGGT depth-estimator paper.
  Hypothesis: VGGT will lift spatialscore on the suite by ≥2%
  without regressing latency. Target metric: spatialscore.
  Tags: depth-estimation. Project: VQASynth.
  [Claude calls create_experiment]

> Get the implementation context for that experiment.
  [Claude calls get_experiment_context]

> Implement the technique following the plan, open a PR.
  [Claude reads plan, edits files, commits, pushes, opens PR]

> Update the experiment with the PR link.
  [Claude calls update_experiment]
From here, the Run an evaluation tutorial runs the actual eval against the locked template.

Tips

When you name an experiment, the agent matches it against your experiment list. Specific names avoid ambiguity when several experiments share keywords.
Open the experiment in the Remyx UI and edit the AI-generated implementation plan before asking for the implementation. Specific guidance produces better PRs than letting the agent infer from the launch context alone.
With GitHub connected, the PR auto-links to the experiment, and status changes (merged, closed) sync via webhooks without manual updates.

Next

Run an evaluation

Score the variant against the locked eval template on Modal, log the decision.

MCP Server reference

Full Remyx tool list for agents

Series overview

Full series arc