
Scope an Experiment

Discovery phase · ~5 minutes

A research paper isn’t an experiment. To act on a recommendation, you need a hypothesis, a target metric, a target repo, and enough context that whoever implements it doesn’t have to re-read the paper to remember why. This tutorial walks through turning a recommendation into a scoped experiment ready to implement.
Prerequisites. You’ve completed the previous tutorials in the series, so the project, discovery feed, eval template, and decision policy are all in place. Recommendations have populated your feed; if they haven’t yet, you can use Search instead.

Where recommendations come from

Three surfaces produce candidate experiments. Pick whichever fits your moment.

From the Feed. Your project’s research interest delivers a daily digest of papers, repos, models, and datasets ranked against the experiments your team has shipped.
  1. Open the Feed.
  2. Browse the recommendations. Each card shows why it was surfaced (which experiments it’s adjacent to in your project’s history).
  3. Click View on the one you want to act on.

From Search.
  1. Go to Search and enter a query (e.g., “multi-hop retrieval for complex queries”, “depth estimation for indoor scenes”).
  2. Browse results. Resources tagged Runnable environment ship with a pre-built Docker image.
  3. Click View on a resource.

From Insights.
  1. Go to Insights to see which experiment directions in your project are producing results.
  2. Expand a high-signal cluster. The Recommended resources column shows what to try next on that direction.
  3. Click Start Experiment on a recommendation. The cluster’s tag and the active project are pre-filled.
The flow that follows is the same regardless of which surface you came from.

Annotate before you scope

On the resource detail page, two tabs help you build the context you’ll need before the experiment exists:
  • Chat. Ask questions about the method. Useful for clarifying details the abstract glossed over.
  • Annotations. Highlight key passages and add notes. Annotations carry into the experiment context, so you can come back to “the part that mattered” without re-reading.
Spend two minutes here. The annotations you write are what make the experiment scoped to your work rather than a generic implementation of the paper.

Click Create Experiment

From the resource detail page, click Create Experiment. A scoping form opens with the resource pre-linked. Fill in the following fields (an illustrative example follows the list):
  • Name. Short, descriptive. “Swap DepthPro for VGGT in the depth-estimation stage” is better than “Try VGGT.”
  • Hypothesis. One sentence. “VGGT will improve spatial accuracy on small objects without requiring retraining.” The hypothesis is what the eval and decision policy from the previous setup tutorial will test against.
  • Target metric. Pick from the project’s allowed-metrics list. The metric should be one of the metrics in your locked eval template, so the decision policy can score against it.
  • Tags. Use existing project tags where possible. Tags drive cluster patterns in Insights.
  • Target repository. owner/repo format. Defaults to the project’s primary linked repo.
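To make the fields concrete, here is a minimal sketch of the scoped experiment from the examples above, captured as a plain Python dict. The keys, the allowed-metrics values, and the owner/repo name are illustrative assumptions for this tutorial, not Remyx’s actual API.

```python
# Illustrative only: how the scoping fields might look once filled in.
# The dict keys, metric names, and repo are hypothetical, not an actual Remyx API.
experiment_scope = {
    "name": "Swap DepthPro for VGGT in the depth-estimation stage",
    "hypothesis": (
        "VGGT will improve spatial accuracy on small objects "
        "without requiring retraining."
    ),
    "target_metric": "spatialscore",              # must exist in the locked eval template
    "tags": ["depth-estimation"],                 # reuse existing project tags
    "target_repository": "acme/robotics-stack",   # owner/repo format (hypothetical)
}

# Sanity check before creating: if the target metric isn't in the project's
# allowed-metrics list, the decision policy won't be able to score the variant.
allowed_metrics = {"spatialscore", "latency_ms"}  # example values only
assert experiment_scope["target_metric"] in allowed_metrics
```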
Click Create. After creation, the experiment detail page loads with an Origin section that generates a launch context within a few seconds:
  • Resource metadata (title, abstract excerpt, key methods)
  • Reference to a pre-built Docker environment (if available)
  • Target repo file tree
  • An AI-generated implementation plan grounded in actual file paths from your repo
The launch context is what the implementation tutorial uses next.
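For orientation, a rough sketch of the shape that launch context might take is below. The field names and example values are assumptions made for illustration, not the actual payload.

```python
# Hypothetical shape of the generated launch context, mirroring the bullets above.
# Field names and example values are illustrative; the real payload may differ.
launch_context = {
    "resource": {
        "title": "Paper or repo title",
        "abstract_excerpt": "One or two sentences from the abstract.",
        "key_methods": ["method A", "method B"],
    },
    "docker_image": None,                   # set when a pre-built environment exists
    "repo_file_tree": ["src/", "eval/"],    # paths from the target repo
    "implementation_plan": "Step-by-step plan grounded in the repo's file paths.",
}
```

Treat the implementation plan entry as a draft: as the Tips section notes, skim and edit it before handing it off.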

Recap

You now have:
  • A scoped experiment with a hypothesis, target metric, tags, and target repo
  • A launch context with the implementation plan ready to feed into implementation
  • An annotated resource that captures why this is worth trying
The experiment’s status is draft. No code has been written and no eval has run.

Tips

The decision policy from the previous tutorial only evaluates rules against metrics in the locked eval template. If your experiment targets a metric that isn’t in the template, the policy can’t score the variant. Match the names.
A good hypothesis says what you expect to change and by roughly how much. “VGGT will improve spatial accuracy on small objects” is fine; “VGGT will lift spatialscore by ≥2% on the suite without regressing latency” is better. The decision policy already encodes the threshold, so the hypothesis can match it.
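To see how the hypothesis, the eval template, and the decision policy line up, here is a minimal sketch of the kind of check a threshold rule implies. The rule format, metric names, and scores are invented for illustration and are not the platform’s actual policy engine.

```python
# A decision rule in the spirit of “lift spatialscore by ≥2% without regressing latency”.
# Everything here is illustrative: rule shape, metric names, and scores.
rule = {"metric": "spatialscore", "min_lift_pct": 2.0, "guard_metric": "latency_ms"}

baseline = {"spatialscore": 0.612, "latency_ms": 41.0}   # metrics from the locked eval template
variant = {"spatialscore": 0.631, "latency_ms": 40.5}

# The policy can only score the variant if the rule's metric name matches a
# metric the eval template actually produced. Match the names.
assert rule["metric"] in baseline and rule["metric"] in variant

lift_pct = 100 * (variant[rule["metric"]] - baseline[rule["metric"]]) / baseline[rule["metric"]]
no_regression = variant[rule["guard_metric"]] <= baseline[rule["guard_metric"]]
passes = lift_pct >= rule["min_lift_pct"] and no_regression

print(f"lift={lift_pct:.1f}%, passes={passes}")   # lift=3.1%, passes=True
```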
The launch context’s implementation plan is AI-generated. Skim it and edit it before you hand it off. Five minutes of editing here saves an hour of debugging later.

Next

Implement an Experiment

Connect Claude Code via MCP and generate the PR that implements the experiment.

Series overview

Full series arc

MCP Server

All Remyx tools available to agents