Discovery phase · ~5 minutes

A research paper isn’t an experiment. To act on a recommendation, you need a hypothesis, a target metric, a target repo, and enough context that whoever implements it doesn’t have to re-read the paper to remember why. This tutorial walks through turning a recommendation into a scoped experiment that is ready to implement.
Prerequisites

You’ve completed the previous tutorials in the series, so the project, discovery feed, eval template, and decision policy are all in place. Recommendations have populated, or you can use Search instead.
Where recommendations come from
Three surfaces produce candidate experiments. Pick whichever fits your moment.

From the daily digest (most common)
Your project’s research interest delivers a daily digest of papers, repos, models, and datasets ranked against the experiments your team has shipped.
- Open the Feed.
- Browse the recommendations. Each card shows why it was surfaced (which experiments it’s adjacent to in your project’s history).
- Click View on the one you want to act on.
From Search (when you have a specific direction in mind)
- Go to Search and enter a query (e.g., “multi-hop retrieval for complex queries”, “depth estimation for indoor scenes”).
- Browse results. Resources tagged Runnable environment ship with a pre-built Docker image.
- Click View on a resource.
From Insights (when you want to double down on a working direction)
- Go to Insights to see which experiment directions in your project are producing results.
- Expand a high-signal cluster. The Recommended resources column shows what to try next on that direction.
- Click Start Experiment on a recommendation. The cluster’s tag and the active project are pre-filled.
Annotate before you scope
On the resource detail page, two tabs help you build the context you’ll need before the experiment exists:
- Chat. Ask questions about the method. Useful for clarifying details the abstract glossed over.
- Annotations. Highlight key passages and add notes. Annotations carry into the experiment context, so you can come back to “the part that mattered” without re-reading.
Click Create Experiment
From the resource detail page, click Create Experiment. A scoping form opens with the resource pre-linked. Fill in:
- Name. Short, descriptive. “Swap DepthPro for VGGT in the depth-estimation stage” is better than “Try VGGT.”
- Hypothesis. One sentence. “VGGT will improve spatial accuracy on small objects without requiring retraining.” The hypothesis is what the eval and decision policy from the previous setup tutorial will test against.
- Target metric. Pick from the project’s allowed-metrics list. The metric should be one of the metrics in your locked eval template, so the decision policy can score against it.
- Tags. Use existing project tags where possible. Tags drive cluster patterns in Insights.
- Target repository. Uses owner/repo format; defaults to the project’s primary linked repo.
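The scoping fields above can be sketched as a simple record. This is an illustrative shape only, not the actual Remyx schema; all names and values here are assumptions:

```python
from dataclasses import dataclass

# Hypothetical sketch of a scoped experiment record; field names
# are illustrative, not the real Remyx API.
@dataclass
class ExperimentSpec:
    name: str
    hypothesis: str
    target_metric: str  # must appear in the project's locked eval template
    tags: list
    target_repo: str    # "owner/repo" format

spec = ExperimentSpec(
    name="Swap DepthPro for VGGT in the depth-estimation stage",
    hypothesis="VGGT will improve spatial accuracy on small objects without retraining.",
    target_metric="spatial_accuracy",
    tags=["depth-estimation", "vggt"],
    target_repo="acme/depth-pipeline",  # hypothetical repo
)
```

Writing the spec down this way makes the scoping rules concrete: every field is short, and the metric and repo are names the rest of the pipeline can validate.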
Creating the experiment also assembles a launch context that includes:
- Resource metadata (title, abstract excerpt, key methods)
- Reference to a pre-built Docker environment (if available)
- Target repo file tree
- An AI-generated implementation plan grounded in actual file paths from your repo
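A launch context with those four parts might look like the following. This is a hypothetical sketch; the key names, image reference, and plan steps are all illustrative assumptions, not the actual Remyx payload:

```python
# Hypothetical sketch of a launch context; key names and values are
# illustrative assumptions, not the real Remyx format.
launch_context = {
    "resource": {
        "title": "Example depth-estimation paper",
        "abstract_excerpt": "...",
        "key_methods": ["multi-view geometry", "alternating attention"],
    },
    # Reference to a pre-built Docker environment, if one exists
    "docker_image": "registry.example.com/vggt-env:latest",
    # Target repo file tree, so the plan can cite real paths
    "repo_tree": ["src/depth/", "src/depth/model.py", "eval/suite.py"],
    # AI-generated implementation plan grounded in those paths
    "implementation_plan": [
        "Add a VGGT backend alongside DepthPro in src/depth/model.py",
        "Route the depth-estimation stage through the new backend",
        "Run the locked eval suite and compare spatial accuracy",
    ],
}
```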
Recap
You now have:
- A scoped experiment with a hypothesis, target metric, tags, and target repo
- A launch context with the implementation plan ready to feed into implementation
- An annotated resource that captures why this is worth trying
The experiment is still in draft: no code has been written, no eval has run.
Tips
Pick a target metric your eval template covers
The decision policy from the previous tutorial only evaluates rules against metrics in the locked eval template. If your experiment targets a metric that isn’t in the template, the policy can’t score the variant. Match the names.
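The name-matching requirement can be checked mechanically before launch. A minimal sketch, assuming the locked template exposes its metric names as a set (the helper is hypothetical, not a Remyx API):

```python
def check_target_metric(target_metric, template_metrics):
    """Fail fast if the experiment targets a metric the locked eval
    template doesn't cover (hypothetical helper, not a Remyx API)."""
    if target_metric not in template_metrics:
        raise ValueError(
            f"'{target_metric}' is not in the locked eval template "
            f"({sorted(template_metrics)}); the decision policy cannot score it."
        )

# Passes silently when the names match exactly
check_target_metric("spatial_accuracy", {"spatial_accuracy", "latency_ms"})
```

The point of failing fast is that a misspelled or missing metric is cheap to catch at scoping time and expensive to catch after the variant has run.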
Write the hypothesis like a wager
A good hypothesis says what you expect to change and by roughly how much. “VGGT will improve spatial accuracy on small objects” is fine; “VGGT will lift spatial score by ≥2% on the suite without regressing latency” is better. The decision policy already encodes the threshold, so the hypothesis should match it.
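A wager-style hypothesis maps directly onto a decision rule. A sketch under the assumption that the policy compares percent lift against a threshold (the function and its semantics are illustrative, not the actual decision policy):

```python
def meets_wager(baseline_score, variant_score,
                baseline_latency, variant_latency,
                min_lift_pct=2.0):
    """Return True if the variant lifts the score by at least
    min_lift_pct percent without regressing latency.
    Illustrative rule, not the actual Remyx decision policy."""
    lift_pct = (variant_score - baseline_score) / baseline_score * 100
    return lift_pct >= min_lift_pct and variant_latency <= baseline_latency

# 3% score lift at equal latency clears a 2% threshold
print(meets_wager(0.60, 0.618, 120.0, 120.0))  # True
```

Because the hypothesis and the rule share the same threshold, a pass or fail on the rule is a direct answer to the wager, with no post-hoc interpretation needed.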
Edit the implementation plan before kicking off the next tutorial
The launch context’s implementation plan is AI-generated. Skim it and edit it before you hand it off. Five minutes of editing here saves an hour of debugging later.
Next
Implement an Experiment
Connect Claude Code via MCP and generate the PR that implements the experiment.
Series overview
Full series arc
MCP Server
All Remyx tools available to agents