
Scope an Experiment

Discovery phase · ~5 minutes

A research paper isn’t an experiment. To act on a recommendation, you need a hypothesis, a target metric, a target repo, and enough context that whoever implements it doesn’t have to re-read the paper to remember why. This tutorial walks through turning a recommendation into a scoped experiment ready to implement.
Prerequisites. You’ve completed the previous tutorials in the series, so the project, discovery feed, eval template, and decision policy are all in place. Recommendations have populated your feed; if they haven’t yet, you can use Search instead.

Where recommendations come from

Three surfaces produce candidate experiments. Pick whichever fits your moment.

From the Feed. Your project’s research interest delivers a daily digest of papers, repos, models, and datasets ranked against the experiments your team has shipped.
  1. Open the Feed.
  2. Browse the recommendations. Each card shows why it was surfaced (which experiments it’s adjacent to in your project’s history).
  3. Click View on the one you want to act on.

From Search.
  1. Go to Search and enter a query (e.g., “multi-hop retrieval for complex queries”, “depth estimation for indoor scenes”).
  2. Browse results. Resources tagged Runnable environment ship with a pre-built Docker image.
  3. Click View on a resource.

From Insights.
  1. Go to Insights to see which experiment directions in your project are producing results.
  2. Expand a high-signal cluster. The Recommended resources column shows what to try next on that direction.
  3. Click Start Experiment on a recommendation. The cluster’s tag and the active project are pre-filled.
The flow that follows is the same regardless of which surface you came from.

Annotate before you scope

On the resource detail page, two tabs help you build the context you’ll need before the experiment exists:
  • Chat. Ask questions about the method. Useful for clarifying details the abstract glossed over.
  • Annotations. Highlight key passages and add notes. Annotations carry into the experiment context, so you can come back to “the part that mattered” without re-reading.
Spend two minutes here. The annotations you write are what make the experiment scoped to your work rather than a generic implementation of the paper.

Click Create Experiment

From the resource detail page, click Create Experiment. A scoping form opens with the resource pre-linked. Fill in the following fields (an illustrative example follows the list):
  • Name. Short, descriptive. “Swap DepthPro for VGGT in the depth-estimation stage” is better than “Try VGGT.”
  • Hypothesis. One sentence. “VGGT will improve spatial accuracy on small objects without requiring retraining.” The hypothesis is what the eval and decision policy from the previous setup tutorial will test against.
  • Target metric. Pick from the project’s allowed-metrics list. The metric should be one of the metrics in your locked eval template, so the decision policy can score against it.
  • Tags. Use existing project tags where possible. Tags drive cluster patterns in Insights.
  • Target repository. owner/repo format. Defaults to the project’s primary linked repo.
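To make the fields concrete, here is a minimal sketch of the scoped experiment from the examples above, captured as a plain Python dict. The keys, the allowed-metrics values, and the owner/repo name are illustrative assumptions for this tutorial, not Remyx’s actual API.

```python
# Illustrative only: how the scoping fields might look once filled in.
# The dict keys, metric names, and repo are hypothetical, not an actual Remyx API.
experiment_scope = {
    "name": "Swap DepthPro for VGGT in the depth-estimation stage",
    "hypothesis": (
        "VGGT will improve spatial accuracy on small objects "
        "without requiring retraining."
    ),
    "target_metric": "spatialscore",              # must exist in the locked eval template
    "tags": ["depth-estimation"],                 # reuse existing project tags
    "target_repository": "acme/robotics-stack",   # owner/repo format (hypothetical)
}

# Sanity check before creating: if the target metric isn't in the project's
# allowed-metrics list, the decision policy won't be able to score the variant.
allowed_metrics = {"spatialscore", "latency_ms"}  # example values only
assert experiment_scope["target_metric"] in allowed_metrics
```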
Click Create. After creation, the experiment detail page loads with an Origin section that generates a launch context within a few seconds:
  • Resource metadata (title, abstract excerpt, key methods)
  • Reference to a pre-built Docker environment (if available)
  • Target repo file tree
  • An AI-generated implementation plan grounded in actual file paths from your repo
The launch context is what the implementation tutorial uses next.
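For orientation, a rough sketch of the shape that launch context might take is below. The field names and example values are assumptions made for illustration, not the actual payload.

```python
# Hypothetical shape of the generated launch context, mirroring the bullets above.
# Field names and example values are illustrative; the real payload may differ.
launch_context = {
    "resource": {
        "title": "Paper or repo title",
        "abstract_excerpt": "One or two sentences from the abstract.",
        "key_methods": ["method A", "method B"],
    },
    "docker_image": None,                   # set when a pre-built environment exists
    "repo_file_tree": ["src/", "eval/"],    # paths from the target repo
    "implementation_plan": "Step-by-step plan grounded in the repo's file paths.",
}
```

Treat the implementation plan entry as a draft: as the Tips section notes, skim and edit it before handing it off.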

Recap

You now have:
  • A scoped experiment with a hypothesis, target metric, tags, and target repo
  • A launch context with the implementation plan ready to feed into implementation
  • An annotated resource that captures why this is worth trying
The experiment’s status is draft. No code has been written and no eval has run.

Tips

The decision policy from the previous tutorial only evaluates rules against metrics in the locked eval template. If your experiment targets a metric that isn’t in the template, the policy can’t score the variant. Match the names.
A good hypothesis says what you expect to change and by roughly how much. “VGGT will improve spatial accuracy on small objects” is fine; “VGGT will lift spatialscore by ≥2% on the suite without regressing latency” is better. The decision policy already encodes the threshold, so the hypothesis can match it.
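To see how the hypothesis, the eval template, and the decision policy line up, here is a minimal sketch of the kind of check a threshold rule implies. The rule format, metric names, and scores are invented for illustration and are not the platform’s actual policy engine.

```python
# A decision rule in the spirit of “lift spatialscore by ≥2% without regressing latency”.
# Everything here is illustrative: rule shape, metric names, and scores.
rule = {"metric": "spatialscore", "min_lift_pct": 2.0, "guard_metric": "latency_ms"}

baseline = {"spatialscore": 0.612, "latency_ms": 41.0}   # metrics from the locked eval template
variant = {"spatialscore": 0.631, "latency_ms": 40.5}

# The policy can only score the variant if the rule's metric name matches a
# metric the eval template actually produced. Match the names.
assert rule["metric"] in baseline and rule["metric"] in variant

lift_pct = 100 * (variant[rule["metric"]] - baseline[rule["metric"]]) / baseline[rule["metric"]]
no_regression = variant[rule["guard_metric"]] <= baseline[rule["guard_metric"]]
passes = lift_pct >= rule["min_lift_pct"] and no_regression

print(f"lift={lift_pct:.1f}%, passes={passes}")   # lift=3.1%, passes=True
```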
The launch context’s implementation plan is AI-generated. Skim it and edit it before you hand it off. Five minutes of editing here saves an hour of debugging later.

Next

Implement an Experiment

Connect Claude Code via MCP and generate the PR that implements the experiment.

Series overview

Full series arc

MCP Server

All Remyx tools available to agents