What “continuous” means, precisely. The loop runs on a schedule and does the watching, reading, and first-draft work automatically. It does not make the call — every cycle surfaces a reviewable artifact (a ranked recommendation, a draft PR, a scored variant) and a person decides. The CI/CD analogy, taken honestly: the pipeline runs on every change, but a human still approves the deploy.
The loop
A single experiment in Remyx already has a lifecycle: it comes from somewhere, gets implemented, gets evaluated, and ends in a decision. Continuous Experimentation is that lifecycle run as a standing loop, where the output of each turn feeds the input of the next.
Discover
Your Feed surfaces new papers, repos, and models ranked against your team’s actual shipping history. This runs daily without you asking.
Draft
Remyx’s automated discovery-PR agent (Outrider), a scheduled GitHub Action, picks the candidate most implementable against your codebase and opens a draft PR wiring it into a real call site, or a discussion Issue when a clean integration isn’t possible. This is the step that used to require a human to sit down and start reading.
Evaluate
The variant is scored against the eval template and decision policy your team committed to ahead of time, so the bar is fixed before the result is known.
Decide
A person reviews the evidence and logs the call — ship, iterate, or abandon — and why. This is the human gate, and it stays human.
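One turn of the loop can be sketched in a few lines of Python. This is a minimal illustration, not Remyx's implementation: `DecisionPolicy`, `Evidence`, `evaluate`, and `run_cycle` are hypothetical names, and a single-metric threshold stands in for whatever eval template and decision policy a team actually commits to. The point it shows is structural: the bar is frozen before scoring, and the decision comes from a reviewer callback, never from the loop itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Decision(Enum):
    SHIP = "ship"
    ITERATE = "iterate"
    ABANDON = "abandon"


@dataclass(frozen=True)  # frozen: the bar cannot be edited once committed
class DecisionPolicy:
    """Hypothetical policy: one metric and a minimum improvement over baseline."""
    metric: str
    min_improvement: float


@dataclass(frozen=True)
class Evidence:
    candidate: str
    baseline: float
    variant: float
    meets_bar: bool


@dataclass(frozen=True)
class LoggedDecision:
    candidate: str
    decision: Decision
    rationale: str


# The reviewer is a person (or a UI backed by one); it returns the call and the why.
Reviewer = Callable[[Evidence], Tuple[Decision, str]]


def evaluate(candidate: str, baseline: float, variant: float,
             policy: DecisionPolicy) -> Evidence:
    # The threshold comes from the pre-committed policy, not from the result.
    meets_bar = (variant - baseline) >= policy.min_improvement
    return Evidence(candidate, baseline, variant, meets_bar)


def run_cycle(candidate: str, baseline: float, variant: float,
              policy: DecisionPolicy, reviewer: Reviewer) -> LoggedDecision:
    evidence = evaluate(candidate, baseline, variant, policy)
    # Human gate: the loop surfaces evidence and records whatever the reviewer decides.
    decision, rationale = reviewer(evidence)
    return LoggedDecision(candidate, decision, rationale)
```

Note that `meets_bar` is evidence, not a verdict: a reviewer may still choose to iterate on a variant that clears the bar, or abandon one for reasons the metric does not capture.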
CI/CD for AI experimentation
The analogy is exact in the places that matter and worth stating plainly where it isn’t.

| Software CI/CD | Remyx Continuous Experimentation |
|---|---|
| A commit triggers the pipeline | A schedule (or a context change) triggers a discovery + draft run |
| The pipeline builds and tests automatically | Outrider selects, drafts, validates, and self-reviews automatically |
| A failing build never reaches review | A recommendation that can’t be cleanly integrated becomes a discussion Issue with the attempted diff attached |
| The team sets the gates (required checks, coverage) ahead of time | The team sets the eval template and decision policy ahead of time |
| A human approves the deploy | A human reviews and merges the PR, and logs the decision |
This is also why the Maturity Progression stages stay read-only and passive through Stage 3. Continuous Experimentation reads your repo, ranks against your history, and proposes changes you approve. It does not touch production behavior. The first capability that does (Stage 4 counterfactual perturbations) ships only behind shadow-mode audits.
How recommendations get sharper
A loop is only worth running continuously if each turn is better than the last. Two mechanisms make Remyx’s recommendations improve as your history grows, rather than re-surfacing the same generic results:
History-aware ranking
The structured experiments Remyx extracts from your merge log feed the ranker as context, so a candidate aligned with the direction you’ve actually shipped ranks above a merely topical one. This shifts the top results meaningfully versus ranking from your interest description alone, and the reasoning cites specific past work instead of shallow keyword overlap.
A learned preference model
Remyx fits a per-team preference model over your past experiments — learning from the order and lineage in which you shipped things — and scores new candidates with it as a tiebreaker behind relevance. It populates lazily and becomes meaningful past a few dozen experiments. It sharpens ranking only; it is deliberately not wired to auto-generate or auto-select experiments, which would put it in the decision seat the human holds.
Both mechanisms draw on the same ExperimentHistory whether you reached it through a Project or a repo-driven Research Interest, so the loop sees one coherent picture of your work.
Beyond history, a Deep Research brief feeds the ranker as a forward-looking axis: it captures where your team intends to go next, so candidates aligned with that direction rank up even before you’ve shipped against it.
If a recommendation or draft is wrong, the cost is a PR you close: nothing reaches your default branch or your users without a person putting it there.
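The ordering described for the learned preference model, relevance first with the preference score as a tiebreaker that only counts once enough history exists, can be sketched as a two-part sort key. This is one reading of "tiebreaker behind relevance", not Remyx's ranker: `Candidate`, `rank`, both score fields, and the `min_history` default of 36 are hypothetical placeholders (the source says only "a few dozen experiments").

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    title: str
    relevance: float   # primary score; higher is better (hypothetical scale)
    preference: float  # learned per-team preference score (hypothetical)


def rank(candidates: List[Candidate], history_size: int,
         min_history: int = 36) -> List[Candidate]:
    """Sort by relevance, breaking ties with preference once history is large enough.

    min_history is an illustrative stand-in for "a few dozen experiments".
    """
    use_preference = history_size >= min_history

    def key(c: Candidate) -> Tuple[float, float]:
        # With too little history, the tiebreaker is a constant and has no effect.
        tiebreak = c.preference if use_preference else 0.0
        return (c.relevance, tiebreak)

    # sorted() is stable, so candidates with equal keys keep their input order.
    return sorted(candidates, key=key, reverse=True)
```

Because the preference score sits in the second position of the key, it can never lift a less relevant candidate above a more relevant one; it only orders candidates whose relevance is equal. A production ranker would likely compare relevance in buckets rather than by exact float equality, which is left out here for brevity.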
Get started
Automated discovery PRs: Set up the scheduled draft-PR loop on a repo
Feed: Create the Research Interest that drives recommendations
ExperimentOps: The system of record this loop runs on top of
Maturity Progression: Why the loop stays passive and read-only