> ## Documentation Index
> Fetch the complete documentation index at: https://docs.remyx.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Continuous Experimentation

> The loop that keeps running on top of ExperimentOps: discovery, drafting, evaluation, and decision, with each turn sharpening the next. CI/CD-style cadence for AI experimentation, with a human at the merge.

[ExperimentOps](/concepts/experimentops) is your system of record (what your team tried, why, and what you decided). **Continuous Experimentation** puts that record to work: it runs the discovery-to-decision loop on a cadence, so keeping up with the field becomes a background process instead of a second job, the way CI/CD turned "we deploy when someone remembers to" into "every change runs the pipeline."

The bottleneck it removes is the legwork before a decision. New techniques ship across arXiv, Hugging Face, and GitHub every week; ideas are plentiful, and the cost is finding the few that fit your codebase and turning them into something you can actually evaluate. The loop does that work (surface, validate, draft) and leaves you the judgment, so effort flows to the most fruitful directions.

**What "continuous" means, precisely.** The loop runs on a schedule and does the watching, reading, and first-draft work automatically. It does **not** make the call: every cycle surfaces a reviewable artifact (a ranked recommendation, a draft PR, a scored variant) and a person decides. The CI/CD analogy, taken honestly: the pipeline runs on every change, but a human still approves the deploy.

***

## The loop

A single experiment in Remyx already has a lifecycle: it comes from somewhere, gets implemented, gets evaluated, and ends in a decision. Continuous Experimentation is that lifecycle run as a *standing loop*, where the output of each turn feeds the input of the next.
1. Your [Feed](/platform/discover/feed) surfaces new papers, repos, and models ranked against your team's actual shipping history. This runs daily without you asking.
2. Remyx's [automated discovery-PR agent](/platform/discover/outrider) (**Outrider**), a scheduled GitHub Action, picks the candidate most *implementable* against your codebase and opens a **draft PR** wiring it into a real call site, or a **discussion Issue** when a clean integration isn't possible. This is the step that used to require a human to sit down and start reading.
3. The variant is scored against the [eval template and decision policy](/platform/experiments/evaluation) your team [committed to ahead of time](/tutorials/get-started/define-how-progress-gets-measured), so the bar is fixed before the result is known.
4. A person reviews the evidence and logs the call (ship, iterate, or abandon) and why. This is the human gate, and it stays human.
5. The decision becomes part of the record. The next discovery pass is ranked against a history that now includes this experiment. The loop tightens.

The first two steps are where Continuous Experimentation does new work: it closes the gap between *a relevant paper exists* and *there's a reviewable proposal in front of me*. Steps three through five are the [ExperimentOps](/concepts/experimentops) loop you already run, now fed automatically.

***

## CI/CD for AI experimentation

The analogy is exact in the places that matter and worth stating plainly where it isn't.
| Software CI/CD | Remyx Continuous Experimentation |
| --- | --- |
| A commit triggers the pipeline | A schedule (or a context change) triggers a discovery + draft run |
| The pipeline builds and tests automatically | Outrider selects, drafts, validates, and self-reviews automatically |
| A failing build never reaches review | A recommendation that can't be cleanly integrated becomes a discussion Issue with the attempted diff attached |
| The team sets the gates (required checks, coverage) ahead of time | The team sets the eval template and decision policy ahead of time |
| **A human approves the deploy** | **A human reviews and merges the PR, and logs the decision** |

Where the analogy breaks, and where overclaiming would erode trust, is the last row. CI/CD can end in an automated deploy because the correctness bar is a passing test suite. AI experimentation ends in a *judgment* about whether a change actually moved a business outcome, which is not something to hand to a cron job. Remyx automates the toil up to that judgment and stops there.

This is also why the [Maturity Progression](/concepts/maturity-progression) stages stay **read-only and passive** through Stage 3. Continuous Experimentation reads your repo, ranks against your history, and proposes changes you approve. It does not touch production behavior. The first capability that does (Stage 4 counterfactual perturbations) ships only behind shadow-mode audits.

***
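To make the gate in the comparison above concrete, here is a minimal sketch of a bar that is fixed before scoring, with the final call reserved for a person. Everything in it is hypothetical and for illustration only: `DecisionPolicy`, `Evidence`, `recommend`, `log_decision`, and the `min_lift` threshold are assumed names, not Remyx's API or schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionPolicy:
    """Hypothetical policy, committed before any variant is scored."""
    metric: str
    min_lift: float  # minimum improvement over baseline to recommend shipping


@dataclass
class Evidence:
    """Hypothetical evaluation result for one variant."""
    metric: str
    baseline: float
    variant: float


def recommend(policy: DecisionPolicy, ev: Evidence) -> str:
    """Automated part: compare the evidence to the pre-committed bar."""
    if ev.metric != policy.metric:
        raise ValueError("evidence was scored on a different metric than the policy")
    lift = ev.variant - ev.baseline
    return "ship" if lift >= policy.min_lift else "iterate"


def log_decision(recommendation: str, human_call: Optional[str], reason: str) -> dict:
    """Human gate: the record stays pending until a person makes the call."""
    if human_call is None:
        return {"status": "pending_review", "recommendation": recommendation}
    if human_call not in {"ship", "iterate", "abandon"}:
        raise ValueError(f"unknown decision: {human_call}")
    return {
        "status": "decided",
        "decision": human_call,
        "reason": reason,
        "recommendation": recommendation,
    }


policy = DecisionPolicy(metric="exact_match", min_lift=0.02)
evidence = Evidence(metric="exact_match", baseline=0.71, variant=0.74)
suggestion = recommend(policy, evidence)

# The automated suggestion never becomes a decision on its own.
pending = log_decision(suggestion, human_call=None, reason="")
# A reviewer can overrule the suggestion, and the reason is recorded.
final = log_decision(suggestion, human_call="abandon", reason="latency regression")
```

The design choice this illustrates is the separation of concerns: the automated step only produces a recommendation against a frozen policy, and the logged decision always carries a human call and a reason, even when it disagrees with the recommendation.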

## How recommendations get sharper

A loop is only worth running continuously if each turn is better than the last. Two mechanisms make Remyx's recommendations improve as your history grows, rather than re-surfacing the same generic results:

- The structured experiments Remyx extracts from your merge log feed the ranker as context, so a candidate aligned with the direction you've actually shipped ranks above a merely topical one. This shifts the top results meaningfully versus ranking from your interest description alone, and the reasoning cites specific past work instead of shallow keyword overlap.
- Remyx fits a per-team preference model over your past experiments, learning from the order and lineage in which you shipped things, and scores new candidates with it as a tiebreaker behind relevance. It populates lazily and becomes meaningful past a few dozen experiments. It sharpens ranking only; it is deliberately *not* wired to auto-generate or auto-select experiments, which would put it in the decision seat the human holds.

Both converge on the same `ExperimentHistory` whether you reached it through a [Project](/platform/manage/projects) or a repo-driven [Research Interest](/platform/discover/feed#research-interests-from-a-repo), so the loop sees one coherent picture of your work.

Beyond history, a [Deep Research](/platform/discover/deep-research) brief feeds the ranker as a forward-looking axis: it captures where your team intends to go next, so candidates aligned with that direction rank up even before you've shipped against it.

If a recommendation or draft is wrong, the cost is a PR you close. Nothing reaches your default branch or your users without a person putting it there.

***

## Get started

- Set up the scheduled draft-PR loop on a repo
- Create the Research Interest that drives recommendations
- The system of record this loop runs on top of
- Why the loop stays passive and read-only