Maturity Progression

AI teams progress through stages of experimentation maturity as their systems grow. Remyx is designed so that each stage adds an evidence source and the queries it makes possible, without retiring what came before. Teams enter at different stages and move on to later ones as their needs evolve. The trust requirement increases gradually. Stages 1 through 3 are passive: Remyx only reads commits, logs, and A/B test results. Stage 4 is the first that touches production behavior, and only after shadow-mode audits.
For current shipping status of the capabilities at each stage, see the Roadmap. For the architectural through-line, see Causal Intelligence.

At a glance

| Stage | Evidence source | Trust requirement | Best fit |
| --- | --- | --- | --- |
| 1. Early dev | Commit semantics | Read-only repo | Greenfield AI systems |
| 2. Production traffic | Logs and commit-correlated quasi-experiments | Read-only repo and logs | Deployed AI systems at any scale |
| 3. A/B testing | RCT results from Statsig, Eppo, LaunchDarkly | Read-only A/B test results | Mature experimentation discipline |
| 4. Counterfactual Perturbations | Counterfactual randomization at decision points | SDK in production code path | A/B testing has become the bottleneck |

Stage 1. Early dev

No production traffic.

Where you are. Building something new. The repo has commits but the deployed system has no traffic, or traffic too low to support quasi-experimental analysis.

Remyx’s role. Parse the commit history to extract milestones, the meaningful structural changes that mark your team’s evolving understanding. Use the semantic content of those milestones to recommend relevant new methods, models, evaluation strategies, and architectural patterns. Reusable scaffolding speeds up implementation. There is no causal model yet because the system isn’t gathering outcomes. The value comes from a meta-layer of recommendations grounded in commit semantics and matched against Remyx’s resource index.

Trust requirement. Read-only access to your repo. Fully passive.

Status. Foundational capabilities live (commit ingestion, milestone extraction, recommendation engine). The standalone Stage 1 product surface is in active development.

Stage 2. Production traffic and commits

Quasi-experiments from natural before-after windows.

Where you are. You have deployed something and traffic is flowing. You may already log production telemetry to Datadog, Honeycomb, structured JSON, or similar.

Remyx’s role. Correlate observational data with commit boundaries to produce quasi-experimental causal evidence. Each meaningful commit becomes a regime boundary. The windows before and after are a natural before-after comparison. Accumulating these estimates across many commits builds a causal model of your system.

Recommendations from Stage 1 become evidence-grounded. Instead of “teams using similar architectures have tried Y,” you get “your last three retrieval-related commits collectively shifted resolution rate by Z; the next intervention worth considering targets the prompt selection decision point.”

You can now ask questions in natural language. “Did the model swap last Tuesday cause the latency regression?” returns a causal answer with confidence intervals and identification status. You connect your existing telemetry and your repo. The causal evidence comes from data you are already collecting.

Trust requirement. Read-only access to logs and repo. Still passive.

Status. Roadmap. Causal discovery, quasi-experiment identification, the graph engine, data fusion, and the natural language query layer are all on the development path. Foundational schema and ingestion pipelines are the priority for the next milestone.
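
To make the before-after comparison around a commit concrete, here is a minimal Python sketch. It is not Remyx's implementation: the `Event` record, its field names, and `before_after_effect` are hypothetical names chosen for illustration. It takes the difference in mean outcome between equal windows on either side of a commit timestamp and attaches a normal-approximation confidence interval.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist, mean, variance


@dataclass
class Event:
    # Hypothetical log record; real telemetry schemas will differ.
    timestamp: float  # e.g. seconds since epoch
    outcome: float    # e.g. 1.0 if the request was resolved, else 0.0


def before_after_effect(events, commit_time, window, confidence=0.95):
    """Difference in mean outcome (after minus before) around a commit.

    Returns (difference, (lower, upper)) using a normal approximation
    with unpooled sample variances.
    """
    before = [e.outcome for e in events
              if commit_time - window <= e.timestamp < commit_time]
    after = [e.outcome for e in events
             if commit_time <= e.timestamp < commit_time + window]
    if len(before) < 2 or len(after) < 2:
        raise ValueError("need at least two events on each side of the commit")

    diff = mean(after) - mean(before)
    std_err = sqrt(variance(before) / len(before) + variance(after) / len(after))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, (diff - z * std_err, diff + z * std_err)
```

A raw before-after difference like this is only the starting point. It does not adjust for time trends, seasonality, or other changes that landed in the same window, which is why a quasi-experimental estimate also needs an identification check before it is read as causal.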

Stage 3. A/B testing in production

RCT evidence enters the model.

Where you are. Your experimentation practice is mature enough to run explicit A/B tests for important changes. You have an experimentation platform such as Statsig, Eppo, LaunchDarkly, or one built in-house.

Remyx’s role. Integrate with your experimentation platform as another evidence source. A/B test results feed the causal model with high-confidence interventional updates. Estimates that were noisy from quasi-experimental analysis tighten when the same effect is measured through randomization. The model’s quality improves as your team’s RCT discipline matures.

This stage is additive. Quasi-experimental analysis continues for the long tail of changes that aren’t worth A/B testing. RCT evidence sharpens estimates for the changes that justify the cost of a traffic split.

The hypothesis triage layer pays off here. The system surfaces recommendations like “this hypothesis is borderline-identifiable from quasi-experiments; an A/B test would tighten the estimate by 3x” so you can make informed decisions about which questions deserve A/B investment.

Trust requirement. Read-only access to A/B test results. Still passive. Remyx reads the randomization you are already doing and does not introduce new randomization.

Status. Roadmap. Connector framework will ship after Stage 2 quasi-experiment infrastructure is mature.
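
One standard way to see why an RCT result tightens a noisy quasi-experimental estimate is inverse-variance weighting of independent estimates. The sketch below is an illustration under that assumption, not a description of how Remyx fuses evidence; `combine_estimates` and the example numbers are hypothetical.

```python
from math import sqrt


def combine_estimates(estimates):
    """Pool independent (effect, standard_error) pairs by inverse-variance weighting.

    Assumes each estimate is unbiased and the estimates are independent.
    Returns (pooled_effect, pooled_standard_error).
    """
    weights = [1.0 / (std_err ** 2) for _, std_err in estimates]
    total_weight = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total_weight
    return pooled, sqrt(1.0 / total_weight)


# Hypothetical numbers: a noisy quasi-experimental estimate and a tighter A/B result.
quasi_experiment = (0.04, 0.03)
ab_test = (0.02, 0.01)
pooled_effect, pooled_std_err = combine_estimates([quasi_experiment, ab_test])
```

The pooled standard error is never larger than the smallest input standard error, so adding randomized evidence can only tighten the estimate under these assumptions. In practice a quasi-experimental estimate may carry bias that pure variance weighting ignores, so a real system would likely need to discount it further.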

Stage 4. Counterfactual Perturbations

A/B testing has become the bottleneck.

Where you are. Your team is doing enough A/B testing that the operational cost is real, and there are still important questions A/B testing cannot answer:
  • Mediation. Did the prompt change cause the lift, or was it the retrieval change shipped the same week?
  • Effect of treatment on the treated. The effect specifically on the subpopulation that receives a given policy.
  • Low-traffic decision points. Effect estimates where there is not enough volume for a powered A/B test.
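
To illustrate why low-traffic decision points are hard to test, the following sketch uses the standard normal-approximation sample size formula for comparing two proportions. The function name and the example rates are hypothetical and not taken from Remyx.

```python
from math import ceil
from statistics import NormalDist


def samples_per_arm(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


# Hypothetical example: detect a 2-point lift on a 50% resolution rate.
needed = samples_per_arm(baseline_rate=0.50, min_detectable_lift=0.02)
```

With these example inputs the requirement comes out on the order of ten thousand requests per arm, which a decision point seeing a few hundred requests a week would take many months to accumulate.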
Remyx’s role. Counterfactual randomization at instrumented decision points via the Remyx SDK. Adoption proceeds in three phases: shadow mode first, then audits, then live overrides.
1. Shadow mode. The SDK is integrated at decision points. The natural policy is captured. No overrides are applied. You see what data the system would collect under CTF-RAND and what questions become identifiable, without any risk to production behavior. You validate that the SDK is working correctly, audit the data being captured, and confirm proposed override policies.

2. Audits. Your team or compliance reviews the shadow-mode data and proposed override behavior. For some teams (regulated industries, high-stakes consumer products), this audit phase is non-negotiable and may take weeks.

3. Live overrides. After audits pass, overrides are enabled gradually, starting at 0% override rate, ramping to 1%, 5%, and so on. Each step is reversible via the kill switch. The first decision points to get live overrides are typically low-stakes (which retry strategy, which retrieval rerank) before higher-stakes decisions.
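
The relationship between shadow mode, the override ramp, and the kill switch can be sketched as follows. This is not the Remyx SDK API: `DecisionPoint`, its fields, and `decide` are hypothetical names, and uniform random choice is an assumed stand-in for whatever override policy a team actually approves.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DecisionPoint:
    # Hypothetical wrapper around one instrumented decision; not a real SDK class.
    name: str
    options: list
    override_rate: float = 0.0  # 0.0 is shadow mode: log only, never override
    kill_switch: bool = False   # True disables overrides regardless of the rate
    log: list = field(default_factory=list)
    rng: random.Random = field(default_factory=random.Random)

    def decide(self, natural_choice):
        """Return the choice to execute and always record the natural policy."""
        randomized = (not self.kill_switch
                      and self.rng.random() < self.override_rate)
        # Assumed override policy: pick uniformly among the allowed options.
        executed = self.rng.choice(self.options) if randomized else natural_choice
        self.log.append({
            "point": self.name,
            "natural": natural_choice,
            "executed": executed,
            "randomized": randomized,
        })
        return executed


# Shadow mode: the natural choice always runs, but every decision is logged.
retry = DecisionPoint("retry_strategy", ["linear", "exponential"])
shadow_result = retry.decide("linear")

# After audits, ramp gradually; flipping kill_switch reverts to natural behavior.
retry.override_rate = 0.01
```

Because the natural choice is logged on every call, including the randomized ones, the resulting data records both what the system wanted to do and what was actually executed.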
This stage produces the counterfactual evidence that makes ETT (Effect of Treatment on the Treated) and NDE (Natural Direct Effect) estimable. The questions that have been frustrating your team through earlier stages, mediation questions and policy-improvement questions, finally become answerable.
Trust requirement. SDK integrated into production code path. Natural-policy capture during shadow mode. Controlled override of production decisions after audit. The first stage that requires meaningful trust.

Status. Roadmap. SDK in log-only (shadow) mode is the first deliverable. CTF-RAND override policies and the counterfactual estimation layer ship after the causal model is mature.

How to read this

If you are an early-dev team, Stage 1 is your entry point. The other stages describe what Remyx adds as your system grows. If you have a mature experimentation practice with an in-house platform and are looking at how to handle the volume of AI changes that A/B testing alone cannot gate efficiently, Stage 3 and Stage 4 are the parts of the roadmap most relevant to you. The architectural commitment is to make causal inference operational alongside your existing A/B testing capacity.

Causal Intelligence: The architectural through-line across stages

Roadmap: Shipped vs. in-development capabilities