Causal Intelligence
ExperimentOps captures decisions and surfaces patterns across the experiments your team has already run. Causal intelligence is what those captured experiments build toward: a maintained model of how your system actually works, updated as new evidence arrives from production logs, commit history, A/B tests, and instrumented decision points. The capabilities described here ship in stages. See Maturity Progression for how they map to the customer journey, and the Roadmap for current shipping status.

Most of what follows is forward-looking. We publish the architecture in full so customers, advisors, and researchers can engage with the direction, push back where they disagree, and see the shape of the system well before each piece ships.
The case for broader evidence
When an AI team ships a change and resolution rate goes up, the question that matters is what caused the lift. Was it the prompt change, the retrieval upgrade, both, neither? Without a causal answer, the team’s next decision is guesswork.

A/B testing answers this question well for the changes that warrant it. The limit is throughput. Each test carries a fixed operational cost (instrumentation, traffic allocation, time to significance) that doesn’t scale with the rate at which a modern AI team ships candidate changes.

Remyx’s approach is to broaden the evidence base. Use A/B testing where it earns its operational cost. Get causal estimates from cheaper evidence everywhere else, and use perturbation-based evidence to answer the questions A/B testing structurally cannot.

Three evidence sources
Remyx integrates three evidence sources into one causal model. Each has different strengths and different operational costs, and the system uses all three together.

1. Commit-correlated logs (quasi-experiments)

Remyx correlates your production logs with your commit history, treating each commit boundary as a quasi-experiment: under the assumption that nothing else changed in the window, the before/after comparison yields a local causal estimate. Because this reuses data your team already collects, the marginal cost per change is low, and coverage extends to every commit boundary in the log history.

2. A/B tests
When your team runs randomized controlled trials through a platform like Statsig, Eppo, LaunchDarkly, or in-house, Remyx integrates with the platform and incorporates this evidence into the model. Randomization is the gold standard for causal identification, so these results carry the most weight.
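To make the weighting concrete, one standard way to combine estimates of differing precision is inverse-variance pooling. This is an illustrative textbook sketch, not Remyx’s published fusion algorithm, and the numbers are invented; it only shows why a tight randomized estimate dominates a noisier quasi-experimental one.

```python
# Illustrative only: inverse-variance pooling of effect estimates.
# A low-variance (randomized) estimate gets a large weight and pulls
# the pooled answer toward itself.

def pool(estimates):
    """estimates: list of (effect, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    effect = sum(w * e for w, (e, _) in zip(weights, estimates)) / total
    se = (1.0 / total) ** 0.5
    return effect, se

# Invented numbers: A/B test says +2.0pp lift (SE 0.5);
# a quasi-experiment says +3.0pp lift (SE 1.5).
effect, se = pool([(2.0, 0.5), (3.0, 1.5)])
# effect = 2.1 -- the pooled estimate sits close to the randomized result.
```

The pooled standard error is also smaller than either input’s, which is the sense in which cheaper evidence still adds value even when an RCT is available.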
3. Counterfactual randomization (CTF-RAND)
Remyx’s lightweight client SDK helps instrument your system’s decision points, applying counterfactual perturbations that generate the evidence the causal inference engine needs to identify effects it otherwise cannot: which part of the pipeline is doing the work, the effect of a treatment specifically on the population that received it, and effect estimates at low-traffic decision points. The SDK runs in shadow mode first to audit safety before any perturbation is applied to live traffic.
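A sketch of what an instrumented decision point could look like. The real SDK’s API is not shown on this page, so every name below (`DecisionPoint`, `choose`, `shadow`, `perturb_rate`) is an invented assumption; the point is the shadow-mode contract: record what would have been perturbed without ever changing live behavior.

```python
import random

# Hypothetical sketch of a CTF-RAND decision point; all names are
# invented, not the actual Remyx SDK API.

def log(name, **fields):
    print(name, fields)   # stand-in for shipping the record to the engine

class DecisionPoint:
    def __init__(self, name, shadow=True, perturb_rate=0.05):
        self.name = name
        self.shadow = shadow            # shadow mode: observe only, never override
        self.perturb_rate = perturb_rate

    def choose(self, default, alternatives):
        """Return the value to serve, logging the counterfactual assignment."""
        if random.random() < self.perturb_rate:
            counterfactual = random.choice(alternatives)
            if self.shadow:
                # Audit phase: record what *would* have been perturbed,
                # but still serve the default to live traffic.
                log(self.name, served=default, would_have_served=counterfactual)
                return default
            log(self.name, served=counterfactual, instead_of=default)
            return counterfactual
        log(self.name, served=default)
        return default

retriever = DecisionPoint("retriever_choice", shadow=True)
choice = retriever.choose("bm25", alternatives=["dense", "hybrid"])
assert choice == "bm25"   # in shadow mode the live decision never changes
```

Flipping `shadow=False` after the audit is what would turn the logged counterfactuals into live perturbations.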
How teams use the model
The causal model sits underneath the product. Your team interacts with three workflows on top of it.

Answer questions in natural language
“Did the prompt change last Tuesday cause the latency regression?”

The system parses the question, identifies what kind of evidence could answer it, routes to the relevant sources, and returns an estimate with a confidence interval and a clear note on whether the available data is strong enough to support a causal conclusion or only a correlational one.
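The shape of such an answer can be sketched as a small data object. The field names here are assumptions for illustration, not the actual Remyx API; what matters is that the estimate, the interval, the evidence source, and the causal-vs-correlational flag travel together.

```python
# Hypothetical answer shape; field names are invented, not the Remyx API.
from dataclasses import dataclass

@dataclass
class CausalAnswer:
    estimate: float    # effect size, e.g. latency delta in ms
    ci: tuple          # 95% confidence interval (low, high)
    evidence: str      # which source answered it
    causal: bool       # True if identification supports a causal claim

def summarize(ans: CausalAnswer) -> str:
    kind = "causal" if ans.causal else "correlational only"
    lo, hi = ans.ci
    return f"{ans.estimate:+.1f}ms [{lo:+.1f}, {hi:+.1f}] ({kind}; {ans.evidence})"

print(summarize(CausalAnswer(
    estimate=42.0, ci=(18.0, 66.0),
    evidence="quasi-experiment around Tuesday's deploy", causal=True)))
```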
Triage hypotheses prospectively
When your team proposes ten things to try, the system ranks them by how much each would actually teach you given what you already know, and identifies the cheapest evidence path for each. Each hypothesis gets one of these paths:

- Already answerable from existing data.
- Answerable with quasi-experimental analysis.
- Requires an A/B test.
- Requires CTF-RAND at decision point X for two weeks.
- Unidentifiable even with full instrumentation.
Surface what’s not yet known
When you ask a question the current evidence cannot answer, the system tells you what data would answer it. That turns “we don’t know” from a dead end into a planning surface.

How this fits with A/B testing
Causal intelligence works alongside A/B testing. The three evidence layers each handle a different operational price point and answer a different class of question.

| | A/B testing | Causal data fusion (quasi-experimental) | Counterfactual randomization |
|---|---|---|---|
| Evidence quality | Randomized | Quasi-experimental, with RCT evidence fused in when available | Counterfactual, from instrumented perturbations |
| Operational cost | High per change | Low marginal cost (reuses existing data) | Moderate (SDK instrumentation, shadow-mode audit before live) |
| Coverage | Changes worth gating with a traffic split | Every commit boundary in the log history | Decision points where the SDK is instrumented |
| Question types | Average effect of a randomized treatment | Average effects across changes the team already shipped | Attribution, mediation, effect on the treated, low-traffic effects |
| When to use | Decisions worth the cost of randomization | Continuous learning from the work the team is already doing | Questions A/B testing structurally cannot answer |
Theoretical foundation
The system makes only those causal claims that can be backed by evidence you could in principle collect. It avoids the layers of counterfactual reasoning that depend on assumptions which cannot be checked against data.

For readers familiar with Pearl’s Causal Hierarchy (PCH), the evidence sources map to layers as follows. PCH organizes causal questions into three rungs of increasing strength: L1 (associational, “what is”), L2 (interventional, “what happens if I do”), and L3 (counterfactual, “what would have happened if I had done”). Yang & Bareinboim (2025) introduce intermediate rungs L2.25 and L2.5 for the counterfactual evidence that CTF-RAND collects.

| Source | Layer | Notes |
|---|---|---|
| Observational logs | L1 | Supports correlational queries and structural discovery |
| Quasi-experiments (logs + commits) | L2 | Local identification under “nothing else changed in the window” |
| A/B tests | L2 | Strong identification via randomization |
| CTF-RAND, default | L2.25 | Same overridden value applies across the trajectory |
| CTF-RAND, mediation opt-in | L2.5 | Per-child value assignment for mediation analysis |
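The layer values in the table order the evidence sources by identification strength, so picking the strongest source available for a query reduces to a lookup. The numeric encoding and the `strongest` helper below are an assumed workflow for illustration, not a documented Remyx mechanism; the layer assignments themselves come from the table above.

```python
# Layer values taken from the table above; the selection helper is assumed.
LAYERS = {
    "observational_logs": 1.0,     # L1
    "quasi_experiments": 2.0,      # L2
    "ab_tests": 2.0,               # L2
    "ctf_rand_default": 2.25,      # L2.25
    "ctf_rand_mediation": 2.5,     # L2.5
}

def strongest(available):
    """Pick the evidence source with the highest PCH layer."""
    return max(available, key=lambda source: LAYERS[source])

assert strongest(["observational_logs", "ab_tests"]) == "ab_tests"
assert strongest(["ab_tests", "ctf_rand_default"]) == "ctf_rand_default"
```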