Roadmap

This page tracks the state of Remyx’s capabilities, both shipped and upcoming. We publish it so customers, collaborators, and researchers can see our direction well before each capability ships. For the architectural through-line, see Causal Intelligence. For how these capabilities map to the customer journey, see Maturity Progression.

Status legend

Status	Meaning
Shipped	Available in production today
In development	Actively being built
Planned	Committed to the roadmap, not yet started
Research	Exploring feasibility, not yet committed

Agents & ExperimentOps

All capabilities in this section except the standalone Stage 1 product surface are live in production today. The shipped ones form the Stage 2 foundations: the agent fleet (Agents, Inbox, Reports) plus the ExperimentOps machinery underneath it.

Status	Capability	Description
Shipped	Experiment capture	Full lifecycle from origin through decision, with target metric, hypothesis, and decision rationale captured directly on the experiment record
Shipped	Cross-experiment patterns	Tag-based clustering identifies which directions consistently produce results
Shipped	Resource discovery	Semantic search across papers, repos, models, and datasets, matched to your team’s experiment history
Shipped	History-aware recommendation ranking	The structured shipping history extracted from your repo feeds the recommendation ranker, so candidates that align with your team’s actual trajectory rank higher
Shipped	Preference-model ranking	A per-team preference model fit over past experiments scores candidate work and breaks ties among equally relevant candidates
Shipped	Outrider agents	Outrider installed per repo: a scheduled GitHub Action that opens a reviewable draft PR (or a discussion Issue) for the next paper worth implementing; a person reviews and merges
Shipped	Multi-backend agents	Agent runs route to Anthropic (Claude), Z.ai (GLM), or Moonshot (Kimi), including the two-tier drafter/refiner fleet layout
Shipped	Inbox	A derived triage queue of Review / Decide / Fix items that auto-resolves when GitHub does
Shipped	Fleet reports	The fleet in one view: KPIs, deliverable pipeline, contribution grid, per-agent states, and cost per code artifact
Shipped	Agent direction learning	Every merge, skip, and discussion re-weights what each agent proposes next, with the results surfaced as evidence-backed Promoted / Demoted / Held directions
Shipped	Connector framework	GitHub (GitHub App), model providers (Anthropic / Z.ai / Moonshot), Linear, Jira, Slack, W&B, Hugging Face, Modal
Shipped	MCP server	Programmatic access to the fleet and platform from Claude Code and other MCP clients, including the Outrider tool plane
In development	Standalone Stage 1 product surface	Milestone-driven recommendations for early-dev teams without production traffic
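To make the preference-model tie-break concrete, here is a minimal sketch of relevance-first ordering in which the preference score only decides between candidates whose relevance is tied. The `Candidate` type, its fields, and the rounding used to decide what counts as a tie are hypothetical illustrations, not Remyx’s actual ranker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate record; field names are illustrative only.
    title: str
    relevance: float   # assumed score from resource discovery / history-aware ranking
    preference: float  # assumed score from the per-team preference model


def rank_candidates(candidates: list[Candidate], ndigits: int = 2) -> list[Candidate]:
    """Order by relevance first; the preference score only breaks ties.

    Relevance is rounded to `ndigits` so near-equal scores count as ties
    (an assumed bucketing choice, not a documented Remyx behavior).
    """
    return sorted(
        candidates,
        key=lambda c: (round(c.relevance, ndigits), c.preference),
        reverse=True,
    )


if __name__ == "__main__":
    pool = [
        Candidate("paper-a", relevance=0.91, preference=0.20),
        Candidate("paper-b", relevance=0.91, preference=0.75),
        Candidate("paper-c", relevance=0.64, preference=0.99),
    ]
    # prints ['paper-b', 'paper-a', 'paper-c']
    print([c.title for c in rank_candidates(pool)])
```

Note that paper-c has the highest preference score but still ranks last: under this ordering, preference never overrides a real difference in relevance.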

Causal intelligence, evidence layer

Stage 2: the evidence layer that feeds the causal model.

Status	Capability	Description
Planned	Observational log ingestion	Pluggable adapters for Datadog, Honeycomb, structured JSON, OpenTelemetry. Customers connect existing telemetry, and Remyx populates the evidence layer without instrumentation changes
Planned	Commit-correlated regime boundary detection	Extension of the existing repo integration to identify regime changes in the data-generating process from commit history
Planned	Quasi-experiment identification	Combine observational logs with regime boundaries to produce identified causal effects via difference-in-differences, interrupted time series, or regression discontinuity
Planned	Causal discovery	Bootstrap a partial causal graph from observational data, supplemented by regime-boundary structure. Discovered structure is human-validated before being treated as the working model
Planned	Causal graph engine	Versioned causal graph as a top-level object, supporting interventional, counterfactual, and mediation-aware graphical models. Semi-Markovian formulation to handle latent confounders
Planned	Causal data fusion	Combine evidence from multiple sources into one coherent posterior. Conflict resolution, graph refinement proposals, continuous incremental updates
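Quasi-experiment identification names difference-in-differences as one of its estimators. As a rough sketch of the textbook 2x2 version applied at a commit-defined regime boundary: compare how a metric moved for traffic the commit affected (treated) against traffic it did not touch (control). The function name and the treated/control split are assumptions for illustration; a production estimator would also need standard errors and a check of the parallel-trends assumption.

```python
from statistics import mean


def diff_in_diff(
    treated_pre: list[float],
    treated_post: list[float],
    control_pre: list[float],
    control_post: list[float],
) -> float:
    """Classic 2x2 difference-in-differences point estimate.

    "pre" and "post" are metric samples before and after the regime
    boundary (e.g. a commit). The estimate is valid only under parallel
    trends: absent the change, both groups would have moved equally.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Subtracting the control change removes shifts common to both groups.
    return treated_change - control_change
```

For example, if p95 latency for treated traffic moves from a mean of 105 ms to 85 ms across the boundary while control traffic moves from 102 ms to 100 ms, the estimate is (85 - 105) - (100 - 102) = -18 ms.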

Causal intelligence, query and interaction

Status	Capability	Description
Planned	Identification dispatcher	Given a question, classify it by required identification layer, route to evidence sources with appropriate identification, and dispatch to estimation logic
Planned	Natural language query layer	Customer-facing interface that takes natural language questions, parses them into formal estimands, and returns natural language answers with identification status and recommendations
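A minimal sketch of the identification dispatcher’s routing step, assuming “identification layer” means the rung of Pearl’s causal hierarchy and that the question has already been classified. The evidence-source names and their cheapest-first ordering are hypothetical, loosely following the Stage 2 / 3 / 4 progression described under Hypothesis triage; real identification depends on the causal graph, not on the rung alone.

```python
from enum import Enum
from typing import Optional


class Rung(Enum):
    """Rungs of Pearl's causal hierarchy (assumed meaning of "identification layer")."""
    ASSOCIATIONAL = 1   # "What is?", e.g. correlations in logs
    INTERVENTIONAL = 2  # "What if we do X?", e.g. average treatment effects
    COUNTERFACTUAL = 3  # "What if we had done X?", e.g. ETT, NDE


# Hypothetical routing table: sources assumed able to answer each rung,
# cheapest first. Simplification: a real dispatcher would consult the
# causal graph to decide identifiability.
EVIDENCE_BY_RUNG: dict[Rung, list[str]] = {
    Rung.ASSOCIATIONAL: ["observational_logs"],
    Rung.INTERVENTIONAL: ["quasi_experiment", "ab_test", "ctf_rand"],
    Rung.COUNTERFACTUAL: ["ctf_rand"],
}


def dispatch(rung: Rung, available: set[str]) -> Optional[str]:
    """Return the cheapest connected source that can identify a query at `rung`.

    Returns None when no connected source suffices, i.e. the query is not
    yet identifiable with the customer's current evidence.
    """
    for source in EVIDENCE_BY_RUNG[rung]:
        if source in available:
            return source
    return None
```

A None result corresponds to the “currently unidentifiable” case that identification-enabling intervention proposals are meant to address.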

Causal intelligence, Stage 3

Status	Capability	Description
Planned	A/B test integration framework	Connectors for Statsig, Eppo, LaunchDarkly. A/B test results become an evidence source feeding the causal model
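Once A/B results are an evidence source, they have to be combined with other estimates of the same effect. The simplest version of that is inverse-variance (fixed-effect) pooling, sketched below as a stand-in for the richer causal data fusion described in the evidence-layer table; it assumes every input estimates the same quantity with independent Gaussian error, and it does no conflict resolution. The function name and the example numbers are illustrative.

```python
import math


def fuse_gaussian(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance (fixed-effect) pooling of effect estimates.

    Each input is (effect, standard_error). With independent Gaussian errors
    and a flat prior on one shared effect, the result is that effect's
    posterior mean and standard deviation.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(se <= 0 for _, se in estimates):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se**2 for _, se in estimates]  # precision of each source
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# An A/B test result and a quasi-experimental estimate of the same effect
# (illustrative numbers only).
ab_test = (0.04, 0.01)
quasi_experiment = (0.07, 0.02)
mean_effect, sd = fuse_gaussian([ab_test, quasi_experiment])
```

Here the pooled effect is 0.046 with a standard error of about 0.009: the tighter A/B result carries four times the weight of the quasi-experimental one.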

Causal intelligence, Stage 4

Status	Capability	Description
Planned	Shadow-decision SDK (log-only mode)	Python SDK that wraps decision points in your AI system. The first version captures natural policy output without applying overrides
Planned	Shadow-mode audit infrastructure	Dedicated product surface for the shadow-mode adoption phase. Audit trail viewer, override proposal review, and compliance reporting
Planned	CTF-RAND override policies	Extension of the SDK with counterfactual randomization. Trajectory-consistent semantics by default. Per-decision-point semantics opt-in for mediation analysis
Planned	ETT and NDE estimation	Counterfactual estimation procedures for effect-of-treatment-on-the-treated and natural direct effect
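To illustrate what log-only mode means in practice, here is a minimal Python decorator that wraps a decision point, records its inputs and natural output, and returns that output unchanged. The `shadow_decision` name, the in-memory `SHADOW_LOG` sink, and the example decision point are hypothetical; they are not the SDK’s API. CTF-RAND override policies would extend a wrapper like this by sometimes replacing the output, which this sketch deliberately never does.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink; a real SDK would ship records to a backend.
SHADOW_LOG: list[dict[str, Any]] = []


def shadow_decision(point: str) -> Callable[[Callable], Callable]:
    """Wrap a decision point in log-only mode.

    The wrapped function runs exactly as before; its inputs and natural
    policy output are recorded, and no override is ever applied.
    """
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            output = fn(*args, **kwargs)  # natural policy output, untouched
            SHADOW_LOG.append({
                "decision_point": point,
                "args": args,
                "kwargs": kwargs,
                "output": output,
                "ts": time.time(),
            })
            return output
        return wrapper
    return decorator


# Hypothetical decision point in an AI system: pick a model by query length.
@shadow_decision("router.choose_model")
def choose_model(query_len: int) -> str:
    return "large" if query_len > 200 else "small"
```

Because the wrapper returns the function’s own output, adopting it changes nothing about system behavior; it only builds the audit trail that the shadow-mode infrastructure would review.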

Hypothesis triage

The triage layer matures with the customer, and it is what the agent fleet’s ranking grows into. Stage 2 customers see quasi-experimental recommendations. Stage 3 customers see A/B test recommendations. Stage 4 customers see CTF-RAND recommendations.

Status	Capability	Description
Planned	Hypothesis ranking and evidence path recommendation	Rank hypotheses by expected information gain. Identify the cheapest evidence path to an answer for each
Planned	Orchestration scheduler	Coordinate active interventions (CTF-RAND randomizations, A/B tests, quasi-experiment analyses) for maximum concurrency without compromising estimate validity
Planned	Identification-enabling intervention proposals	Proactively propose CTF-RAND or A/B interventions that would make currently-unidentifiable hypotheses estimable
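Expected information gain for a yes/no hypothesis and a yes/no evidence outcome can be computed directly from a prior and two likelihoods, and “cheapest evidence path to an answer” can then be read as the lowest-cost path whose expected gain clears a threshold. This is a minimal sketch under those assumptions; the function names, the threshold, and the per-path costs and likelihoods are hypothetical. Ranking hypotheses then amounts to sorting them by the gain of their chosen path.

```python
import math
from typing import Optional


def entropy(p: float) -> float:
    """Binary entropy of a yes/no belief, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_info_gain(prior: float, p_pos_if_true: float, p_pos_if_false: float) -> float:
    """Expected entropy reduction (bits) about a binary hypothesis from one
    binary evidence outcome, via Bayes' rule on each possible outcome."""
    p_pos = prior * p_pos_if_true + (1 - prior) * p_pos_if_false
    gain = entropy(prior)
    for p_obs, lik_true in ((p_pos, p_pos_if_true), (1 - p_pos, 1 - p_pos_if_true)):
        if p_obs > 0:
            posterior = prior * lik_true / p_obs  # P(H true | this outcome)
            gain -= p_obs * entropy(posterior)
    return gain


# Hypothetical evidence path: (name, cost, P(positive | H true), P(positive | H false)).
Path = tuple[str, float, float, float]


def cheapest_path(prior: float, paths: list[Path], min_bits: float) -> Optional[str]:
    """Lowest-cost path whose expected gain reaches `min_bits`, or None."""
    ok = [(cost, name) for name, cost, t, f in paths
          if expected_info_gain(prior, t, f) >= min_bits]
    return min(ok)[1] if ok else None
```

With a prior of 0.5, a path that reports correctly 90% of the time yields about 0.53 bits and one that is right 95% of the time about 0.71 bits, so raising the threshold pushes the choice toward costlier, more decisive evidence.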

How we sequence

Foundational evidence-layer capabilities ship first, followed by the causal model and query layer, then A/B integration. Counterfactual perturbations ship last. This order follows the critical dependency path through the architecture:

Evidence schema.
Observational log ingestion.
Quasi-experiment identification.
Causal graph engine and data fusion.
Identification dispatcher and natural language query.
A/B integration.
Shadow-decision SDK.
CTF-RAND override policies.
ETT and NDE estimation.

This roadmap is shared publicly. We update it as priorities shift.
