
A/B Testing

Overview

A/B testing of models enables engineers to evaluate and compare the performance of different model deployments by splitting incoming traffic and analyzing performance metrics. This feature is designed to help teams make data-driven decisions, optimize models, and align improvements with business goals, particularly for Generative AI (GenAI) use cases, including Retrieval-Augmented Generation (RAG) systems.

[Diagram: Models A/B Testing]

How does A/B Testing work?

With the A/B Testing feature, you can:

  1. Select a Champion Model and a Challenger Model to determine which one performs better.
  2. Expose an inference endpoint that splits traffic between the two models (see the client-side sketch after this list).
  3. Define test parameters, feedback types, and statistical metrics to evaluate model performance.
  4. Continuously track, store, and display feedback and performance metrics during the experiment.
  5. Experiment with both models simultaneously in the interactive playground.
  6. Declare a winning model and redirect all traffic to it, if desired.
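The snippet below is a minimal client-side sketch of steps 2 and 4, assuming a REST-style deployment: send a request to the shared A/B-test inference endpoint, then report human feedback for that request. The base URL, experiment ID, header, and field names are placeholders and will differ in your environment; see the Inference REST API and Human Feedback REST API references for the exact request formats.

```python
import requests

# Placeholder values; replace with your deployment's endpoint, experiment ID, and API key.
BASE_URL = "https://example.com/api"
EXPERIMENT_ID = "my-ab-test"
HEADERS = {"Authorization": "Bearer <API_KEY>"}

# 1. Send a prompt to the shared A/B-test inference endpoint. The platform routes the
#    request to either the champion or the challenger based on the configured traffic split.
response = requests.post(
    f"{BASE_URL}/experiments/{EXPERIMENT_ID}/infer",
    headers=HEADERS,
    json={"prompt": "Summarize our refund policy in two sentences."},
)
response.raise_for_status()
result = response.json()
print(result["output"])         # the model's answer
print(result["model_variant"])  # which variant (champion or challenger) served the request

# 2. Report human feedback for that inference so it counts toward the experiment metrics.
requests.post(
    f"{BASE_URL}/experiments/{EXPERIMENT_ID}/feedback",
    headers=HEADERS,
    json={"inference_id": result["inference_id"], "rating": "thumbs_up"},
).raise_for_status()
```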

Terminology

  • Champion Model: The control model, typically the currently deployed or baseline model.
  • Challenger Model: The test model, typically a new version or a different model family being evaluated.
  • A/B Test Experiment: The process of running both models concurrently, splitting traffic, and analyzing performance metrics to determine the superior model.
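As an illustration of how collected feedback can feed into the decision between champion and challenger, the sketch below runs a two-proportion z-test on binary (thumbs-up/thumbs-down) feedback counts. This is not the platform's built-in evaluation; the statistical metrics you configure for an experiment may differ, and the counts shown are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Compare the positive-feedback rates of two model variants.

    Returns the z statistic and the two-sided p-value; a small p-value
    suggests the difference between the variants is statistically significant.
    """
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: champion received 420 thumbs-up out of 1,000 requests,
# the challenger 465 out of 1,000.
z, p = two_proportion_z_test(420, 1000, 465, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # e.g. p < 0.05 would support declaring the challenger the winner
```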

Next Steps

  • Experiments – Design, launch, and monitor structured experiments to analyze run‑level metrics before or during an A/B test.
  • Registered Models & Versions – Keep track of model lineage and promote the winning challenger to production once the test concludes.
  • Inference REST API – Call the A/B‑test endpoint programmatically or integrate it in external applications.
  • Human Feedback REST API – Collect real‑time user feedback from chatbots or annotation tools to strengthen statistical confidence.