In this tutorial, we’ll show you how to select a model based on context about your application using MyxMatch with the Remyx CLI.

Before you begin, follow the instructions to install the Remyx CLI and authenticate with it.

Overview

Customer service chat applications are on the rise, but with new LLMs constantly being released, which one makes the best base model for your application? In this tutorial, we’ll show how to evaluate candidate models for your use case, given some relevant context about it.

Comparing Candidate Models

Each LLM’s baseline capabilities are shaped by its training methods and datasets. In this example, we want the model with the best priors for handling customer queries.

By instantiating a MyxBoard, you can organize the results of grouped evaluations for your model comparison.
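To make the idea of grouping results concrete, here is a minimal sketch of a board-like container. It is a local stand-in, not the Remyx MyxBoard class: the EvalBoard name, its fields, the candidate ids, and the score are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class EvalBoard:
    """Hypothetical stand-in for a board: a named group of candidate models."""

    name: str
    model_ids: list[str]
    # Maps task name -> {model id -> score}; filled in as evaluations finish.
    results: dict[str, dict[str, float]] = field(default_factory=dict)

    def log(self, task: str, model_id: str, score: float) -> None:
        """Record one score, rejecting models that are not on this board."""
        if model_id not in self.model_ids:
            raise ValueError(f"{model_id} is not a candidate on board {self.name!r}")
        self.results.setdefault(task, {})[model_id] = score


# Placeholder ids and score, for illustration only.
board = EvalBoard(
    name="customer-service-chat",
    model_ids=["candidate-a", "candidate-b", "candidate-c"],
)
board.log("myxmatch", "candidate-a", 0.72)
print(board.results)
```

Keeping every score keyed by task and model id means results from several evaluation runs can sit side by side under one board name.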

Making Your MyxMatch

MyxMatch is a service that simplifies custom model evaluation using LLM-as-a-Judge with synthetic data. All you need is a bit of context about your use case or some representative data samples.

Now we’re ready to launch evaluation jobs with the Remyx API. In this example, the prompt is tested against the candidate models and the results are asynchronously logged to the MyxBoard.
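The asynchronous pattern described here can be sketched with Python’s asyncio. This is not the Remyx client: evaluate_all, fake_judge, the candidate ids, and the prompt are hypothetical, and the judge returns a dummy value where a real job would generate and score model responses remotely.

```python
import asyncio
from typing import Awaitable, Callable

# A judge takes (model_id, prompt) and eventually returns a score.
Judge = Callable[[str, str], Awaitable[float]]


async def evaluate_all(
    model_ids: list[str], prompt: str, judge: Judge, board: dict[str, float]
) -> dict[str, float]:
    """Run one evaluation per candidate concurrently, logging each score on arrival."""

    async def run_one(model_id: str) -> None:
        board[model_id] = await judge(model_id, prompt)

    await asyncio.gather(*(run_one(m) for m in model_ids))
    return board


async def fake_judge(model_id: str, prompt: str) -> float:
    """Dummy judge: yields control once, then returns a meaningless score."""
    await asyncio.sleep(0)  # stands in for network latency
    return float(len(model_id))  # placeholder value, not a real metric


results = asyncio.run(
    evaluate_all(
        ["candidate-a", "candidate-bb"],
        "A customer asks how to reset a password.",  # hypothetical prompt
        fake_judge,
        {},
    )
)
print(results)
```

Because each coroutine writes its own entry as soon as its score is ready, slower candidates do not block faster ones from being logged.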

When the MyxMatch evaluation job finishes, a message is printed automatically to confirm that the job is complete.

To see the evaluation results, you can use:

The results will be stored in a JSON object like the following:
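As a hedged illustration of working with such an object, the sketch below parses a results payload and ranks the models by score. The JSON shape, the field names ("task", "scores", "model", "score"), the candidate ids, and the numbers are all assumptions made for the example, not actual MyxMatch output.

```python
import json

# Hypothetical payload; real field names and values may differ.
raw = """
{
  "task": "myxmatch",
  "scores": [
    {"model": "candidate-a", "score": 0.61},
    {"model": "candidate-b", "score": 0.87},
    {"model": "candidate-c", "score": 0.74}
  ]
}
"""


def rank_models(results_json: str) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted from highest to lowest score."""
    data = json.loads(results_json)
    rows = [(entry["model"], entry["score"]) for entry in data["scores"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_models(raw)
for position, (model, score) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {score:.2f}")
```

Sorting in descending order puts the strongest candidate first, which is the ordering a model comparison typically needs.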

Conclusion

After each model’s responses are scored and ranked, the Qwen2-1.5B model stands out from the other candidates, with strong baseline capabilities for customer service use cases.