Follow the instructions on how to install and authenticate using the Remyx CLI before you begin.
Overview
Customer service chat applications are on the rise, but with new LLMs constantly being released, which one makes the best base model for your application? In this tutorial, we’ll show how easy it is to evaluate candidate models for your use-case, based on some relevant context.
Comparing Candidate Models
Each LLM’s baseline capabilities are influenced by its training methods and datasets. In this example, we want the model with the best priors for handling customer queries.
Making Your MyxMatch
MyxMatch is a service that simplifies custom model evaluation using LLM-as-a-Judge with synthetic data. All you need is a bit of context about your use-case or representative data samples.
Conclusion
After scoring and ranking each model’s response, the Qwen2-1.5B model stands out from the remaining candidates with strong baseline capabilities for customer service use-cases.
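To make the scoring-and-ranking step more concrete, here is a minimal Python sketch of how per-response judge scores could be averaged and sorted into a leaderboard. It is not the MyxMatch implementation or the Remyx API: the `rank_models` helper, the model names, and the scores are all hypothetical placeholders chosen for illustration, and it assumes a judge has already assigned a numeric score to each response.

```python
from statistics import mean


def rank_models(judge_scores):
    """Rank models by their mean judge score, highest first.

    `judge_scores` maps a model name to the list of numeric scores a judge
    assigned to that model's responses. Models with no scores are skipped.
    """
    averages = {
        model: mean(scores)
        for model, scores in judge_scores.items()
        if scores
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder scores, NOT real evaluation results.
example_scores = {
    "candidate-a": [3, 4, 4],
    "candidate-b": [5, 4, 5],
    "candidate-c": [2, 3, 3],
}

ranking = rank_models(example_scores)
for position, (model, average) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {average:.2f}")
```

A plain mean is only one possible way to aggregate scores. A real evaluation may weight prompts differently or compare responses pairwise instead.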