Score

The Remyx Score tool, powered by Prometheus 2, helps you score datasets and model responses on a rubric emphasizing the qualities and tasks you want to maximize. We'll show you how to format your data and design a rubric for scoring.

To get started, open the Score tool from the home screen.

Design A Rubric

To design a rubric, first describe the criteria that ground the scoring. For example, to judge responses on how respectful they are, you could describe your criteria like this:

Example criteria

Is the model proficient in responding respectfully to user feedback?

You also need to describe what makes a response good and what makes it bad. Continuing the scenario above, a positive description could read:

# Provide an explanation to describe why a response may be judged positively.
The model is welcoming of the feedback and responds in an open-minded manner.
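
Before filling in the form, it can help to keep the three rubric inputs together and check that none is blank. The following is a minimal sketch, not part of the Remyx product: `RubricDraft` and `missing_fields` are hypothetical names, and the negative description is a placeholder written for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricDraft:
    """Hypothetical container for the three text inputs of a rubric."""

    criteria: str
    positive: str  # why a response may be judged positively
    negative: str  # why a response may be judged negatively

    def missing_fields(self) -> list[str]:
        """Return the names of any fields that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]


draft = RubricDraft(
    criteria="Is the model proficient in responding respectfully to user feedback?",
    positive="The model is welcoming of the feedback and responds in an open-minded manner.",
    # Placeholder written for this sketch, not taken from the Remyx docs.
    negative="The model dismisses the feedback or responds defensively.",
)

# An empty list means all three inputs are filled in.
print(draft.missing_fields())
```

The text fields can then be copied into the Score tool as-is.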

Format Your Data

Your dataset must include three string columns named prompt, response, and reference_answer. These columns represent input values, output values, and ideal output values, respectively. If your column names differ, the columns are assumed to appear in this order. You can point to a publicly available Hugging Face dataset or upload a CSV file.

  • prompt (string)
    Represents a user prompt, question, command, etc.

  • response (string)
    The response from the model.

  • reference_answer (string)
    The ideal answer to the "prompt" text.
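
As an illustration of the expected layout, the sketch below writes a one-row CSV with the three required columns and checks a header against them, using only Python's standard library. The helper `check_columns` and the row content are assumptions made for this example; they are not part of the Remyx tooling.

```python
import csv
import io

REQUIRED_COLUMNS = ["prompt", "response", "reference_answer"]


def check_columns(csv_text: str) -> list[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Build a tiny CSV in memory; the row content is placeholder data.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS)
writer.writeheader()
writer.writerow({
    "prompt": "Your last answer was too long.",
    "response": "Thanks for letting me know. Here is a shorter version.",
    "reference_answer": "Acknowledge the feedback politely and give a concise answer.",
})
csv_text = buffer.getvalue()

# An empty list means the header contains every required column.
print(check_columns(csv_text))
```

To produce a file for upload, the same writer can target a file opened with `open("data.csv", "w", newline="")` instead of the in-memory buffer.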

Once you've provided all the inputs and submitted your scoring job, you'll be redirected to the Scores tab, where you can track the job's progress. When the job is complete, click its name to see the results. On the right-hand side, you'll see the full rubric created from your criteria and positive/negative descriptions. In the center, you'll see a preview of the generated scores, including two new columns added to your dataset: feedback and score.

  • feedback (string)
    An explanation behind the numerical score assigned.

  • score (int)
    A rating on a scale from 1 to 5, with 1 being the lowest and 5 the highest score according to your rubric.
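
If you work with the scored rows outside the tool, a short summary of the score column can be useful. The sketch below assumes the results are available as rows carrying the feedback and score columns described above; how you obtain them is not covered here, and the sample rows and the `summarize` helper are placeholders invented for this example.

```python
from collections import Counter
from statistics import mean

# Placeholder rows, shaped like a scored dataset with the two added columns.
scored_rows = [
    {"feedback": "Placeholder explanation A.", "score": 5},
    {"feedback": "Placeholder explanation B.", "score": 3},
    {"feedback": "Placeholder explanation C.", "score": 5},
]


def summarize(rows):
    """Return the mean score and a count of rows per score value (1 to 5)."""
    scores = [int(row["score"]) for row in rows]
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("scores are expected to fall between 1 and 5")
    counts = Counter(scores)
    return mean(scores), {value: counts.get(value, 0) for value in range(1, 6)}


average, distribution = summarize(scored_rows)
print(average, distribution)
```

Reading the feedback text for the lowest-scoring rows is usually the quickest way to see where responses fall short of the rubric.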

What's next?

Fantastic, you've gained more insight into how your model performs on the qualities you care about. Here are some more resources to dive into:
