Curate
Data Composer
With as little context as a seed phrase, you can design a dataset for fine-tuning to your use case. The tool can also augment an existing dataset if you provide a file or a Hugging Face dataset. Click the [Data Composer] tool and give your data generation job a name. Then provide an example prompt or a dataset to generate samples from.
In a few minutes, you’ll see your data processing job complete under the Datasets tab. From there, you can preview your dataset and download it.
Score
You may also want to score the quality of a dataset. The Remyx Score tool, powered by Prometheus 2, helps you judge data on a rubric emphasizing the qualities and tasks you want to maximize. We’ll show you how to format your data and design a rubric for scoring.
The Score tool is currently available for text data - more data modalities coming soon!
Navigate to the Score tool on the home screen to get started.
Design A Rubric
To design a rubric, first describe the criteria that ground the scoring. For example, if you want to judge responses on how respectful they are, you can describe your criteria like:
You will also need to describe what makes a good response and what makes a bad one. Here are some examples that continue the scenario above:
Format Your Data
Your dataset must include three string columns named prompt, response, and reference_answer. These columns represent input values, output values, and ideal output values, respectively. If your column names differ, the columns are assumed to appear in this order. You can point to a publicly available Hugging Face dataset or upload a CSV file.
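If you are assembling the CSV yourself, a minimal sketch like the following may help. It uses only the Python standard library and writes the three columns in the expected order. The helper name rows_to_csv, the output file name, and the sample rows are hypothetical and only for illustration; they are not part of the Remyx tooling.

```python
import csv
import io

# Column names and order described in this section.
REQUIRED_COLUMNS = ["prompt", "response", "reference_answer"]


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text with the three string columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        # Cast each value to str, since all three columns must be strings.
        writer.writerow({col: str(row[col]) for col in REQUIRED_COLUMNS})
    return buffer.getvalue()


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {
        "prompt": "How do I reset my password?",
        "response": "Open Settings and choose Reset Password.",
        "reference_answer": "Go to Settings, select Reset Password, and follow the emailed link.",
    },
    {
        "prompt": "What file types can I upload?",
        "response": "You can upload CSV files.",
        "reference_answer": "CSV files are supported.",
    },
]

csv_text = rows_to_csv(sample_rows)

# To save the file for upload (the file name is an arbitrary choice):
# with open("scoring_data.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

Writing the header explicitly avoids relying on the column-order fallback, so the file stays valid even if you later reorder fields in your own code.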
Once you’ve provided all the inputs and submitted your scoring job, you’ll be redirected to the Scores tab, where you can track the job’s progress. When it is complete, click your score job name to see the results. On the right-hand side, you’ll see the full rubric created from your criteria and positive/negative descriptions. In the center, you’ll see a preview of the generated scores, including two new columns added to your dataset: feedback and score.
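Once you have scored rows in hand, you may want a quick summary of them. The sketch below assumes you have the results as CSV text with a numeric score column alongside feedback; the export format, the integer score values, the threshold of 3, and the helper name summarize_scores are all assumptions for illustration, not documented behavior of the Score tool.

```python
import csv
import io
from statistics import mean


def summarize_scores(csv_text, threshold=3):
    """Return the row count, mean score, and rows scoring below the threshold."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Assumption: the score column holds integer values.
    scores = [int(row["score"]) for row in rows]
    low_scoring = [row for row, score in zip(rows, scores) if score < threshold]
    return {
        "count": len(rows),
        "mean_score": mean(scores) if scores else None,
        "low_scoring": low_scoring,
    }


# Hypothetical scored output, for illustration only.
scored_csv = (
    "prompt,response,reference_answer,feedback,score\n"
    "Q1,A1,R1,Polite and accurate.,5\n"
    "Q2,A2,R2,Dismissive tone.,2\n"
    "Q3,A3,R3,Respectful and complete.,5\n"
)

summary = summarize_scores(scored_csv)
for row in summary["low_scoring"]:
    # Read the feedback on weak rows to decide what to fix or filter out.
    print(row["prompt"], row["score"], row["feedback"])
```

Reading the feedback on the lowest-scoring rows is often a faster way to spot systematic problems than looking at the average alone.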
What’s next?
Great, you’ve created a dataset for your use case. You can explore more tools including: