Benchmark Examples
This page showcases example configurations for knowledge benchmarks, ASR benchmarks, and custom tasks.
1. Simple Knowledge Benchmark
1.1 Setup
- Name: Benchmark Math
- Tasks:
  - ammlu_high_school_mathematics
  - ammlu_college_mathematics
1.2 Model
- Source: HF
- Model: tiiuae/falcon-7b
- Accelerator: L4
- Storage: 64
- Memory: 32
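
If you want to sanity-check this configuration outside the platform, a roughly equivalent run can be scripted with EleutherAI's lm-evaluation-harness. The sketch below is an approximation under that assumption: it uses the `lm_eval.simple_evaluate` API, assumes the AMMLU task names above are available in your installed harness version, and ignores the platform-only fields (Accelerator, Storage, Memory).

```python
# Sketch: reproduce the "Benchmark Math" run with lm-evaluation-harness.
# Assumes `pip install lm-eval` and a GPU roughly comparable to the L4 above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                # Hugging Face Transformers backend
    model_args="pretrained=tiiuae/falcon-7b",  # same model as the configuration above
    tasks=[
        "ammlu_high_school_mathematics",
        "ammlu_college_mathematics",
    ],
    batch_size="auto",
)

print(results["results"])  # per-task metrics for the two AMMLU subsets
```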
2. Hugging Face Leaderboard Benchmark
2.1 Setup
- Name: HF Leaderboard
- Tasks:
  - truthfulqa
  - hellaswag, 10-shot
  - arc_challenge, 25-shot
  - winogrande, 5-shot
  - gsm8k, 5-shot
  - mmlu, 5-shot
2.2 Model
- Source: HF
- Model: mistralai/Mistral-7B-Instruct-v0.2
- Secrets Blueprint: HF Model Read (access token required)
- Accelerator: A10G
- Storage: 120
- Memory: 64
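
The leaderboard settings above pair each task with its own few-shot count. One hedged way to mirror that locally with lm-evaluation-harness is to loop over (task, shots) pairs, since `num_fewshot` applies per call. The sketch assumes the task names exist in your harness version and that the Hugging Face token referenced by the Secrets Blueprint is already configured (e.g. via `huggingface-cli login`).

```python
# Sketch: approximate the leaderboard-style run, one task at a time, so each
# task keeps its own few-shot count. Assumes `pip install lm-eval`.
import lm_eval

TASKS = [
    ("truthfulqa", None),      # None = keep the task's default shot count
    ("hellaswag", 10),
    ("arc_challenge", 25),
    ("winogrande", 5),
    ("gsm8k", 5),
    ("mmlu", 5),
]

all_results = {}
for task, shots in TASKS:
    res = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=mistralai/Mistral-7B-Instruct-v0.2",
        tasks=[task],
        num_fewshot=shots,
        batch_size="auto",
    )
    all_results[task] = res["results"]  # per-(sub)task metric dicts

for task, metrics in all_results.items():
    print(task, metrics)
```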
3. Custom Task: BoolQ
3.1 Setup
- Name: BoolQ
- Task Output Type: multiple_choice
- Evaluation Dataframe: BoolQ
- Prompt Column: {passage}\n{question}
- Answer Column: {answer}
- Fixed Choices: true
- Possible Choices: true, false
- Metrics: exact_match
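
To make the column mappings concrete, the sketch below shows how the prompt and answer templates resolve against the public BoolQ data and how exact match over the fixed choices would be computed. This is illustrative only: it assumes the `google/boolq` dataset on the Hugging Face Hub (columns `passage`, `question`, and a boolean `answer`), and `pick_choice` is a hypothetical stand-in for the model-scoring step the evaluation backend performs.

```python
# Illustrative sketch of the BoolQ custom task: prompt template, fixed choices,
# and exact-match scoring. Assumes `pip install datasets`.
from datasets import load_dataset

CHOICES = ["true", "false"]          # "Possible Choices" from the config

ds = load_dataset("google/boolq", split="validation")

def build_prompt(row):
    # "Prompt Column" template: {passage}\n{question}
    return f"{row['passage']}\n{row['question']}"

def gold_answer(row):
    # "Answer Column" template: {answer}; the dataset stores a boolean.
    return "true" if row["answer"] else "false"

def pick_choice(prompt):
    # Hypothetical stand-in: a real evaluator would score each fixed choice
    # with the model (e.g. by log-likelihood) and return the best one.
    return CHOICES[0]

correct = sum(pick_choice(build_prompt(r)) == gold_answer(r) for r in ds)
print(f"exact_match: {correct / len(ds):.3f}")
```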
4. AR/EN ASR Benchmark
4.1 Setup
- Name: AR/EN benchmark
4.1.1 Dataset 1: English (Fleurs)
- Source: HF
- Model: google/fleurs
- Subset: en_us
- Split: test
- Audio Column: audio
- Transcription Column: transcription
- Normalizer: true
- Language: en
4.1.2 Dataset 2: Arabic (Fleurs)
- Source: HF
- Model: google/fleurs
- Subset: ar_eg
- Split: test
- Audio Column: audio
- Transcription Column: transcription
- Normalizer: true
- Language: ar
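
Both entries point at subsets of the FLEURS corpus on the Hugging Face Hub. As a quick sketch of what those fields resolve to (assuming the `datasets` library; depending on its version, loading FLEURS may additionally require `trust_remote_code=True`):

```python
# Sketch: load the two FLEURS test splits referenced above and inspect the
# configured columns. Assumes `pip install datasets`.
from datasets import load_dataset

fleurs_en = load_dataset("google/fleurs", "en_us", split="test")
fleurs_ar = load_dataset("google/fleurs", "ar_eg", split="test")

row = fleurs_en[0]
print(row["transcription"])            # "Transcription Column"
print(row["audio"]["sampling_rate"])   # "Audio Column" carries waveform + rate
```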
4.2 Model
- Source: HF
- Model: openai/whisper-small
- Accelerator: L4
- Storage: 32
- Memory: 16
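
To tie the model and datasets together, here is a minimal end-to-end sketch, not the platform's implementation: it transcribes each FLEURS split with openai/whisper-small through the transformers ASR pipeline, applies `BasicTextNormalizer` as a stand-in for the Normalizer: true setting, and reports word error rate via the `evaluate` library. The choice of normalizer and WER scorer is an assumption; substitute whatever your pipeline actually uses.

```python
# Sketch: score openai/whisper-small on the two FLEURS splits with WER.
# Assumes `pip install transformers datasets evaluate jiwer`.
import evaluate
from datasets import load_dataset
from transformers import pipeline
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
wer = evaluate.load("wer")
normalize = BasicTextNormalizer()  # stand-in for the "Normalizer: true" option

def score_split(subset: str, language: str) -> float:
    ds = load_dataset("google/fleurs", subset, split="test")
    preds, refs = [], []
    for row in ds:
        out = asr(
            {"raw": row["audio"]["array"],
             "sampling_rate": row["audio"]["sampling_rate"]},
            generate_kwargs={"language": language, "task": "transcribe"},
        )
        preds.append(normalize(out["text"]))
        refs.append(normalize(row["transcription"]))
    return wer.compute(predictions=preds, references=refs)

print("en_us WER:", score_split("en_us", "en"))
print("ar_eg WER:", score_split("ar_eg", "ar"))
```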