Benchmark Examples
This page showcases example configurations for knowledge benchmarks, ASR benchmarks, and custom tasks.
1. Simple Knowledge Benchmark
1.1 Setup
- Name: Benchmark Math
- Tasks:
  - ammlu_high_school_mathematics
  - ammlu_college_mathematics
1.2 Model
- Source: HF
- Model: tiiuae/falcon-7b
- Accelerator: L4
- Storage: 64
- Memory: 32
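
If you want to sanity-check this configuration outside the platform, a roughly equivalent run can be scripted with EleutherAI's lm-evaluation-harness. The sketch below is an approximation under that assumption: it uses the `lm_eval.simple_evaluate` API, assumes the AMMLU task names above are available in your installed harness version, and ignores the platform-only fields (Accelerator, Storage, Memory).

```python
# Sketch: reproduce the "Benchmark Math" run with lm-evaluation-harness.
# Assumes `pip install lm-eval` and a GPU roughly comparable to the L4 above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                # Hugging Face Transformers backend
    model_args="pretrained=tiiuae/falcon-7b",  # same model as the configuration above
    tasks=[
        "ammlu_high_school_mathematics",
        "ammlu_college_mathematics",
    ],
    batch_size="auto",
)

print(results["results"])  # per-task metrics for the two AMMLU subsets
```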
2. Hugging Face Leaderboard Benchmark
2.1 Setup
- Name: HF Leaderboard
- Tasks:
  - truthfulqa
  - hellaswag, 10-shot
  - arc_challenge, 25-shot
  - winogrande, 5-shot
  - gsm8k, 5-shot
  - mmlu, 5-shot
2.2 Model
- Source: HF
- Model: mistralai/Mistral-7B-Instruct-v0.2
- Secrets Blueprint: HF Model Read (access token required)
- Accelerator: A10G
- Storage: 120
- Memory: 64
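
The leaderboard settings above pair each task with its own few-shot count. One hedged way to mirror that locally with lm-evaluation-harness is to loop over (task, shots) pairs, since `num_fewshot` applies per call. The sketch assumes the task names exist in your harness version and that the Hugging Face token referenced by the Secrets Blueprint is already configured (e.g. via `huggingface-cli login`).

```python
# Sketch: approximate the leaderboard-style run, one task at a time, so each
# task keeps its own few-shot count. Assumes `pip install lm-eval`.
import lm_eval

TASKS = [
    ("truthfulqa", None),      # None = keep the task's default shot count
    ("hellaswag", 10),
    ("arc_challenge", 25),
    ("winogrande", 5),
    ("gsm8k", 5),
    ("mmlu", 5),
]

all_results = {}
for task, shots in TASKS:
    res = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=mistralai/Mistral-7B-Instruct-v0.2",
        tasks=[task],
        num_fewshot=shots,
        batch_size="auto",
    )
    all_results[task] = res["results"]  # per-(sub)task metric dicts

for task, metrics in all_results.items():
    print(task, metrics)
```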
3. Custom Task: BoolQ
3.1 Setup
- Name: BoolQ
- Task Output Type: multiple_choice
- Evaluation Dataframe: BoolQ
- Prompt Column: {passage}\n{question}
- Answer Column: {answer}
- Fixed Choices: true
- Possible Choices: true, false
- Metrics: exact_match
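
To make the column mappings concrete, the sketch below shows how the prompt and answer templates resolve against the public BoolQ data and how exact match over the fixed choices would be computed. This is illustrative only: it assumes the `google/boolq` dataset on the Hugging Face Hub (columns `passage`, `question`, and a boolean `answer`), and `pick_choice` is a hypothetical stand-in for the model-scoring step the evaluation backend performs.

```python
# Illustrative sketch of the BoolQ custom task: prompt template, fixed choices,
# and exact-match scoring. Assumes `pip install datasets`.
from datasets import load_dataset

CHOICES = ["true", "false"]          # "Possible Choices" from the config

ds = load_dataset("google/boolq", split="validation")

def build_prompt(row):
    # "Prompt Column" template: {passage}\n{question}
    return f"{row['passage']}\n{row['question']}"

def gold_answer(row):
    # "Answer Column" template: {answer}; the dataset stores a boolean.
    return "true" if row["answer"] else "false"

def pick_choice(prompt):
    # Hypothetical stand-in: a real evaluator would score each fixed choice
    # with the model (e.g. by log-likelihood) and return the best one.
    return CHOICES[0]

correct = sum(pick_choice(build_prompt(r)) == gold_answer(r) for r in ds)
print(f"exact_match: {correct / len(ds):.3f}")
```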
4. AR/EN ASR Benchmark
4.1 Setup
- Name: AR/EN benchmark
4.1.1 Dataset 1: English (Fleurs)
- Source: HF
- Model: google/fleurs
- Subset: en_us
- Split: test
- Audio Column: audio
- Transcription Column: transcription
- Normalizer: true
- Language: en
4.1.2 Dataset 2: Arabic (Fleurs)
- Source: HF
- Model: google/fleurs
- Subset: ar_eg
- Split: test
- Audio Column: audio
- Transcription Column: transcription
- Normalizer: true
- Language: ar
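
Both entries point at subsets of the FLEURS corpus on the Hugging Face Hub. As a quick sketch of what those fields resolve to (assuming the `datasets` library; depending on its version, loading FLEURS may additionally require `trust_remote_code=True`):

```python
# Sketch: load the two FLEURS test splits referenced above and inspect the
# configured columns. Assumes `pip install datasets`.
from datasets import load_dataset

fleurs_en = load_dataset("google/fleurs", "en_us", split="test")
fleurs_ar = load_dataset("google/fleurs", "ar_eg", split="test")

row = fleurs_en[0]
print(row["transcription"])            # "Transcription Column"
print(row["audio"]["sampling_rate"])   # "Audio Column" carries waveform + rate
```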
4.2 Model
- Source: HF
- Model: openai/whisper-small
- Accelerator: L4
- Storage: 32
- Memory: 16
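
To tie the model and datasets together, here is a minimal end-to-end sketch, not the platform's implementation: it transcribes each FLEURS split with openai/whisper-small through the transformers ASR pipeline, applies `BasicTextNormalizer` as a stand-in for the Normalizer: true setting, and reports word error rate via the `evaluate` library. The choice of normalizer and WER scorer is an assumption; substitute whatever your pipeline actually uses.

```python
# Sketch: score openai/whisper-small on the two FLEURS splits with WER.
# Assumes `pip install transformers datasets evaluate jiwer`.
import evaluate
from datasets import load_dataset
from transformers import pipeline
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
wer = evaluate.load("wer")
normalize = BasicTextNormalizer()  # stand-in for the "Normalizer: true" option

def score_split(subset: str, language: str) -> float:
    ds = load_dataset("google/fleurs", subset, split="test")
    preds, refs = [], []
    for row in ds:
        out = asr(
            {"raw": row["audio"]["array"],
             "sampling_rate": row["audio"]["sampling_rate"]},
            generate_kwargs={"language": language, "task": "transcribe"},
        )
        preds.append(normalize(out["text"]))
        refs.append(normalize(row["transcription"]))
    return wer.compute(predictions=preds, references=refs)

print("en_us WER:", score_split("en_us", "en"))
print("ar_eg WER:", score_split("ar_eg", "ar"))
```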