Benchmarks
This page contains examples of knowledge/ASR benchmarks and custom tasks.
Example 1. Simple Knowledge benchmark
A knowledge benchmark about mathematics questions in Arabic.
- Name: Benchmark Math
- Tasks:
  - ammlu_high_school_mathematics
  - ammlu_college_mathematics
Model
- Source: HF
- Model: tiiuae/falcon-7b
- Accelerator: L4
- Storage: 64
- Memory: 32
Example 2. HF Leaderboard benchmark
The knowledge benchmark used on the Hugging Face leaderboard.
- Name: HF Leaderboard
- Tasks:
  - truthfulqa
  - hellaswag, num fewshot: 10
  - arc_challenge, num fewshot: 25
  - winogrande, num fewshot: 5
  - gsm8k, num fewshot: 5
  - mmlu, num fewshot: 5
Model
- Source: HF
- Model: mistralai/Mistral-7B-Instruct-v0.2
- Secrets Blueprint: HF Model Read
  - Token with access to mistralai/Mistral-7B-Instruct-v0.2
- Accelerator: A10G
- Storage: 120
- Memory: 64
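The task list in this example mixes tasks with and without an explicit few-shot count. As a minimal sketch of how such a list could be represented in code, the snippet below uses a hypothetical `TaskSpec` class and `describe` helper. Neither is part of any documented API, and how the platform stores tasks internally is not stated on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical container for one benchmark task (illustration only)."""
    name: str
    # None means the example above gives no few-shot count for this task.
    num_fewshot: Optional[int] = None


# Task names and few-shot counts copied from Example 2.
HF_LEADERBOARD_TASKS = [
    TaskSpec("truthfulqa"),
    TaskSpec("hellaswag", 10),
    TaskSpec("arc_challenge", 25),
    TaskSpec("winogrande", 5),
    TaskSpec("gsm8k", 5),
    TaskSpec("mmlu", 5),
]


def describe(task: TaskSpec) -> str:
    """Render a task the way it is written in the list above."""
    if task.num_fewshot is None:
        return task.name
    return f"{task.name}, num fewshot: {task.num_fewshot}"


if __name__ == "__main__":
    for task in HF_LEADERBOARD_TASKS:
        print(describe(task))
```

Keeping the few-shot count optional makes the difference between "no count given" and "zero shots" explicit.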
Example 3. Custom task
A custom task using the BoolQ dataset.
- Name: BoolQ
- Task Output Type: multiple_choice
- Evaluation Dataframe: BoolQ
  - Prompt column: {passage}\n{question}
  - Answer column: {answer}
  - Use fixed choices: true
  - Possible Choices:
    - true
    - false
- Metrics: exact_match
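To make the prompt and answer templates concrete, here is a minimal sketch of how they might be filled in from one row of the dataframe and scored. It assumes the templates behave like Python format strings and that exact_match compares strings case-insensitively after trimming whitespace. Both are assumptions, and `render`, `exact_match`, and the sample row are hypothetical illustrations rather than the platform's implementation.

```python
# Templates copied from the example above; "\n" is taken to mean a newline.
PROMPT_TEMPLATE = "{passage}\n{question}"
ANSWER_TEMPLATE = "{answer}"
POSSIBLE_CHOICES = ["true", "false"]


def render(template: str, row: dict) -> str:
    """Fill a template with the column values of one row (assumed behaviour)."""
    return template.format(**row)


def exact_match(prediction: str, reference: str) -> bool:
    """Assumed metric: equal after trimming whitespace and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


if __name__ == "__main__":
    # Made-up row using the BoolQ column names (passage, question, answer).
    row = {
        "passage": "Water boils at 100 degrees Celsius at sea level.",
        "question": "does water boil at 100 degrees celsius at sea level",
        "answer": True,
    }
    prompt = render(PROMPT_TEMPLATE, row)
    reference = render(ANSWER_TEMPLATE, row)  # the boolean becomes "True"
    prediction = POSSIBLE_CHOICES[0]  # stand-in for the choice a model selects
    print(prompt)
    print(exact_match(prediction, reference))
```

With fixed choices, the model only has to pick one of the listed strings, which is why a strict string comparison is a reasonable metric for this task.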
Example 4. AR/EN ASR benchmark
An automatic speech recognition (ASR) benchmark covering Arabic and English.
- Name: AR/EN benchmark
- Dataset 1: fleurs - en
  - Source: HF
  - Model: google/fleurs
  - Subset: en_us
  - Split: test
  - Column containing the audio: audio
  - Column containing the transcription: transcription
  - Use normalizer: true
  - Language: en
- Dataset 2: fleurs - ar
  - Source: HF
  - Model: google/fleurs
  - Subset: ar_eg
  - Split: test
  - Column containing the audio: audio
  - Column containing the transcription: transcription
  - Use normalizer: true
  - Language: ar
Model
- Source: HF
- Model: openai/whisper-small
- Accelerator: L4
- Storage: 32
- Memory: 16
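The "Use normalizer" option suggests that transcriptions are cleaned up before scoring. This page does not say which normalizer or which metric the platform uses, so the sketch below is only an illustration: it assumes a simplified normalizer (lowercasing, punctuation removal, whitespace collapsing) and word error rate (WER), a common ASR metric. The `normalize` and `word_error_rate` functions are hypothetical helpers, not the platform's code.

```python
import re


def normalize(text: str) -> str:
    """Simplified normalizer (assumption): lowercase, strip punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # \w is Unicode-aware, so Arabic letters are kept
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    reference = "Hello, world! How are you?"
    hypothesis = "hello world how are you"
    print(word_error_rate(reference, hypothesis))  # scored on the raw strings
    print(word_error_rate(normalize(reference), normalize(hypothesis)))  # scored after normalizing
```

Without normalization, differences in casing and punctuation count as word errors even when the recognized words are correct, which is the usual motivation for enabling a normalizer.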