Benchmark Examples
This page gives example configurations for knowledge benchmarks, ASR benchmarks, and custom tasks.
1. Simple Knowledge Benchmark
1.1 Setup
- Name:
Benchmark Math
- Tasks:
ammlu_high_school_mathematics
ammlu_college_mathematics
1.2 Model
- Source:
HF
- Model:
tiiuae/falcon-7b
- Accelerator:
L4
- Storage:
64
- Memory:
32
2. Hugging Face Leaderboard Benchmark
2.1 Setup
- Name:
HF Leaderboard
- Tasks:
truthfulqa
hellaswag, 10-shot
arc_challenge, 25-shot
winogrande, 5-shot
gsm8k, 5-shot
mmlu, 5-shot
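As a rough illustration, the task list above can be kept as a mapping from task name to few-shot count. This is only a sketch: `LEADERBOARD_TASKS` and `describe` are hypothetical names, not part of the platform, and `None` marks the one task (truthfulqa) for which this page gives no count.

```python
from typing import Optional

# Task name -> few-shot count as listed above.
# None means this page does not state a count for that task.
LEADERBOARD_TASKS: dict[str, Optional[int]] = {
    "truthfulqa": None,
    "hellaswag": 10,
    "arc_challenge": 25,
    "winogrande": 5,
    "gsm8k": 5,
    "mmlu": 5,
}


def describe(task: str) -> str:
    """Render one task as '<name>, <n>-shot', or just '<name>' when no count is given."""
    shots = LEADERBOARD_TASKS[task]
    return task if shots is None else f"{task}, {shots}-shot"


for name in LEADERBOARD_TASKS:
    print(describe(name))
```

Keeping the counts next to the names makes it harder to enter a task with the wrong few-shot setting when recreating the benchmark.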
2.2 Model
- Source:
HF
- Model:
mistralai/Mistral-7B-Instruct-v0.2
- Secrets Blueprint:
HF Model Read (access token required)
- Accelerator:
A10G
- Storage:
120
- Memory:
64
3. Custom Task: BoolQ
3.1 Setup
- Name:
BoolQ
- Task Output Type:
multiple_choice
- Evaluation Dataframe: BoolQ
- Prompt Column:
{passage}\n{question}
- Answer Column:
{answer}
- Fixed Choices:
true
- Possible Choices:
true
false
- Metrics:
exact_match
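The following sketch shows one way to read the settings above. It assumes the `{column}` placeholders in Prompt Column and Answer Column are filled with values from each dataframe row, and that `exact_match` is a strict string comparison. The row contents and the `render` and `exact_match` helpers are hypothetical and made up for illustration; the platform's own implementation may differ.

```python
# Hypothetical row from the BoolQ evaluation dataframe (values invented for illustration).
row = {
    "passage": "The Nile flows north through Egypt into the Mediterranean Sea.",
    "question": "does the nile flow into the mediterranean sea",
    "answer": "true",
}

PROMPT_TEMPLATE = "{passage}\n{question}"  # Prompt Column
ANSWER_TEMPLATE = "{answer}"               # Answer Column
POSSIBLE_CHOICES = ["true", "false"]       # Possible Choices


def render(template: str, row: dict) -> str:
    """Fill {column} placeholders with the values of one row."""
    return template.format(**row)


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings are identical after trimming whitespace, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


prompt = render(PROMPT_TEMPLATE, row)
reference = render(ANSWER_TEMPLATE, row)
assert reference in POSSIBLE_CHOICES  # the gold answer must be one of the fixed choices

prediction = "true"  # stand-in for the choice a model would select
score = exact_match(prediction, reference)
print(prompt)
print(score)
```

With a fixed set of choices, the model only has to pick between `true` and `false`, so a strict string comparison is enough to score each row.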
4. AR/EN ASR Benchmark
4.1 Setup
- Name:
AR/EN benchmark
4.1.1 Dataset 1: English (Fleurs)
- Source:
HF
- Model:
google/fleurs
- Subset:
en_us
- Split:
test
- Audio Column:
audio
- Transcription Column:
transcription
- Normalizer:
true
- Language:
en
4.1.2 Dataset 2: Arabic (Fleurs)
- Source:
HF
- Model:
google/fleurs
- Subset:
ar_eg
- Split:
test
- Audio Column:
audio
- Transcription Column:
transcription
- Normalizer:
true
- Language:
ar
4.2 Model
- Source:
HF
- Model:
openai/whisper-small
- Accelerator:
L4
- Storage:
32
- Memory:
16
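To make the Normalizer setting concrete, here is a minimal sketch of why transcriptions are usually normalized before scoring. It rests on two assumptions that this page does not state: that the benchmark reports word error rate (WER), and that normalization means lowercasing and removing punctuation. The `normalize` and `word_error_rate` functions are hypothetical helpers, not the platform's implementation.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, drop Unicode punctuation, and collapse runs of whitespace."""
    kept = "".join(
        ch for ch in text.lower()
        if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(kept.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "Hello, world!"
hypothesis = "hello world"
raw_wer = word_error_rate(reference, hypothesis)
normalized_wer = word_error_rate(normalize(reference), normalize(hypothesis))
print(raw_wer, normalized_wer)
```

Without normalization, differences in casing and punctuation count as word errors even when the recognized words are correct.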