
Benchmark Examples

This page showcases example configurations for knowledge benchmarks, ASR benchmarks, and custom tasks, each followed by a short illustrative sketch of an equivalent run.


1. Simple Knowledge Benchmark

1.1 Setup

  • Name: Benchmark Math
  • Tasks:
    • ammlu_high_school_mathematics
    • ammlu_college_mathematics

1.2 Model

  • Source: HF
  • Model: tiiuae/falcon-7b
  • Accelerator: L4
  • Storage: 64
  • Memory: 32
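
An equivalent run outside the UI could look roughly like the sketch below. It assumes the open-source lm-evaluation-harness (which defines the ammlu_* task names) and its Python API; the platform's actual runner may differ.

```python
# Sketch only: assumes lm-evaluation-harness (`pip install lm-eval`) and
# hardware comparable to the L4 profile above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                # Hugging Face backend
    model_args="pretrained=tiiuae/falcon-7b",  # the configured model
    tasks=[
        "ammlu_high_school_mathematics",
        "ammlu_college_mathematics",
    ],
    batch_size=8,
)
print(results["results"])  # per-task scores
```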

2. Hugging Face Leaderboard Benchmark

2.1 Setup

  • Name: HF Leaderboard
  • Tasks:
    • truthfulqa
    • hellaswag, 10-shot
    • arc_challenge, 25-shot
    • winogrande, 5-shot
    • gsm8k, 5-shot
    • mmlu, 5-shot

2.2 Model

  • Source: HF
  • Model: mistralai/Mistral-7B-Instruct-v0.2
  • Secrets Blueprint: HF Model Read (access token required)
  • Accelerator: A10G
  • Storage: 120
  • Memory: 64
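
Because the leaderboard tasks use different few-shot counts, a sketch of the same evaluation loops over (task, shots) pairs. The lm-evaluation-harness API and task names below are assumptions, not the platform's internals; the HF read token referenced by the Secrets Blueprint can be supplied through the standard HF_TOKEN environment variable.

```python
import os
import lm_eval

# Assumption: the "HF Model Read" secret maps to a Hugging Face read token.
os.environ.setdefault("HF_TOKEN", "<hf-read-token>")

# (task, num_fewshot) pairs mirroring the configuration above.
LEADERBOARD = [
    ("truthfulqa", None),   # no shot count given; use the task default
    ("hellaswag", 10),
    ("arc_challenge", 25),
    ("winogrande", 5),
    ("gsm8k", 5),
    ("mmlu", 5),
]

for task, shots in LEADERBOARD:
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=mistralai/Mistral-7B-Instruct-v0.2",
        tasks=[task],
        num_fewshot=shots,
        batch_size=4,
    )
    print(task, out["results"][task])
```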

3. Custom Task: BoolQ

3.1 Setup

  • Name: BoolQ
  • Task Output Type: multiple_choice
  • Evaluation Dataframe: BoolQ
  • Prompt Column: {passage}\n{question}
  • Answer Column: {answer}
  • Fixed Choices: true
  • Possible Choices:
    • true
    • false
  • Metrics:
    • exact_match

4. AR/EN ASR Benchmark

4.1 Setup

  • Name: AR/EN benchmark

4.1.1 Dataset 1: English (Fleurs)

  • Source: HF
  • Dataset: google/fleurs
  • Subset: en_us
  • Split: test
  • Audio Column: audio
  • Transcription Column: transcription
  • Normalizer: true
  • Language: en

4.1.2 Dataset 2: Arabic (Fleurs)

  • Source: HF
  • Dataset: google/fleurs
  • Subset: ar_eg
  • Split: test
  • Audio Column: audio
  • Transcription Column: transcription
  • Normalizer: true
  • Language: ar
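
The two dataset configurations above map directly onto Hugging Face datasets calls; a minimal sketch, assuming the google/fleurs dataset on the Hub with its audio and transcription columns:

```python
# Sketch only: loading the two Fleurs test splits configured above.
from datasets import load_dataset

fleurs_en = load_dataset("google/fleurs", "en_us", split="test")
fleurs_ar = load_dataset("google/fleurs", "ar_eg", split="test")

print(fleurs_en[0]["transcription"])           # reference text (Transcription Column)
print(fleurs_ar[0]["audio"]["sampling_rate"])  # 16 kHz waveform (Audio Column)
```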

4.2 Model

  • Source: HF
  • Model: openai/whisper-small
  • Accelerator: L4
  • Storage: 32
  • Memory: 16
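
An end-to-end sketch of this benchmark, assuming the transformers ASR pipeline for openai/whisper-small, Whisper's BasicTextNormalizer as a stand-in for the Normalizer option, and word error rate (WER) from the evaluate library as the metric (the configuration above does not name one):

```python
# Sketch only: WER over the Fleurs test splits with whisper-small.
import evaluate
from datasets import load_dataset
from transformers import pipeline
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
wer_metric = evaluate.load("wer")
normalize = BasicTextNormalizer()  # stand-in for the "Normalizer: true" option

def run_benchmark(subset: str, language: str) -> float:
    ds = load_dataset("google/fleurs", subset, split="test")
    preds, refs = [], []
    for ex in ds:
        audio = {"array": ex["audio"]["array"],
                 "sampling_rate": ex["audio"]["sampling_rate"]}
        # Assumption: short language codes are accepted by Whisper decoding.
        out = asr(audio, generate_kwargs={"language": language, "task": "transcribe"})
        preds.append(normalize(out["text"]))
        refs.append(normalize(ex["transcription"]))
    return wer_metric.compute(predictions=preds, references=refs)

print("WER en_us:", run_benchmark("en_us", "en"))
print("WER ar_eg:", run_benchmark("ar_eg", "ar"))
```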