Usage Monitoring

Cluster admins can use Usage Monitoring to oversee usage of GPU infrastructure dedicated to their tenants. By reviewing utilization metrics, admins can efficiently manage and troubleshoot resource usage.


1. Overview

The Usage Monitoring Dashboard displays key GPU node stats and real-time utilization data. These insights help you balance workloads, prevent bottlenecks, and ensure effective GPU usage.


2. Utilization Metrics

Track real-time GPU resource usage:

  • Tenant GPU Usage Distribution – GPU usage distribution across all selected tenants.
  • Tenant GPU hours Interval – Accumulated GPU hours across the selected tenants.
  • Tenants – A list of all selected tenants with their usage details. Clicking a tenant opens more usage details for that tenant.
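
To make the relationship between these metrics concrete, here is a minimal sketch of how accumulated GPU hours and a per-tenant usage distribution could be derived from raw allocation records. The record layout, the tenant names, and the helper functions `gpu_hours_by_tenant` and `usage_distribution` are hypothetical and chosen for illustration only; the dashboard's actual data model and calculation may differ.

```python
from collections import defaultdict

# Hypothetical usage records: (tenant, gpu_count, hours_allocated).
# This layout is an assumption for illustration, not the dashboard's data model.
records = [
    ("tenant-a", 4, 2.0),
    ("tenant-b", 2, 3.0),
    ("tenant-a", 1, 4.0),
]


def gpu_hours_by_tenant(records):
    """Sum GPU hours (number of GPUs x hours held) per tenant."""
    totals = defaultdict(float)
    for tenant, gpu_count, hours in records:
        totals[tenant] += gpu_count * hours
    return dict(totals)


def usage_distribution(totals):
    """Return each tenant's share of total GPU hours as a fraction in [0, 1]."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No usage recorded: every tenant has a zero share.
        return {tenant: 0.0 for tenant in totals}
    return {tenant: hours / grand_total for tenant, hours in totals.items()}


totals = gpu_hours_by_tenant(records)
shares = usage_distribution(totals)

for tenant in sorted(totals):
    print(f"{tenant}: {totals[tenant]:.1f} GPU hours ({shares[tenant]:.0%} of total)")
```

In this sketch, one GPU held for one hour counts as one GPU hour, so a tenant holding four GPUs for two hours accumulates eight GPU hours. The distribution is each tenant's accumulated total divided by the sum across all selected tenants.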

3. Best Practices

  • Routine Checks
    Regularly monitor your GPU infrastructure to catch potential performance issues early.
  • Optimal Allocation
    Use metrics to determine if you need to scale resources or rebalance workloads for maximum efficiency.

Next Steps