Usage Monitoring

Cluster admins can use Usage Monitoring to oversee usage of GPU infrastructure dedicated to their tenants. By reviewing utilization metrics, admins can efficiently manage and troubleshoot resource usage.


1. Overview

The Usage Monitoring Dashboard displays key GPU node stats and real-time utilization data. These insights help you balance workloads, prevent bottlenecks, and ensure effective GPU usage.


2. Utilization Metrics

Track real-time GPU resource usage:

  • Tenant GPU Usage Distribution – GPU usage distribution across all selected tenants.
  • Tenant GPU Hours Interval – Accumulated GPU hours across selected tenants.
  • Tenants – List of all selected tenants with their usage details. Clicking a tenant opens more usage details for that tenant.
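
To make the relationship between accumulated GPU hours and the usage distribution concrete, here is a minimal Python sketch. It assumes usage is available as simple (tenant, GPU count, hours) records. That record shape, the tenant names, and the function names are hypothetical and chosen only for illustration; they do not describe the dashboard's actual data model or any export format.

```python
from collections import defaultdict

# Hypothetical usage records: (tenant, gpu_count, hours_in_use).
# These fields are assumptions for illustration only.
records = [
    ("tenant-a", 4, 2.0),
    ("tenant-b", 2, 3.0),
    ("tenant-a", 1, 4.0),
]


def gpu_hours_by_tenant(records):
    """Sum GPU hours (GPUs multiplied by hours in use) per tenant."""
    totals = defaultdict(float)
    for tenant, gpu_count, hours in records:
        totals[tenant] += gpu_count * hours
    return dict(totals)


def usage_distribution(totals):
    """Return each tenant's share of total GPU hours as a fraction of 1."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when no usage was recorded.
        return {tenant: 0.0 for tenant in totals}
    return {tenant: hours / grand_total for tenant, hours in totals.items()}


totals = gpu_hours_by_tenant(records)
shares = usage_distribution(totals)
```

In this sketch, tenant-a accumulates 12 GPU hours and tenant-b accumulates 6, so tenant-a accounts for two thirds of the total.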

3. Best Practices

  • Routine Checks
    Regularly monitor your GPU infrastructure to catch potential performance issues early.
  • Optimal Allocation
    Use metrics to determine if you need to scale resources or rebalance workloads for maximum efficiency.
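
One way to turn the allocation guidance above into a routine check is to compare each tenant's average utilization against thresholds. The sketch below is illustrative only: the function name, the 30% and 85% thresholds, and the input format are assumptions, not values or features provided by Usage Monitoring. Choose thresholds that fit your own workloads.

```python
def classify_utilization(utilization, low=0.30, high=0.85):
    """Label each tenant by average GPU utilization (0.0 to 1.0).

    The thresholds are hypothetical defaults, not product settings.
    """
    labels = {}
    for tenant, value in utilization.items():
        if value >= high:
            labels[tenant] = "consider scaling up"
        elif value <= low:
            labels[tenant] = "consider rebalancing"
        else:
            labels[tenant] = "ok"
    return labels


# Hypothetical averages read off the dashboard for three tenants.
labels = classify_utilization(
    {"tenant-a": 0.92, "tenant-b": 0.12, "tenant-c": 0.55}
)
```

Here tenant-a is flagged for possible scaling, tenant-b for possible rebalancing, and tenant-c needs no action.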

Next Steps