Job Tracking

The oip-tracking-client lets you track metrics and artifacts for jobs on the OICM platform. By default, the job environment provides the key connection variables (api_host, api_key, and workspace_name), so no manual setup is needed.


1. Using oip-tracking-client

1.1 Setup

The OICM platform integrates the oip-tracking-client with the Jobs module. The following environment variables are preconfigured:

  • api_host – Points to the current environment’s API host.
  • api_key – Handled internally; no need to supply credentials.
  • workspace_name – Matches the workspace running the job.

You can override these values if you want to track in a different environment or workspace.

Example

import os
from oip_tracking_client.tracking import TrackingClient

# Connect automatically using environment variables
TrackingClient.connect()

experiment_name = "Jobs Test 4"
# Creates a new experiment if one doesn't exist
TrackingClient.set_experiment(experiment_name)

# Proceed with your training code...
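
The override mentioned above could be prepared along the following lines. This is only a sketch under stated assumptions: resolve_tracking_settings is a hypothetical helper, not part of oip-tracking-client; the lookup keys are the variable names exactly as listed in this section, and the platform may expose them under different names or casing; and the commented-out connect call assumes that TrackingClient.connect accepts these values as arguments, which this page does not confirm.

```python
import os
from typing import Mapping, Optional

# Names as listed in the Setup section. Whether the platform exposes them
# under exactly these keys is an assumption.
SETTING_NAMES = ("api_host", "api_key", "workspace_name")


def resolve_tracking_settings(
    overrides: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return connection settings, preferring explicit overrides over env values."""
    env = os.environ if env is None else env
    overrides = overrides or {}
    settings = {}
    for name in SETTING_NAMES:
        # An explicit override wins; otherwise fall back to the environment.
        value = overrides.get(name) or env.get(name)
        if value:
            settings[name] = value
    return settings


# Hypothetical usage: keep the platform's host and key, switch workspace.
settings = resolve_tracking_settings(
    overrides={"workspace_name": "another-workspace"},
    env={
        "api_host": "https://example.invalid",
        "api_key": "placeholder-key",
        "workspace_name": "default-workspace",
    },
)

# Assumption: connect() accepts the three values as arguments.
# TrackingClient.connect(
#     settings["api_host"], settings["api_key"], settings["workspace_name"]
# )
```

Passing env explicitly, as in the usage above, keeps the helper easy to test outside a job pod; inside a job you would normally omit it and let the function read the real environment.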

2. Tracking with Multiple GPUs

When you launch a job with torchrun on a multi-GPU pod, torchrun starts one process per GPU. Without a guard, every one of these processes will attempt to create its own run in the tracking system.

Best Practice: Restrict the tracking client to the primary process only, i.e., the process where GLOBAL_RANK == 0.

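import os
from oip_tracking_client.tracking import TrackingClient

# GLOBAL_RANK is "0" only in the primary process
if os.getenv("GLOBAL_RANK") == "0":
    # Only track in the primary process
    TrackingClient.connect()
    TrackingClient.set_experiment("My Experiment")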

This prevents each GPU process from creating its own redundant tracking run.
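
If the same script also needs to run as a single process, or in an environment where GLOBAL_RANK is not set, the rank check can be wrapped in a small helper. The sketch below is illustrative only: is_primary_process is a hypothetical name, and both the fallback to RANK (the variable torchrun itself exports for the global rank) and the choice to treat a missing rank as "primary" are assumptions rather than documented OICM behavior.

```python
import os
from typing import Mapping, Optional


def is_primary_process(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when this process should own the tracking run."""
    env = os.environ if env is None else env
    # GLOBAL_RANK is the variable named in this guide; RANK is the one
    # torchrun exports. Checking both, in this order, is an assumption.
    for name in ("GLOBAL_RANK", "RANK"):
        rank = env.get(name)
        if rank is not None:
            return rank.strip() == "0"
    # No rank variable at all: treat the job as single-process.
    return True


if is_primary_process():
    # Only the primary process would connect and create a run here, e.g.:
    # TrackingClient.connect()
    # TrackingClient.set_experiment("My Experiment")
    pass
```

Note the difference from the plain comparison above: os.getenv("GLOBAL_RANK") == "0" is False when the variable is unset, so a single-process job would skip tracking entirely, whereas this helper falls back to tracking in that case.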


Next Steps

  • Jobs Overview – Understand core concepts of job deployment.
  • Jobs UI – Learn how to visualize and manage jobs in the interface.
  • Tracking Overview – Dive deeper into how OICM tracks experiments, parameters, and artifacts.