Job Tracking
oip-tracking-client lets you track metrics and artifacts for jobs in the OICM platform. By default, the job environment provides key variables such as api_host, api_key, and workspace_name, which simplifies setup.
1. Using oip-tracking-client
1.1 Setup
The OICM platform integrates the oip-tracking-client with the Jobs module. The following environment variables are preconfigured:
- api_host – Points to the current environment’s API host.
- api_key – Handled internally; no need to supply credentials.
- workspace_name – Matches the workspace running the job.
You can override these if you prefer tracking in another environment or workspace.
Example
import os
from oip_tracking_client.tracking import TrackingClient
# Connect automatically using environment variables
TrackingClient.connect()
experiment_name = "Jobs Test 4"
# Creates a new experiment if one doesn't exist
TrackingClient.set_experiment(experiment_name)
# Proceed with your training code...
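The override mentioned in the setup notes could be organized along these lines. This is a minimal sketch, not the client's own API: resolve_tracking_config is a hypothetical helper, and it assumes the preconfigured values are exposed as environment variables named exactly api_host, api_key, and workspace_name. Whether TrackingClient.connect accepts these values as explicit arguments is also an assumption, so that call is left as a comment.

```python
import os
from typing import Mapping, Optional

# Assumption: the platform exposes the preconfigured values under these names.
TRACKING_KEYS = ("api_host", "api_key", "workspace_name")


def resolve_tracking_config(
    overrides: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: explicit overrides win over environment values."""
    environ = os.environ if environ is None else environ
    overrides = {} if overrides is None else overrides
    config = {}
    for key in TRACKING_KEYS:
        # Prefer a non-empty override, otherwise fall back to the environment.
        value = overrides.get(key) or environ.get(key)
        if value:
            config[key] = value
    return config


# Example: keep the platform-provided host and key, track in another workspace.
config = resolve_tracking_config(overrides={"workspace_name": "other-workspace"})
# Assumption: connect() may accept these values explicitly; check the client's
# documentation before relying on it.
# TrackingClient.connect(**config)
```

Keeping the resolution in one place makes it easy to see which values come from the job environment and which were changed on purpose.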
2. Tracking with Multiple GPUs
When you use torchrun on multi-GPU pods, one process is started per GPU. Unless restricted, each process attempts to create its own run in the tracking system.
Best Practice: Restrict the tracking client to the primary process only, i.e., the process where GLOBAL_RANK == 0.
import os
from oip_tracking_client.tracking import TrackingClient

if os.getenv("GLOBAL_RANK") == "0":
    # Only track in the primary process
    TrackingClient.connect()
    TrackingClient.set_experiment("My Experiment")
This prevents each GPU process from creating its own redundant tracking run.
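The rank check can also be wrapped in a small helper. The sketch below is illustrative only: is_primary_process and setup_tracking are hypothetical names, not part of oip-tracking-client. It reads GLOBAL_RANK as used in this guide and falls back to RANK, which is the global-rank variable that torchrun itself sets. Treating a process with neither variable as primary is a design choice for single-process runs and differs from the plain comparison shown earlier, which would skip tracking in that case.

```python
import os
from typing import Mapping, Optional


def is_primary_process(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True only for the global rank-0 process."""
    environ = os.environ if environ is None else environ
    # GLOBAL_RANK is the variable used in this guide; RANK is set by torchrun.
    rank = environ.get("GLOBAL_RANK", environ.get("RANK"))
    if rank is None:
        # Neither variable is set: assume a single-process run.
        return True
    return rank.strip() == "0"


def setup_tracking(experiment_name: str) -> bool:
    """Connect and select the experiment in the primary process only."""
    if not is_primary_process():
        return False
    # Imported lazily so non-primary processes never touch the client.
    from oip_tracking_client.tracking import TrackingClient

    TrackingClient.connect()
    TrackingClient.set_experiment(experiment_name)
    return True
```

Training code can call setup_tracking("My Experiment") once at startup and use the returned flag to decide whether to log metrics from that process.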
Next Steps
- Jobs Overview – Understand core concepts of job deployment.
- Jobs UI – Learn how to visualize and manage jobs in the interface.
- Tracking Overview – Dive deeper into how OICM tracks experiments, parameters, and artifacts.