Tracking API Client
The Tracking API helps data scientists and engineers log experiments, package code, and share models. It’s built on top of MLflow, offering extra features for easy reproducibility and collaboration.
1. Installation
Install the client via pip:
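The package name below is a placeholder; replace it with the actual distribution name for your installation:

pip install <TRACKING_CLIENT_PACKAGE>  # placeholder package name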
2. Library Import
Add the following import to your Python code:
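The module path below is a placeholder; use the import path documented for your distribution:

from <TRACKING_CLIENT_MODULE> import TrackingClient  # placeholder module name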
3. Initialization
Connect to the tracking server with:
api_host = "<API_HOST>"
api_key = "<API_KEY>"
workspace_name = "<TARGET_WORKSPACE_NAME>"
TrackingClient.connect(api_host, api_key, workspace_name)
- api_host (str, required) – Hostname of the API server.
- api_key (str, required) – Your API key.
- workspace_name (str, required) – Name of the target workspace.
4. Experiment Setting
After connecting, specify the experiment:
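experiment_name = "<EXPERIMENT_NAME>"
TrackingClient.set_experiment(experiment_name)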
- experiment_name (str, required) – Name of the experiment; created if it doesn’t exist.
For more info, see Experiment UI.
5. MLflow Compatibility
The TrackingClient extends MLflow’s functionality. All MLflow methods are accessible via TrackingClient.*; for details, refer to the Official MLflow Documentation.
Example: TrackingClient.autolog() is equivalent to mlflow.autolog().
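Because the client mirrors MLflow’s API, standard logging calls should carry over unchanged. A minimal sketch, assuming the usual mlflow.log_param and mlflow.log_metric signatures apply:

with TrackingClient.start_run():
    TrackingClient.log_param("learning_rate", 0.01)  # same behavior as mlflow.log_param
    TrackingClient.log_metric("rmse", 0.42)  # same behavior as mlflow.log_metric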
6. Tracking Experiments
Enable Auto-tracking to log parameters, metrics, and models automatically:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

api_host = "<API_HOST>"
api_key = "<API_KEY>"
workspace_name = "<TARGET_WORKSPACE_NAME>"
experiment_name = "<EXPERIMENT_NAME>"

# Connect to the tracking server and select the experiment
TrackingClient.connect(api_host, api_key, workspace_name)
TrackingClient.set_experiment(experiment_name)

with TrackingClient.start_run():
    TrackingClient.set_run_name("YOUR_RUN_NAME")
    TrackingClient.autolog()  # automatically logs parameters, metrics, and the model

    # Train a model; autolog captures the fit
    db = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
    rf = RandomForestRegressor(n_estimators=10, max_depth=6, max_features=3)
    rf.fit(X_train, y_train)
    predictions = rf.predict(X_test)
7. Manual Model Logging
If you prefer finer control, you can use manual logging. Automated logging offers convenience, but manual logging provides a level of control that can be necessary in specific scenarios.
Manual logging is based on mlflow.<name_of_the_library>.log_model, so TrackingClient.<name_of_the_library>.log_model behaves in the same manner.
The arguments of log_model are:
- 1st Argument: The model.
- 2nd Argument: The artifact path, which should be set to "model".
- Signature: A named argument that represents the model's input and output schemas.
To infer the model's signature using samples of its input and output data, use:
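signature = TrackingClient.infer_signature(x_train, y_train)  # sample input, sample output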
Supported Libraries
Below is a list of the log_model methods available for various libraries:
TrackingClient.catboost.log_model
TrackingClient.diviner.log_model
TrackingClient.fastai.log_model
TrackingClient.gluon.log_model
TrackingClient.h2o.log_model
TrackingClient.johnsnowlabs.log_model
TrackingClient.langchain.log_model
TrackingClient.lightgbm.log_model
TrackingClient.mleap.log_model
TrackingClient.onnx.log_model
TrackingClient.openai.log_model
TrackingClient.paddle.log_model
TrackingClient.pmdarima.log_model
TrackingClient.prophet.log_model
TrackingClient.pyfunc.log_model
TrackingClient.pytorch.log_model
TrackingClient.sentence_transformers.log_model
TrackingClient.sklearn.log_model
TrackingClient.spacy.log_model
TrackingClient.spark.log_model
TrackingClient.statsmodels.log_model
TrackingClient.tensorflow.log_model
TrackingClient.transformers.log_model
TrackingClient.xgboost.log_model
For manually logging a scikit-learn model:
signature = TrackingClient.infer_signature(x_train, y_train)
TrackingClient.sklearn.log_model(model, "model", signature=signature)
8. Model Artifacts
Each run generates artifacts, metrics, parameters, and tags:
- Model Files – Serialized models (model.pkl, .h5, etc.) and metadata (MLmodel).
- Metrics & Params – Logged performance data and hyperparameters.
- Tags – Extra metadata for categorization.
Example MLmodel file:
artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.9.7
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.2.2
mlflow_version: 2.2.1
model_uuid: cd3b0852791d48a2a67ce739b0b07070
run_id: fb1aeba6036345cca027d44d747d1aeb
signature:
  inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1, 10]}}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": [-1]}}]'
utc_time_created: '2023-07-20 09:16:16.898078'
9. Logging Artifacts
Beyond the standard MLflow features, the TrackingClient also supports logging additional artifact types for deeper analysis.
9.1 Image Tracking
from PIL import Image
image_data = Image.open("test_image.jpg")
extra = {"label": "car"}
TrackingClient.log_image_at_step(image_data, 'image_file.jpg', 1, extra=extra)
- image_data – A numpy.ndarray or PIL.Image.Image.
- extra – An optional dict for metadata.
9.2 Audio Tracking
import numpy as np
audio_data = np.random.random(1000)  # example: random mono signal
TrackingClient.log_audio_at_step(audio_data, 'audio_file.wav', 1, rate=44100)
- audio_data – A 1D (mono) or 2D (stereo) numpy array.
- rate – Sample rate of the audio in Hz.
9.3 Text Tracking
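A minimal sketch, assuming text logging follows the same log_*_at_step pattern as the image and figure methods (the method name and argument order below are assumptions, not a confirmed signature):

text_data = "Sample model output summary."
TrackingClient.log_text_at_step(text_data, 'text_file.txt', 1)  # assumed method, mirroring log_image_at_step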
9.4 Figure Tracking
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])
TrackingClient.log_figure_at_step(fig, 'figure_file.jpg', 1)
- fig – A matplotlib.figure.Figure or plotly.graph_objects.Figure.
9.5 JSON Tracking
dict_data = {"key1": "value1", "key2": "value2"}
TrackingClient.log_dict_at_step(dict_data, 'dict_file.json', 1)
- Saves a dictionary or list as JSON.
Extra Parameters
All log_*_at_step methods can include:
- file_name (str) – Name of the artifact file.
- step (int) – Numeric step index.
- extra (dict) – Additional metadata (keys as strings, values as int/float/str/bool/list/None).
extra = {"description": "This is a description."}
TrackingClient.log_audio_at_step(audio_data, 'audio_file.wav', 1, rate=44100, extra=extra)
Next Steps
- Run UI – See how your logged experiments appear in the platform’s UI.
- Experiment UI – Explore how experiments are organized and compared.
- Tracking Overview – Learn the fundamentals of OICM’s tracking server.
- Workspace Overview – Discover how workspaces facilitate isolation and security for ML workflows.