Custom Docker Image Creation

This section shows how to create a custom Docker image that serves various model endpoints and targets the linux/amd64 architecture. The service runs on port 8080, exposing endpoints for:

  • Classical ML models (e.g., linear regression, random forest).
  • LLMs for text generation (e.g., GPT-based).
  • Custom endpoints for specialized functions.

1. Service Overview

The Custom Image Creation Service listens on port 8080 and includes multiple endpoints to cover different model scenarios.


2. Endpoints

2.1 Health Check

  • Path: /health-check
  • Method: GET
  • Description: Confirms the model(s) are loaded and ready. Returns a JSON health status.
  • Response: JSON indicating the health status of each model, with status code 200 if the model is healthy and 504 if it is not.
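
For illustration, a minimal health-check call with Python's requests library might look like the sketch below; the base URL is an assumed local test address, and the response body mirrors the FastAPI example in section 4.

import requests

# Assumed local test URL of the running container; adjust to your deployment.
base_url = "http://localhost:8080"

response = requests.get(f"{base_url}/health-check")
print(response.status_code)  # 200 if healthy, 504 if not
print(response.json())       # e.g. {"status": "healthy", "message": "Model is ready and healthy"}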

2.2 Inference Endpoint

All endpoints exposed within the Docker image are accessible as inference endpoints from the inference gateway.

Note: The custom endpoint does not have a dedicated UI tab.

2.3 Classical ML Endpoint

  • Path: /v1/models/model:predict
  • Method: POST
  • Description: Accepts JSON input, returns ML model predictions.
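
As an illustration, a request to this endpoint might look like the sketch below; the base URL is an assumed local test address, and the {"input": ...} payload shape matches the FastAPI example in section 4.

import requests

base_url = "http://localhost:8080"  # assumed local test URL

# One row of feature values per inner list; the expected width depends on your model.
payload = {"input": [[0.03, 0.05, 0.06, 0.02, -0.04, -0.03, -0.04, -0.002, 0.02, -0.02]]}

response = requests.post(f"{base_url}/v1/models/model:predict", json=payload)
print(response.json())  # e.g. {"prediction": [...]}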

2.4 Large Language Model (LLM) Endpoint

  • Path: /v1/chat/completions
  • Method: POST
  • Description: Generates text using an LLM based on a prompt in JSON.
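
A sketch of a chat-completions request is shown below. The payload follows the common OpenAI-style schema; the exact fields accepted (and the model name used here) depend on how the LLM is served inside your image.

import requests

base_url = "http://localhost:8080"  # assumed local test URL

payload = {
    "model": "my-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Write a haiku about Docker."}],
    "max_tokens": 128,
}

response = requests.post(f"{base_url}/v1/chat/completions", json=payload)
print(response.json())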

2.5 Custom Prediction Endpoint

  • Path: <custom_endpoint>
  • Method: POST
  • Description: Supports custom predictions beyond classical ML or text generation.
  • Usage: Provide a JSON body that the endpoint can parse.

3. External Endpoint Structure

To call your model from outside the platform, use one of the following patterns:

/ml_client/<model_version_id>/<endpoint>

or for custom logic:

/ml_client/<model_version_id>/custom/<custom_endpoint>

The <model_version_id> ensures requests go to the correct model iteration.
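
For illustration, the external URLs can be assembled as in the sketch below; the gateway host is an assumption, so substitute the one for your environment and keep your own values for <model_version_id> and <custom_endpoint>.

# Hypothetical gateway host; replace with the host of your deployment.
gateway = "https://inference.example.com"
model_version_id = "<model_version_id>"

# Standard model endpoints (classical ML or LLM)
predict_url = f"{gateway}/ml_client/{model_version_id}/v1/models/model:predict"
chat_url = f"{gateway}/ml_client/{model_version_id}/v1/chat/completions"

# Custom logic endpoint
custom_url = f"{gateway}/ml_client/{model_version_id}/custom/<custom_endpoint>"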

3.1 LLM (Text Generation)

  • Path: /ml_client/<model_version_id>/v1/chat/completions
  • Usage: Send JSON with the user prompt, receive generated text.

3.2 Classical ML

  • Path: /ml_client/<model_version_id>/v1/models/model:predict
  • Usage: Submit feature data in JSON, obtain predicted values.

3.3 Custom Endpoint

  • Path: /ml_client/<model_version_id>/custom/<custom_endpoint>
  • Usage: Invoke specialized or experimental functionality with a JSON payload.

Note: Custom endpoints do not have a dedicated UI tab.

Example: Calling a Custom Endpoint in Python

import requests

# Base URL of the deployed model version; replace <model_version_id> with your own.
base_url = "https://inference.develop.openinnovation.ai/models/<model_version_id>/proxy"
api_key = "api_key"  # your platform API key
inference_endpoint = f"{base_url}/custom-endpoint"

# JSON body that your custom endpoint expects
payload = {}

headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
response = requests.post(inference_endpoint, json=payload, headers=headers)
print(response.status_code, response.json())

4. Example Implementation

Below is a minimal FastAPI app serving a classical linear regression model on port 8080.

4.1 FastAPI Application (app.py)

from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
import numpy as np

app = FastAPI()

# Load dataset & train model
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

class InputData(BaseModel):
    input: list

@app.post("/v1/models/model:predict")
async def predict(data: InputData):
    input_data = np.array(data.input).reshape(-1, X_train.shape[1])
    prediction = model.predict(input_data)
    return {"prediction": prediction.tolist()}

@app.get("/health-check")
async def health_check():
    return {"status": "healthy", "message": "Model is ready and healthy"}

4.2 Dockerfile

Note: Docker images on the platform cannot run as the root user, so the app must run as a non-root user (runner) with UID 10000.

# Python 3.9 slim base image
FROM python:3.9-slim

# Create a non-root user with UID 10000
RUN useradd --uid 10000 runner
RUN mkdir /app && chown runner /app

# Working directory
WORKDIR /app

# Copy files
COPY . /app

# Install dependencies (still as root, so packages land in the system site-packages)
RUN pip install --no-cache-dir -r requirements.txt

# Switch to the non-root user for runtime
USER runner

# Expose port 8080
EXPOSE 8080

# Start server using Uvicorn
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

4.3 Requirements

Include all necessary libraries in requirements.txt:

fastapi
uvicorn
scikit-learn
numpy

Next Steps