
Model Hub

The Model Hub provides a curated collection of pre-configured AI models that are onboarded to your OICM+ environment based on your specific use cases and hardware capacity. These models come with optimized configurations, allowing you to deploy production-ready AI models with minimal setup.

Quick Deployment from Model Hub

Browsing Available Models

Model template

The Model Hub displays all available pre-configured models in a card-based layout. Each model card shows:

  • Model name and description
  • Resource requirements (CPU, RAM, Storage)
  • Hardware specifications (GPU type and count)
  • Task type and serving framework
  • One-click Deploy button

One-Click Deployment

Model template

For immediate deployment with default settings:

  1. Click the Deploy button on any model card
  2. Review the deployment summary in the modal dialog:
     • Model description and capabilities
     • Serving framework (e.g., vLLM)
     • Task type (e.g., Text Generation)
     • Number of replicas (default: 1)
     • Resource requirements (CPU, RAM, Storage, Accelerator)
  3. Click Deploy to launch the model with pre-configured settings

Prerequisites: Ensure your workspace has sufficient GPU resources allocated. Deployments will queue until GPU resources become available.

Once you click the Deploy button, you are directed to the deployment overview page.

Deployment overview
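Once the deployment is running, you can send it a quick test request. Models served with vLLM expose an OpenAI-compatible HTTP API, so the standard openai Python client works; the base URL, token, and model name below are placeholders to replace with the values shown on your deployment overview page.

```python
# Minimal smoke test against a vLLM-backed deployment.
# Assumptions: the deployment exposes an OpenAI-compatible endpoint at
# BASE_URL (copy the real URL from the deployment overview page) and
# accepts a token issued by your OICM+ workspace.
from openai import OpenAI

BASE_URL = "https://<your-deployment-endpoint>/v1"   # hypothetical placeholder
API_KEY = "<your-workspace-token>"                   # hypothetical placeholder

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)

response = client.chat.completions.create(
    model="<deployed-model-name>",                   # as shown on the model card
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```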

Custom Configuration Deployment

For advanced users who need to customize deployment settings, start by choosing your deployment source:

  • Deploy from Model Hub: Use pre-configured models with optional customization
  • Deploy from Model Version: Deploy from your model repository
  • Deploy from Docker Image: Use custom container images

Step-by-Step Configuration Process

When selecting Deploy from Model Hub, follow the multi-step configuration wizard:

Select model to configure

Step 1: Model Selection

  1. Navigate to Deployments → Deploy a Model
  2. Select Deploy from Model Hub from the deployment options
  3. Browse and select your desired model from the available options
  4. Click Next to proceed to configuration

Advanced Configuration

Step 2: Deployment Configuration

Serving Framework

Configure how the model will be served:

Model Server: Choose from the available serving frameworks:

  • vLLM (recommended for most language models)
  • SGLang
  • TGI 3

Click Next to proceed to resource allocation.
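All of these frameworks serve the selected model behind an HTTP endpoint once deployed. If you want to sanity-check a model locally before deploying it, the sketch below uses vLLM's offline Python API; it assumes a GPU machine with vllm installed and uses a small placeholder model, and it is not a step OICM+ requires.

```python
# Offline sanity check with vLLM (assumes `pip install vllm` on a GPU machine).
# The model identifier is a placeholder; substitute the model you plan to deploy.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")            # small model for a quick check
params = SamplingParams(max_tokens=32, temperature=0.7)

outputs = llm.generate(["The Model Hub lets you"], params)
print(outputs[0].outputs[0].text)
```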

Step 3: Resource Allocation

Resources

Configure compute resources for your deployment:

Compute Type Selection:

  • GPU (recommended for most AI models)
  • Fractioned GPU (for smaller workloads)
  • CPU (for lightweight models)

Resource Configuration:

  • Memory (GiB): Available RAM allocation
  • Storage (GiB): Available storage quota
  • Accelerator: GPU type and specifications
  • Accelerator Count: Number of GPU units required

Important: The system displays available resource quotas to help you make informed allocation decisions.
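When choosing between a full GPU, a fractional GPU, and the accelerator count, a rough rule of thumb is that model weights in 16-bit precision take about 2 bytes per parameter, plus headroom for the KV cache and runtime buffers. The sketch below is an illustrative back-of-the-envelope estimate, not an OICM+ feature; the overhead factor is an assumption to adjust for your workload.

```python
# Back-of-the-envelope GPU memory estimate for a transformer model.
# Assumptions: 16-bit weights (2 bytes/parameter) and a flat ~30% overhead
# for KV cache, activations, and runtime buffers.
def estimated_gpu_memory_gib(num_parameters: float, bytes_per_param: int = 2,
                             overhead_factor: float = 1.3) -> float:
    weights_gib = num_parameters * bytes_per_param / (1024 ** 3)
    return weights_gib * overhead_factor

# Example: a 7B-parameter model in FP16 needs roughly 17 GiB,
# so it fits on a single 24 GiB accelerator but not on a small GPU fraction.
print(f"{estimated_gpu_memory_gib(7e9):.1f} GiB")
```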

Step 4: Scaling Configuration

Scaling configuration

Configure scaling behavior for your deployment:

Autoscaling Options:

  • Enable Autoscaling: Toggle to enable automatic replica scaling
  • Target Metric: Choose scaling metric (e.g., ml_model_concurrent_requests)
  • Scale Threshold: Set the threshold value for scaling decisions

Replica Configuration:

  • Min Replicas: Minimum number of running instances
  • Max Replicas: Maximum number of instances during peak load
  • Activation Threshold: Minimum load before scaling up
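To see how these values interact, the sketch below mirrors the target-tracking calculation commonly used by replica autoscalers: the desired replica count is the observed metric divided by the scale threshold, rounded up and clamped to the min/max bounds. This is a simplified illustration of the general mechanism, not the exact algorithm OICM+ uses.

```python
import math

# Simplified target-tracking autoscaling: desired replicas grow with the
# observed metric (e.g., ml_model_concurrent_requests) relative to the
# configured scale threshold, clamped to the min/max replica bounds.
def desired_replicas(metric_value: float, scale_threshold: float,
                     min_replicas: int, max_replicas: int) -> int:
    desired = math.ceil(metric_value / scale_threshold)
    return max(min_replicas, min(desired, max_replicas))

# Example: 25 concurrent requests with a threshold of 10 per replica
# -> 3 replicas, provided the maximum allows it.
print(desired_replicas(25, 10, min_replicas=1, max_replicas=4))  # 3
```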

Step 5: Review and Deploy

Once scaling parameters are configured, proceed to the final step to review all configured sections. At this stage, you have two deployment options:

  • Save: Store your configuration as a template for future use
  • Save & Deploy: Save the configuration and immediately launch the model deployment

Post-Deployment Management

Monitoring Active Deployments

Once deployed, your model appears in the Deployments section with comprehensive monitoring capabilities. Each deployment displays key information organized in sections:

Model Information:

  • Source: Model origin and identifier
  • Task Type: Inference task (e.g., Text Generation)
  • Serving Framework: Model serving framework (e.g., vLLM)

Resource Allocation:

  • GPU Type and Count: Accelerator specifications (e.g., h200 x1)
  • Memory (RAM): Allocated memory in GiB
  • Storage: Allocated storage in GiB

Instance Status:

  • Deployment Instances: Running pod replicas with their names and current status
  • Instance Count: Number of active replicas
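The Deployments section already shows each replica's name and status. Purely as an illustration, the sketch below lists the same pods with the Kubernetes Python client; it assumes your workspace also grants direct access to the namespace backing your deployments, which may not apply to managed OICM+ environments, and the namespace name is a placeholder.

```python
# Illustrative only: list replica pods and their status via the Kubernetes API,
# assuming you have kubeconfig access to the namespace backing your workspace.
from kubernetes import client, config

config.load_kube_config()                      # or load_incluster_config()
v1 = client.CoreV1Api()

NAMESPACE = "<your-workspace-namespace>"       # hypothetical placeholder
for pod in v1.list_namespaced_pod(namespace=NAMESPACE).items:
    print(pod.metadata.name, pod.status.phase)
```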

Deployment Management Options

Depending on the deployment's current status, the Undeploy button does one of the following:

  • Queued Deployments: Cancel pending deployments waiting for resources
  • Active Deployments: Terminate running deployments and clean up resources