
Model Hub

The Model Hub provides a curated collection of pre-configured AI models that are onboarded to your OICM+ environment based on your specific use cases and hardware capacity. These models come with optimized configurations, allowing you to deploy production-ready AI models with minimal setup.

Quick Deployment from Model Hub

Browsing Available Models

Model template

The Model Hub displays all available pre-configured models in a card-based layout. Each model card shows:

  • Model name and description
  • Resource requirements (CPU, RAM, Storage)
  • Hardware specifications (GPU type and count)
  • Task type and serving framework
  • One-click Deploy button

One-Click Deployment

Model template

For immediate deployment with default settings:

  1. Click the Deploy button on any model card
  2. Review the deployment summary in the modal dialog:
     • Model description and capabilities
     • Serving framework (e.g., vLLM)
     • Task type (e.g., Text Generation)
     • Number of replicas (default: 1)
     • Resource requirements (CPU, RAM, Storage, Accelerator)
  3. Click Deploy to launch the model with pre-configured settings

Prerequisites: Ensure your workspace has sufficient GPU resources allocated. Deployments will queue until GPU resources become available.

Once you click the Deploy button, you are directed to the deployment overview page.

Deployment overview
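Once the deployment is running, you can send it a quick test request. Models served with vLLM expose an OpenAI-compatible HTTP API, so the standard openai Python client works; the base URL, token, and model name below are placeholders to replace with the values shown on your deployment overview page.

```python
# Minimal smoke test against a vLLM-backed deployment.
# Assumptions: the deployment exposes an OpenAI-compatible endpoint at
# BASE_URL (copy the real URL from the deployment overview page) and
# accepts a token issued by your OICM+ workspace.
from openai import OpenAI

BASE_URL = "https://<your-deployment-endpoint>/v1"   # hypothetical placeholder
API_KEY = "<your-workspace-token>"                   # hypothetical placeholder

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)

response = client.chat.completions.create(
    model="<deployed-model-name>",                   # as shown on the model card
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```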

Custom Configuration Deployment

For advanced users who need to customize deployment settings, start by choosing your deployment source:

  • Deploy from Model Hub: Use pre-configured models with optional customization
  • Deploy from Model Version: Deploy from your model repository
  • Deploy from Docker Image: Use custom container images

Step-by-Step Configuration Process

When selecting Deploy from Model Hub, follow the multi-step configuration wizard:

Select model to configure

Step 1: Model Selection

  1. Navigate to Deployments → Deploy a Model
  2. Select Deploy from Model Hub from the deployment options
  3. Browse and select your desired model from the available options
  4. Click Next to proceed to configuration

Advanced Configuration

Step 2: Deployment Configuration

Serving Framework

Configure how the model will be served:

Model Server: Choose from the available serving frameworks:

  • vLLM (recommended for most language models)
  • SGLang
  • TGI 3

Click Next to proceed to resource allocation.
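All of these frameworks serve the selected model behind an HTTP endpoint once deployed. If you want to sanity-check a model locally before deploying it, the sketch below uses vLLM's offline Python API; it assumes a GPU machine with vllm installed and uses a small placeholder model, and it is not a step OICM+ requires.

```python
# Offline sanity check with vLLM (assumes `pip install vllm` on a GPU machine).
# The model identifier is a placeholder; substitute the model you plan to deploy.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")            # small model for a quick check
params = SamplingParams(max_tokens=32, temperature=0.7)

outputs = llm.generate(["The Model Hub lets you"], params)
print(outputs[0].outputs[0].text)
```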

Step 3: Resource Allocation

Resources

Configure compute resources for your deployment:

Compute Type Selection:

  • GPU (recommended for most AI models)
  • Fractioned GPU (for smaller workloads)
  • CPU (for lightweight models)

Resource Configuration:

  • Memory (GiB): Available RAM allocation
  • Storage (GiB): Available storage quota
  • Accelerator: GPU type and specifications
  • Accelerator Count: Number of GPU units required

Important: The system displays available resource quotas to help you make informed allocation decisions.
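When choosing between a full GPU, a fractional GPU, and the accelerator count, a rough rule of thumb is that model weights in 16-bit precision take about 2 bytes per parameter, plus headroom for the KV cache and runtime buffers. The sketch below is an illustrative back-of-the-envelope estimate, not an OICM+ feature; the overhead factor is an assumption to adjust for your workload.

```python
# Back-of-the-envelope GPU memory estimate for a transformer model.
# Assumptions: 16-bit weights (2 bytes/parameter) and a flat ~30% overhead
# for KV cache, activations, and runtime buffers.
def estimated_gpu_memory_gib(num_parameters: float, bytes_per_param: int = 2,
                             overhead_factor: float = 1.3) -> float:
    weights_gib = num_parameters * bytes_per_param / (1024 ** 3)
    return weights_gib * overhead_factor

# Example: a 7B-parameter model in FP16 needs roughly 17 GiB,
# so it fits on a single 24 GiB accelerator but not on a small GPU fraction.
print(f"{estimated_gpu_memory_gib(7e9):.1f} GiB")
```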

Step 4: Scaling Configuration

Scaling configuration

Configure scaling behavior for your deployment:

Autoscaling Options:

  • Enable Autoscaling: Toggle to enable automatic replica scaling
  • Target Metric: Choose scaling metric (e.g., ml_model_concurrent_requests)
  • Scale Threshold: Set the threshold value for scaling decisions

Replica Configuration:

  • Min Replicas: Minimum number of running instances
  • Max Replicas: Maximum number of instances during peak load
  • Activation Threshold: Minimum load before scaling up
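To see how these values interact, the sketch below mirrors the target-tracking calculation commonly used by replica autoscalers: the desired replica count is the observed metric divided by the scale threshold, rounded up and clamped to the min/max bounds. This is a simplified illustration of the general mechanism, not the exact algorithm OICM+ uses.

```python
import math

# Simplified target-tracking autoscaling: desired replicas grow with the
# observed metric (e.g., ml_model_concurrent_requests) relative to the
# configured scale threshold, clamped to the min/max replica bounds.
def desired_replicas(metric_value: float, scale_threshold: float,
                     min_replicas: int, max_replicas: int) -> int:
    desired = math.ceil(metric_value / scale_threshold)
    return max(min_replicas, min(desired, max_replicas))

# Example: 25 concurrent requests with a threshold of 10 per replica
# -> 3 replicas, provided the maximum allows it.
print(desired_replicas(25, 10, min_replicas=1, max_replicas=4))  # 3
```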

Step 5: Review and Deploy

Once scaling parameters are configured, proceed to the final step to review all configured sections. At this stage, you have two deployment options:

  • Save: Store your configuration as a template for future use
  • Save & Deploy: Save the configuration and immediately launch the model deployment

Post-Deployment Management

Monitoring Active Deployments

Once deployed, your model appears in the Deployments section with comprehensive monitoring capabilities. Each deployment displays key information organized in sections:

Model Information:

  • Source: Model origin and identifier
  • Task Type: Inference task (e.g., Text Generation)
  • Serving Framework: Model serving framework (e.g., vLLM)

Resource Allocation:

  • GPU Type and Count: Accelerator specifications (e.g., h200 x1)
  • Memory (RAM): Allocated memory in GiB
  • Storage: Allocated storage in GiB

Instance Status:

  • Deployment Instances: Running pod replicas with their names and current status
  • Instance Count: Number of active replicas
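The Deployments section already shows each replica's name and status. Purely as an illustration, the sketch below lists the same pods with the Kubernetes Python client; it assumes your workspace also grants direct access to the namespace backing your deployments, which may not apply to managed OICM+ environments, and the namespace name is a placeholder.

```python
# Illustrative only: list replica pods and their status via the Kubernetes API,
# assuming you have kubeconfig access to the namespace backing your workspace.
from kubernetes import client, config

config.load_kube_config()                      # or load_incluster_config()
v1 = client.CoreV1Api()

NAMESPACE = "<your-workspace-namespace>"       # hypothetical placeholder
for pod in v1.list_namespaced_pod(namespace=NAMESPACE).items:
    print(pod.metadata.name, pod.status.phase)
```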

Deployment Management Options

Depending on the deployment's current status, the Undeploy button does one of the following:

  • Queued Deployments: Cancel pending deployments waiting for resources
  • Active Deployments: Terminate running deployments and clean up resources