
User Interface

Deployments UI

Accessing Model Deployments

  1. Navigate to the Model Deployments section from the main platform menu
  2. The dashboard displays your existing deployments with their status
  3. Use the search functionality to find specific deployments

Model Deployments Dashboard

Creating a New Deployment

You can deploy either registered models from the Model Registry or custom Docker images.

Step 1: Deployment Source Selection

  1. Click the "Create New Deployment" button
  2. Choose your deployment source:
    • Registered Model: Deploy a model from the Model Registry
    • Custom Docker Image: Deploy your own containerized model

Deployment Source Selection

Deploying a Registered Model

When deploying from the Model Registry, follow these steps:

Step 1: Model Selection

  1. Enter a unique deployment name
  2. Select a registered model from the dropdown
  3. Select a specific model version from the dropdown
  4. Choose the appropriate task type for your model:
    • Text Generation (LLMs)
    • Text-to-Image
    • Text-to-Speech
    • Automatic Speech Recognition (ASR)
    • Classical ML
    • And others

Note: For guidance on task types, refer to the Hugging Face model categories, which provide a comprehensive list of tasks and their purposes.

Model Selection

Step 2: Deployment Configuration

Select a model serving framework based on your requirements.

Note: Each framework has its own set of configuration options. Default configurations are provided for all frameworks, but you can customize them as needed.

Deployment Configuration

Step 3: Resource Allocation

Configure the computing resources for each deployment instance:

  • Compute Type:
    • Full GPUs
    • Fractional GPUs
    • CPUs
  • Memory (RAM): Amount of memory allocated to each instance
  • Storage: Disk space for model artifacts and runtime data
  • Accelerator Type: GPU model (if applicable)
  • Accelerator Count: Number of GPUs per instance
  • CPU Count: Number of CPU cores per instance

Important: Resources specified here are for a single deployment instance. The total resources consumed will be multiplied by the number of replicas configured in the next step.

Resource Allocation
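
To make the note above concrete, the sketch below multiplies per-instance resources by the replica count. The resource names and quantities are hypothetical example values, not platform defaults:

```python
# Illustrative sketch: total resources consumed scale linearly with replicas.
# Resource names and amounts below are hypothetical, not platform defaults.

def total_resources(per_instance: dict, replicas: int) -> dict:
    """Multiply each per-instance resource by the replica count."""
    return {name: amount * replicas for name, amount in per_instance.items()}

instance = {"gpus": 1, "cpu_cores": 8, "ram_gb": 32, "storage_gb": 100}
print(total_resources(instance, 4))
# With 4 replicas, the deployment consumes 4 GPUs, 32 cores,
# 128 GB RAM, and 400 GB of storage in total.
```

This is why it pays to size a single instance carefully before raising the replica count in the scaling step.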

Step 4: Scaling Configuration

  1. Toggle "Enable Autoscaling" on or off
  2. For fixed scaling (autoscaling disabled):
    • Set the number of replicas to maintain at all times
  3. For autoscaling (enabled):
    • Target Metric: The metric used to trigger scaling (default: ml_model_concurrent_requests)
    • Scale Threshold: The value of the target metric that triggers scaling
    • Min Replicas: Minimum number of instances to maintain regardless of load
    • Max Replicas: Maximum number of instances allowed during peak load
    • Activation Threshold: The threshold that must be exceeded to trigger a scaling event

Scaling Configuration

Autoscaling Example

Consider an LLM deployment with the following configuration:

- Target Metric: ml_model_concurrent_requests
- Scale Threshold: 5
- Min Replicas: 1
- Max Replicas: 10
- Activation Threshold: 6

In this scenario:

  1. The deployment starts with one replica
  2. When the number of concurrent requests exceeds 6, the platform triggers a scale-up
  3. New replicas are added until each instance handles approximately 5 concurrent requests
  4. During periods of low activity, replicas are gradually removed until reaching the minimum (1)
  5. The system maintains between 1 and 10 replicas depending on the load

This approach ensures efficient resource utilization while maintaining responsive service.
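
The scaling behavior in the example can be sketched as a simple replica calculation. This is a simplified model of the configuration described above, not the platform's actual controller logic, which may evaluate metrics and cooldowns differently:

```python
import math

# Simplified sketch of the autoscaling example above; the platform's
# real scaling controller may behave differently in detail.
def desired_replicas(concurrent_requests: float,
                     scale_threshold: float = 5,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     activation_threshold: float = 6) -> int:
    # Below the activation threshold, stay at the minimum replica count.
    if concurrent_requests <= activation_threshold:
        return min_replicas
    # Otherwise, target roughly scale_threshold requests per replica,
    # clamped to the configured min/max range.
    wanted = math.ceil(concurrent_requests / scale_threshold)
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(3))    # light load: stays at 1 replica
print(desired_replicas(23))   # ceil(23 / 5) = 5 replicas
print(desired_replicas(200))  # capped at max_replicas = 10
```

Under this model, each replica ends up handling approximately `scale_threshold` concurrent requests, matching the behavior described in the scenario.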

Step 5: Review and Deploy

The final step shows a summary of your deployment configuration:

  1. Review all settings
  2. Choose one of the deployment options:
    • Save: Store the configuration without starting the deployment
    • Save & Deploy: Create and immediately start the deployment

Deployment Review

Managing Deployments

Once created, you can manage your deployments through the Deployments dashboard:

  • Monitor the status of active deployments
  • Start, stop, or delete deployments
  • View performance metrics and logs
  • Update deployment configurations

Deployment Management

Best Practices

  • Resource Optimization: Start with modest resources and scale based on actual performance
  • Autoscaling: Configure appropriate thresholds to balance performance and cost
  • Monitoring: Regularly review deployment metrics to identify optimization opportunities