Welcome to OICM+ Documentation
OICM+ is a secure, flexible platform for orchestrating GPU infrastructure across multi-tenant, multi-cluster, and multi-cloud environments.
Core Capabilities
- AI workload orchestration for training, inference, fine-tuning, and evaluation.
- GPU resource scheduling, scaling, and job lifecycle management.
- Role-Based Access Control (RBAC) across tenants, workspaces, and users.
- Data and network isolation between tenants.
- Integrated monitoring, logging, and usage tracking for cost and performance.
Infrastructure Flexibility
- Multi-Cluster & Multi-Cloud: Manage workloads across clusters and environments.
- Hardware-Agnostic: Compatible with NVIDIA and AMD (ROCm) GPUs, and designed to accommodate future accelerators.
Integration Highlights
- Identity systems: Keycloak, LDAP, Active Directory, SSO (OAuth2, SAML).
- Storage and networking: Support for on-prem and cloud-native components.
- Model repositories: Hugging Face, internal model stores.
- Billing system integration: Export GPU hours, token usage, API calls, storage, and job metrics for external billing engines.
- Monitoring and logging: Prometheus, Grafana, ELK/EFK stack, with stream integration.
- RESTful APIs & SDKs: Programmatic control over all platform features.
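
To make the billing export item above more concrete, here is a minimal sketch of how exported usage metrics might be rolled up per tenant before being handed to an external billing engine. It is illustrative only: the `UsageRecord` type, its field names, and the CSV layout are assumptions made for this example, not the OICM+ export schema or API.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical metric names, chosen to mirror the exported quantities
# listed above (GPU hours, token usage, API calls, storage).
METRICS = ("gpu_hours", "tokens", "api_calls", "storage_gb")


@dataclass(frozen=True)
class UsageRecord:
    """One metered job. All field names are assumptions for illustration."""
    tenant: str
    gpu_hours: float
    tokens: int
    api_calls: int
    storage_gb: float


def aggregate_by_tenant(records):
    """Sum every metric per tenant and return {tenant: {metric: total}}."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for record in records:
        for metric in METRICS:
            totals[record.tenant][metric] += getattr(record, metric)
    return dict(totals)


def to_csv(totals):
    """Render per-tenant totals as CSV text, one row per tenant, sorted by name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(("tenant",) + METRICS)
    for tenant in sorted(totals):
        writer.writerow([tenant] + [totals[tenant][m] for m in METRICS])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        UsageRecord("acme", 1.5, 1000, 10, 2.0),
        UsageRecord("acme", 2.5, 500, 5, 1.0),
        UsageRecord("globex", 4.0, 0, 1, 0.5),
    ]
    print(to_csv(aggregate_by_tenant(sample)))
```

Aggregating per tenant before export keeps the hand-off to a billing engine small and makes each row easy to audit. A real integration would read these values from the platform's own export rather than from hand-built records.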