Inference
This section explains how to run inference against machine learning models deployed in the Open Innovation Platform. Depending on your model type (LLM or classical ML), different methods are available for obtaining predictions.
1. LLM Inference
Currently, the platform supports two primary LLM inference types:
- Text Generation
- Sequence Classification
1.1 Text Generation
Use TGI or VLLM deployments to access text generation endpoints. Two inference modes are supported:
1.1.1 Chat Inference
When using Chat mode, you provide a list of dictionaries representing conversation history:
{
  "messages": [
    {"role": "system", "content": "Be friendly"},
    {"role": "user", "content": "What's the capital of UAE?"},
    {"role": "assistant", "content": ""}
  ]
}
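As a minimal sketch, the payload above can be sent to the deployment endpoint with Python's requests library. The endpoint URL and API key below are placeholders; use the values shown on your deployment's page.
import requests

# Placeholders: replace with your deployment's endpoint URL and credentials
ENDPOINT_URL = "https://<your-chat-endpoint-url>"
HEADERS = {"Authorization": "Bearer <your-api-key>"}

payload = {
    "messages": [
        {"role": "system", "content": "Be friendly"},
        {"role": "user", "content": "What's the capital of UAE?"},
        {"role": "assistant", "content": ""}
    ]
}

# Send the chat request and print the generated reply
response = requests.post(ENDPOINT_URL, json=payload, headers=HEADERS)
response.raise_for_status()
print(response.json())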
1.1.2 Completion Inference
When using Completion mode, you provide a single string:
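(Illustrative example; the field name is an assumption. TGI deployments typically expect "inputs", while VLLM's completion API typically expects "prompt".)
{
  "inputs": "What's the capital of UAE?"
}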
Note: When you pass a list of dictionaries (Chat mode), the platform automatically formats the conversation according to a default or custom chat template. If you want full control over the prompt formatting, use Completion mode to pass a single string directly.
1.1.3 Inference Parameters
You can control additional parameters (e.g., temperature, top-k) alongside your input message. The exact parameters depend on whether you use VLLM or TGI as your inference backend.
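As an illustrative sketch, the payload below adds sampling parameters to a chat request. The parameter names and nesting shown are assumptions and differ between backends; for instance, TGI groups generation options under a "parameters" object, while VLLM accepts OpenAI-style fields such as "max_tokens" at the top level.
{
  "messages": [
    {"role": "user", "content": "Summarize the history of Abu Dhabi in two sentences."}
  ],
  "temperature": 0.7,
  "top_k": 40,
  "max_tokens": 128
}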
1.1.4 Chat Templates
Chat mode relies on model-specific templates. The following families are supported by default:
- LLAMA 2
- LLAMA 3
- Falcon
- Yi
- Mistral
- Aya-23
If your model family isn't in this list, add a custom chat template to the model version configuration (see the sketch below) or use Completion mode to format prompts yourself.
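As a rough sketch, and assuming the platform accepts Hugging Face-style Jinja chat templates, a custom template might look like the following. The special tokens are placeholders and must match your model's actual vocabulary.
{% for message in messages %}
  {% if message['role'] == 'system' %}<|system|>{{ message['content'] }}
  {% elif message['role'] == 'user' %}<|user|>{{ message['content'] }}
  {% elif message['role'] == 'assistant' %}<|assistant|>{{ message['content'] }}
  {% endif %}
{% endfor %}
<|assistant|>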
1.2 Sequence Classification Inference
For sequence_classification models deployed with OI_SERVE, provide a single text input:
Request:
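(Illustrative request body; the "text" field name is an assumption, so check the input schema shown for your deployment.)
{
  "text": "The delivery was fast and the support team was very helpful."
}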
Response:
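(Illustrative response for a sentiment model; the label set and score format depend on the deployed model and may differ from what is shown here.)
{
  "label": "POSITIVE",
  "score": 0.98
}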
2. Classical ML Inference
2.1 Input Format
When deploying a tracked experiment or custom model, you can provide inputs in different formats:
Tip: Log your model signature to ensure inputs are parsed correctly:
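A minimal sketch of logging a signature, assuming MLflow-style experiment tracking is used for tracked experiments; infer_signature derives the input and output schema from sample data.
import mlflow
from mlflow.models.signature import infer_signature
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small example model
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier().fit(X, y)

# Infer the input/output schema from sample inputs and predictions
signature = infer_signature(X, model.predict(X))

# Log the model together with its signature so the endpoint can parse inputs correctly
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model", signature=signature)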
2.1.1 Tensor Input (NumPy Arrays)
If the model expects a NumPy array, submit data as a JSON list. For a shape (-1, 3, 2):
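(Illustrative example showing two samples of shape (3, 2); the "inputs" key is an assumption, and the exact field name depends on your deployment's input schema.)
{
  "inputs": [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
  ]
}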
2.1.2 Named Parameters (Pandas DataFrame)
If the model expects multiple columns (e.g., a DataFrame), use a list of objects:
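(Illustrative payload with hypothetical column names age, income, and country; replace them with the columns from your model's signature. The "inputs" key is again an assumption.)
{
  "inputs": [
    {"age": 42, "income": 55000.0, "country": "AE"},
    {"age": 31, "income": 48000.0, "country": "FR"}
  ]
}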
2.2 Output Format
The endpoint returns a JSON dictionary with a predictions field:
- List Output: If the model returns a Python list or NumPy array, you'll get a list of arrays.
- Dictionary Output: If the model returns a Pandas DataFrame or dict, you'll see key-value pairs.
Example (List Output):
{
  "predictions": [
    [
      -3.644273519515991,
      -4.824134826660156,
      -3.8084142208099365,
      -5.363550662994385
    ],
    [
      -4.997870922088623,
      -4.3103718757629395,
      -0.13021154701709747,
      -3.2400429248809814
    ]
  ]
}
Example (Dictionary Output):
{
  "predictions": [
    {"sentiment": "POSITIVE", "score": 0.976},
    {"sentiment": "NEUTRAL", "score": 0.7345}
  ]
}
Next Steps
- Deployments UI – Manage and monitor your deployed models, including inference tests.
- Registered Models & Versions – Organize and version your ML models.
- Performance Benchmark – Evaluate how your model scales under load.