vLLM Workload
vLLM Workload is a high-performance serving solution for deploying AI models with high throughput. It is currently available on Microsoft Azure.
Note: Google Cloud support is pending due to current configuration restrictions.
Model Configuration
When configuring your model, use the Hugging Face model identifier format:
organization/model-name
Example:
mistralai/Mistral-7B-v0.1
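It can help to check an identifier locally before you submit a configuration. The sketch below is a hypothetical helper and is not part of vLLM Workload. It assumes that each half of the identifier uses only letters, digits, "_", "-" or ".", and that each half starts and ends with a letter or digit. That rule is loosely based on Hugging Face repository naming, so the platform may accept or reject some identifiers differently.

```python
import re

# Assumed naming rule (hypothetical, not the platform's own validation):
# each part uses letters, digits, "_", "-" or ".", and must start and
# end with a letter or digit.
_PART = r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"
_MODEL_ID = re.compile(rf"{_PART}/{_PART}")


def is_valid_model_id(model_id: str) -> bool:
    """Return True if model_id looks like 'organization/model-name'."""
    return _MODEL_ID.fullmatch(model_id) is not None


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a model identifier into (organization, model_name)."""
    if not is_valid_model_id(model_id):
        raise ValueError(f"expected 'organization/model-name', got {model_id!r}")
    organization, model_name = model_id.split("/")
    return organization, model_name


if __name__ == "__main__":
    # Prints ('mistralai', 'Mistral-7B-v0.1')
    print(split_model_id("mistralai/Mistral-7B-v0.1"))
```

A bare model name such as `Mistral-7B-v0.1` is rejected because it has no organization prefix. An identifier with more than one `/` is also rejected, because `/` is not an allowed character within either part.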
Available Models
A full list of supported models is available on the Hugging Face Models page.