
vLLM Workload

vLLM Workload is a high-performance model-serving solution available on Microsoft Azure, designed for high-throughput inference and straightforward deployment of AI models.

Note: Google Cloud support is pending due to current configuration restrictions.

Model Configuration

When configuring your model, use the Hugging Face model identifier format: organization/model-name

Example:

mistralai/Mistral-7B-v0.1
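
As a rough illustration of the organization/model-name format, the following Python sketch checks an identifier before it is placed in a configuration. The function name split_model_id and the allowed character set are assumptions made for this example; they are not part of vLLM Workload or the Hugging Face API, and the actual naming rules may differ.

```python
import re

# Assumed pattern (not an official rule): two non-empty segments made of
# letters, digits, '.', '_' or '-', separated by exactly one '/'.
_MODEL_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'organization/model-name' into its two parts.

    Hypothetical helper for illustration; raises ValueError when the
    string does not match the assumed pattern above.
    """
    if not _MODEL_ID.fullmatch(model_id):
        raise ValueError(
            f"expected 'organization/model-name', got {model_id!r}"
        )
    organization, model_name = model_id.split("/", 1)
    return organization, model_name


if __name__ == "__main__":
    # Uses the example identifier from this page.
    print(split_model_id("mistralai/Mistral-7B-v0.1"))
```

A check like this catches a missing organization prefix (for example, Mistral-7B-v0.1 on its own) before the workload is configured. It does not confirm that the model exists or is supported.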

Available Models

A full list of supported models can be found on the Hugging Face Models page.