# Azure ML
Azure ML is a platform used to build, train, and deploy machine learning models. Users can explore the types of models to deploy in the Model Catalog, which provides foundational and general-purpose models from different providers.

This notebook goes over how to use an LLM hosted on an Azure ML Online Endpoint.
## Installing the langchain packages needed to use the integration
```python
%pip install -qU langchain-community
```

```python
from langchain_community.llms.azureml_endpoint import AzureMLOnlineEndpoint
```
## Set up
You must deploy a model to Azure ML or Azure AI Studio and obtain the following parameters:

- `endpoint_url`: The REST endpoint url provided by the endpoint.
- `endpoint_api_type`: Use `endpoint_type='dedicated'` when deploying models to Dedicated endpoints (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the Pay-as-you-go offering (model as a service).
- `endpoint_api_key`: The API key provided by the endpoint.
- `deployment_name`: (Optional) The deployment name of the model using the endpoint.
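These values map directly onto the `AzureMLOnlineEndpoint` constructor, as the examples below show. To avoid hard-coding the API key, you might read it from environment variables first; the variable names here are just an illustration, and the class does not pick them up automatically:

```python
import os

# Hypothetical variable names; AzureMLOnlineEndpoint does not read these on its own.
endpoint_url = os.environ["AZUREML_ENDPOINT_URL"]
endpoint_api_key = os.environ["AZUREML_ENDPOINT_API_KEY"]
```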
## Content Formatter
The `content_formatter` parameter is a handler class for transforming the request and response of an AzureML endpoint to match the required schema. Since there is a wide range of models in the model catalog, each of which may process data differently, a `ContentFormatterBase` class is provided to allow users to transform data to their liking. The following content formatters are provided:

- `GPT2ContentFormatter`: Formats request and response data for GPT2.
- `DollyContentFormatter`: Formats request and response data for the Dolly-v2.
- `HFContentFormatter`: Formats request and response data for text-generation Hugging Face models.
- `CustomOpenAIContentFormatter`: Formats request and response data for models like LLaMa2 that follow the OpenAI API compatible scheme.
Note: `OSSContentFormatter` is being deprecated and replaced with `GPT2ContentFormatter`. The logic is the same, but `GPT2ContentFormatter` is a more suitable name. You can still continue to use `OSSContentFormatter` as the changes are backwards compatible.
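If none of the provided formatters matches your model's schema, you can subclass `ContentFormatterBase` yourself. Below is a minimal sketch, assuming a hypothetical text-generation endpoint that accepts `{"inputs": [...], "parameters": {...}}` and returns a list of objects with a `generated_text` field; the exact method signatures may differ slightly between versions of `langchain-community`:

```python
import json
from typing import Dict

from langchain_community.llms.azureml_endpoint import ContentFormatterBase


class CustomFormatter(ContentFormatterBase):
    """Sketch of a formatter for a hypothetical text-generation endpoint."""

    content_type = "application/json"
    accepts = "application/json"

    def format_request_payload(self, prompt: str, model_kwargs: Dict) -> bytes:
        # Wrap the prompt and parameters in the JSON shape the endpoint expects.
        request = {"inputs": [prompt], "parameters": model_kwargs}
        return json.dumps(request).encode("utf-8")

    def format_response_payload(self, output: bytes) -> str:
        # Pull the generated text out of the endpoint's JSON response.
        response = json.loads(output)
        return response[0]["generated_text"]
```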
## Examples
### Example: LlaMa 2 completions with real-time endpoints
```python
from langchain_community.llms.azureml_endpoint import (
    AzureMLEndpointApiType,
    CustomOpenAIContentFormatter,
)
```
```python
llm = AzureMLOnlineEndpoint(
    endpoint_url="https://<your-endpoint>.<your_region>.inference.ml.azure.com/score",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="my-api-key",
    content_formatter=CustomOpenAIContentFormatter(),
    model_kwargs={"temperature": 0.8, "max_new_tokens": 400},
)
```
```python
response = llm.invoke("Write me a song about sparkling water:")
response
```
Model parameters can also be passed at invocation time:
```python
response = llm.invoke("Write me a song about sparkling water:", temperature=0.5)
response
```
### Example: Chat completions with pay-as-you-go deployments (model as a service)
```python
from langchain_community.llms.azureml_endpoint import (
    AzureMLEndpointApiType,
    CustomOpenAIContentFormatter,
)
```
```python
llm = AzureMLOnlineEndpoint(
    endpoint_url="https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/completions",
    endpoint_api_type=AzureMLEndpointApiType.serverless,
    endpoint_api_key="my-api-key",
    content_formatter=CustomOpenAIContentFormatter(),
    model_kwargs={"temperature": 0.8, "max_new_tokens": 400},
)
```
```python
response = llm.invoke("Write me a song about sparkling water:")
response
```
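Like any other LangChain LLM, the endpoint can be composed with prompts. A short sketch using the standard LCEL pipe syntax, reusing the `llm` defined above (the template text is just an illustration):

```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Write a {adjective} song about {topic}:")

# Compose the prompt with the endpoint-backed LLM using LCEL.
chain = prompt | llm
print(chain.invoke({"adjective": "cheerful", "topic": "sparkling water"}))
```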