AWS SageMaker Inference Endpoint
AWS SageMaker Inference Endpoint for hosting an AI model
Made by Massdriver
Tags: aws-sagemaker-inference-endpoint
Amazon SageMaker Inference Endpoint is a fully managed AWS service that lets developers deploy machine learning models for real-time predictions. This bundle takes model data from S3 and an ECR image, such as one of Amazon's Deep Learning Containers (DLC), creates an endpoint configuration, and then creates an endpoint.
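The create-model, create-endpoint-config, create-endpoint flow can be sketched as the request payloads boto3's SageMaker client would receive. This is an illustrative sketch, not the bundle's actual implementation; all names, ARNs, and paths below are hypothetical placeholders.

```python
# Builds the request payloads for the three SageMaker calls the bundle's
# flow corresponds to. Names and ARNs are hypothetical placeholders.

def model_payload(name, ecr_image, model_data_url, role_arn):
    # Payload for sagemaker:CreateModel - ties the ECR inference image
    # to the model artifacts stored in S3.
    return {
        "ModelName": name,
        "PrimaryContainer": {"Image": ecr_image, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }

def endpoint_config_payload(name, model_name, instance_type, instance_count):
    # Payload for sagemaker:CreateEndpointConfig - selects the ML compute
    # instances that will serve the model.
    return {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }

# With boto3, these payloads would be used roughly as:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model_payload(...))
#   sm.create_endpoint_config(**endpoint_config_payload(...))
#   sm.create_endpoint(EndpointName="my-endpoint",
#                      EndpointConfigName="my-endpoint-config")
```

The endpoint is created from the endpoint configuration, so instance changes are made by creating a new configuration and updating the endpoint rather than mutating it in place.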
Use Cases
Real-time Predictions
SageMaker Inference Endpoints make it easy to generate real-time predictions from your machine learning models.
Scalable Model Deployment
You can deploy your models with auto scaling capabilities to handle varying loads of inference requests.
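Endpoint auto scaling is wired through Application Auto Scaling. A minimal sketch of the two payloads involved, assuming a target-tracking policy on invocations per instance (endpoint and variant names are placeholders):

```python
# Payload builders for the two Application Auto Scaling calls that give a
# SageMaker endpoint variant auto-scaling. Names are placeholders.

def scalable_target_payload(endpoint_name, variant, min_capacity, max_capacity):
    # Payload for application-autoscaling:RegisterScalableTarget - makes
    # the variant's desired instance count scalable between the bounds.
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }

def target_tracking_policy_payload(endpoint_name, variant, invocations_per_instance):
    # Payload for application-autoscaling:PutScalingPolicy - scales on
    # invocations per instance, a common target for inference endpoints.
    return {
        "PolicyName": f"{endpoint_name}-invocations",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }

# With boto3 these would be passed to:
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**scalable_target_payload(...))
#   aas.put_scaling_policy(**target_tracking_policy_payload(...))
```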
Integrated with your Applications
SageMaker Inference Endpoints are exposed through a secure REST API, which can be easily integrated into your applications.
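A client-side call can be sketched as follows. The endpoint name and JSON request shape are assumptions; the real contract depends on the container serving the model.

```python
import json

# Serializes a request body the way a JSON-speaking inference container
# (e.g. most DLC images) typically expects it. The payload shape is a
# hypothetical example, not a fixed contract.

def inference_request(payload: dict) -> bytes:
    return json.dumps(payload).encode("utf-8")

# With boto3, the invocation would look roughly like:
#   runtime = boto3.client("sagemaker-runtime")
#   resp = runtime.invoke_endpoint(
#       EndpointName="my-endpoint",
#       ContentType="application/json",
#       Body=inference_request({"inputs": "Hello"}),
#   )
#   result = json.loads(resp["Body"].read())
```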
Design
This bundle accepts a SageMaker Model name as input and creates an Inference Endpoint. The model must be in the SageMaker Model Registry before it can be used to create an endpoint, and the endpoint must be created in the same region as the model. The endpoint can be deployed to a variety of instance types, including CPU and GPU instances, and you can set its initial instance count.
SageMaker Model
A SageMaker Model is a representation of a machine learning model. It includes the S3 path where the model artifacts are stored and the Docker image that was used for training.
Endpoint Configuration
An Endpoint Configuration specifies the ML compute instances that will be used for the inference endpoint.
Inference Endpoint
The Inference Endpoint is a hosted model that can be accessed through a REST API to get real-time predictions.
| Variable | Type | Description |
|---|---|---|
| `endpoint_config.instance_count` | integer | Initial number of instances used for auto-scaling. |
| `endpoint_config.instance_type` | string | Instance type to use for the SageMaker endpoint. |
| `endpoint_config.primary_container.ecr_image` | string | The ECR image URI (e.g. `763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-ec2`). |
| `endpoint_config.primary_container.model_data_config.enabled` | boolean | Enables including model data from S3 for the SageMaker model. |
| `environment_variables[].name` | string | Name of an environment variable set on the primary container. |
| `environment_variables[].value` | string | Value of the environment variable. |
| `monitoring.endpoint_log_retention` | integer | Retention period (in days) for the endpoint's CloudWatch logs. |
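How a log-retention setting like the one above maps onto CloudWatch Logs can be sketched as follows; the log-group naming follows the standard `/aws/sagemaker/Endpoints/<name>` convention, and the endpoint name is a placeholder.

```python
# Builds the payload for logs:PutRetentionPolicy, applying a retention
# period to a SageMaker endpoint's log group. Names are placeholders.

def log_retention_payload(endpoint_name: str, retention_days: int) -> dict:
    return {
        "logGroupName": f"/aws/sagemaker/Endpoints/{endpoint_name}",
        "retentionInDays": retention_days,
    }

# With boto3:
#   boto3.client("logs").put_retention_policy(
#       **log_retention_payload("my-endpoint", 30))
```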