Abdullah Şamil Güser

Endpoint Changes with Zero Downtime

[Repeat] Endpoint Features, Monitoring, and Auto Scaling (Lesson 105)

Multi-Instance Configuration for Reliability

Auto Scaling Based on Workload

Hosting Multiple Model Variants

How to handle changes to a production system? (Lesson 106)

  1. Single Model Endpoint
    • Structure: Single serving container with a model on an instance.
    • Fault Tolerance: Replication for reliability, load balancing, and auto-scaling based on traffic.
    • Limitation: Single point of failure; one container per instance.
  2. Multiple Model/Production Variants
    • Setup: New endpoint configuration with old and new models.
    • Traffic Distribution: Configure traffic weights (e.g., 70% old model, 30% new model).
    • Advantages: Zero downtime deployments, specific model variant targeting, separate auto-scaling rules.
    • Drawback: Each variant requires its own serving container and model artifact per instance, so hosting multiple variants means running multiple server instances.
  3. Multi-Model Endpoint
    • Usage: Host multiple models with a shared serving container (same algorithm).
    • Cost Efficiency: Reduced infrastructure costs, better utilization of endpoint instances.
  4. Multi-Container Endpoint
    • Functionality: Deploy multiple serving containers on a single endpoint.
    • Use Cases: Inference pipelines or direct invocation of different algorithm models.
    • Advantages: Host different algorithms on one instance, reduce infrastructure costs.
    • Limitations: Maximum of five containers co-hosted.
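
The production-variant option above can be sketched as an endpoint-configuration request. The config name, model names, and the 70/30 weights are illustrative assumptions; in practice the dict below would be sent with boto3's `create_endpoint_config` call.

```python
# Sketch of an endpoint configuration hosting two production variants.
# Names and weights are assumptions, not values from the course.
endpoint_config_request = {
    "EndpointConfigName": "xgboost-bikerental-config",  # hypothetical name
    "ProductionVariants": [
        {
            "VariantName": "old-model",           # model already in production
            "ModelName": "xgboost-bikerental-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 7,            # relative weight -> ~70% of traffic
        },
        {
            "VariantName": "new-model",           # candidate model
            "ModelName": "xgboost-bikerental-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 3,            # relative weight -> ~30% of traffic
        },
    ],
}

# Weights are relative: each variant's traffic share = weight / sum of weights.
variants = endpoint_config_request["ProductionVariants"]
total = sum(v["InitialVariantWeight"] for v in variants)
shares = {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}
print(shares)  # {'old-model': 0.7, 'new-model': 0.3}

# To actually create it:
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config_request)
```

Because weights are relative rather than percentages, shifting traffic later only requires updating the weights, not redeploying the models.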

Lab - A/B Testing Multiple Production Variants (Lesson 107)

  1. Prepare & Train Models
  2. Create SageMaker Models
    • Use training jobs to create models.
  3. Create Endpoint Configuration
    • Go to Inference -> Endpoint Configurations and create a new endpoint configuration.
    • Choose Create Production Variant and add the first version of the model.
      • Choose Edit.
      • You can update Variant Name as version-0-90-2
      • You can select Instance Type as ml.m5.large
      • Initial Instance Count and Initial Weight can both stay 1.
    • Repeat the same steps for the second version of the model, i.e. version-1-2-2
    • We’ve assigned equal initial weight (1) to each model for a 50/50 traffic distribution.
  4. Create Endpoint
    • Go to Inference -> Endpoints and create a new endpoint.
    • You can update Endpoint Name as xgboost-bikerental
    • Choose the endpoint configuration you created in the previous step.
  5. Endpoint Management - Adjust Weights
    • You can adjust weights if necessary from the endpoint configuration.
  6. Endpoint Management - Auto-Scaling
    • You can also configure auto-scaling based on the variant invocation per instance metric.
    • Select one of the variants and choose Configure Auto Scaling.
    • You can set the minimum and maximum instance count and the target value for the metric.
    • Auto scaling needs a trigger to respond to traffic; the default trigger metric is VariantInvocationsPerInstance.
    • This metric tracks the average number of requests per minute, per instance.
  7. Invoke Endpoint for A/B Testing
    • Use the multiple_versions_prediction.ipynb notebook.
    • Set endpoint name, predictor instance, and optionally specify target variant.
    • Compare results for different versions.
  8. Analysis & Cleanup
    • Don’t forget to delete the endpoint.
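
The invocation and weight-adjustment steps above can be sketched as request payloads. The endpoint and variant names match the lab; the CSV feature row is an illustrative assumption, and the boto3 calls are left as comments so the sketch runs without AWS credentials.

```python
# A/B testing sketch against the lab's endpoint. Names come from the lab;
# the request Body is an assumed example row.
ENDPOINT_NAME = "xgboost-bikerental"

# 1) Target a specific variant: invoke_endpoint accepts an optional
#    TargetVariant that bypasses the weighted traffic split.
invoke_request = {
    "EndpointName": ENDPOINT_NAME,
    "ContentType": "text/csv",
    "Body": "3,0,1,9.84,14.395,81,0.0",  # illustrative feature row
    "TargetVariant": "version-0-90-2",   # route this call to one variant
}

# 2) Shift traffic without redeploying: adjust the relative weights
#    via update_endpoint_weights_and_capacities.
update_request = {
    "EndpointName": ENDPOINT_NAME,
    "DesiredWeightsAndCapacities": [
        {"VariantName": "version-0-90-2", "DesiredWeight": 1},
        {"VariantName": "version-1-2-2", "DesiredWeight": 3},  # favor new model
    ],
}

# To actually send these:
# boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_request)
# boto3.client("sagemaker").update_endpoint_weights_and_capacities(**update_request)
```

Omitting TargetVariant lets SageMaker split traffic by weight, which is how the 50/50 A/B comparison in the lab works.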

Lab – Multi-model Endpoint (Lesson 108)

  1. Prepare & Train Models
  2. Create SageMaker Model
    • In SageMaker console, select the first training job (with hyper1).
    • Choose “Use multiple models” option.
    • Set model artifact location to the model folder in S3. (s3://asamilg-sagemaker-mls-course/bikerental-hyper/model)
    • Create the model.
  3. Create Endpoint Configuration
    • Name: xgboost-bikerental-hyper.
    • Select the created model.
    • Choose ml.m5.large instance type.
  4. Deploy Endpoint
    • Name: xgboost-bikerental-hyper.
    • Use the created endpoint configuration.
  5. Invoke Multi-Model Endpoint
  6. Cleanup
    • Delete the endpoint to avoid ongoing charges.
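
Invoking the multi-model endpoint can be sketched as follows. With the "Use multiple models" option, each invocation passes a TargetModel path relative to the S3 model folder, and SageMaker loads that artifact on demand. The endpoint name matches the lab; the artifact file names and feature row are assumptions.

```python
# Sketch: building an invoke_endpoint request for a multi-model endpoint.
# Artifact keys (e.g. "hyper1/model.tar.gz") are assumed, not from the lab.
def build_mme_request(endpoint_name: str, model_artifact: str, csv_row: str) -> dict:
    """Request targeting one model hosted behind a multi-model endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "Body": csv_row,
        "TargetModel": model_artifact,  # path relative to the S3 model folder
    }

req = build_mme_request(
    "xgboost-bikerental-hyper",
    "hyper1/model.tar.gz",          # assumed artifact key for the hyper1 model
    "3,0,1,9.84,14.395,81,0.0",     # illustrative feature row
)
# boto3.client("sagemaker-runtime").invoke_endpoint(**req)
```

Calling the same function with a different artifact key (say, the hyper2 model) hits a different model on the same instance, which is where the cost savings come from.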

Summary

Run Models at the Edge (Lesson 109)

SageMaker Neo

Benefits of SageMaker Neo

  1. Performance Improvement: Up to 2X increase in performance without sacrificing accuracy.
  2. Framework Size Reduction: Can achieve up to a 10X reduction. Neo compiles both the model and framework into a single executable for edge deployment.
  3. Cross-Platform Compatibility: Run the same ML model across multiple hardware platforms.
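
A Neo compilation is requested as a compilation job. The sketch below shows the shape of a `create_compilation_job` request; the bucket, role ARN, input shape, and target device are all illustrative assumptions.

```python
# Hedged sketch of a SageMaker Neo compilation job request.
# All names, ARNs, and the input shape are placeholders.
compilation_request = {
    "CompilationJobName": "xgboost-bikerental-neo",              # hypothetical name
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder ARN
    "InputConfig": {
        "S3Uri": "s3://my-bucket/model/model.tar.gz",            # trained artifact
        "DataInputConfig": '{"data": [1, 13]}',                  # assumed input shape
        "Framework": "XGBOOST",
    },
    "OutputConfig": {
        "S3OutputLocation": "s3://my-bucket/neo-output/",
        "TargetDevice": "jetson_nano",                           # example edge target
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 900},
}
# boto3.client("sagemaker").create_compilation_job(**compilation_request)
```

The compiled artifact written to the output location is the single executable mentioned above, optimized for the chosen target device.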

AWS IoT Greengrass