Abdullah Şamil Güser

6. SageMaker Service and SDK Changes

Model Training using Console (Lesson 63)

SageMaker and S3 Usage

Creating a Training Job

Managed Spot Training

SageMaker SDK Updates

The lecture reviews recent SageMaker updates: its integration with S3 for data management, the steps for setting up a training job, and the cost savings available from managed spot training. It stresses checkpointing for training jobs, especially on spot instances, where a job can be interrupted and must resume from saved state. It also covers SageMaker SDK enhancements, notably broader integration with various file systems and support for spot instance training.
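
As a rough illustration of the saving, the sketch below computes the discount from two figures a training job reports: total training seconds and billable seconds. The function name and the sample numbers are hypothetical, chosen only to show the arithmetic.

```python
def spot_savings_percent(training_seconds: float, billable_seconds: float) -> float:
    """Return the percentage saved when only part of the training time is billed."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    # Fraction of the run that was NOT billed, expressed as a percentage.
    return (1.0 - billable_seconds / training_seconds) * 100.0


if __name__ == "__main__":
    # Hypothetical job: 100 seconds of training, 30 of them billable.
    print(f"Savings: {spot_savings_percent(100, 30):.1f}%")
```

For an on-demand job the two figures are equal, so the function returns 0.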

Model Training using Python SDK (Lesson 64)

Overview

Training and Deployment Process

  1. Import Libraries: Standard libraries and SageMaker SDK.
  2. Configure S3: Set locations for model output, training, validation, and test data.
  3. Spot Instance Training: Controlled by a flag; significantly reduces training costs.
  4. Training Job Configuration:
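    • Set max_runtime_seconds and max_wait_time_seconds (passed to the SDK v2 estimator as max_run and max_wait); the wait time must be at least the runtime.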
    • Enable checkpointing for spot training.
    • Choose algorithm and specify version (e.g., XGBoost).
  5. Create Estimator:
    • Provide job configuration.
    • Set hyperparameters.
    • Specify S3 data locations.
    • Call the fit method to start training.
  6. Job Output: Includes total training time and billable seconds.
  7. Model Deployment: Deploy the model to an endpoint for real-time inference; deployment may take up to 5 minutes.
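
The steps above could be sketched roughly as follows, assuming version 2 of the SageMaker Python SDK. The helper names (build_estimator_kwargs, train_and_deploy), the S3 prefixes, the instance types, the hyperparameter values, and the XGBoost version string are all assumptions made for illustration, not values from the lecture. The second function needs AWS credentials and the sagemaker package, so it is defined but not called.

```python
from typing import Any, Dict, Optional


def build_estimator_kwargs(
    role: str,
    image_uri: str,
    output_path: str,
    use_spot_instances: bool = True,
    max_run: int = 3600,
    max_wait: int = 7200,
    checkpoint_s3_uri: Optional[str] = None,
    instance_type: str = "ml.m5.xlarge",  # assumed instance type
) -> Dict[str, Any]:
    """Collect Estimator keyword arguments, toggling spot training with a flag."""
    if use_spot_instances and max_wait < max_run:
        raise ValueError("max_wait must be at least max_run for spot training")
    kwargs: Dict[str, Any] = {
        "role": role,
        "image_uri": image_uri,
        "instance_count": 1,
        "instance_type": instance_type,
        "output_path": output_path,
        "max_run": max_run,
        "use_spot_instances": use_spot_instances,
    }
    if use_spot_instances:
        # The wait limit and checkpoints only apply to spot training.
        kwargs["max_wait"] = max_wait
        if checkpoint_s3_uri:
            kwargs["checkpoint_s3_uri"] = checkpoint_s3_uri
    return kwargs


def train_and_deploy(bucket: str, role: str, region: str, use_spot: bool = True):
    """Train XGBoost and deploy an endpoint. Requires AWS access; not run here."""
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    # Algorithm and version; "1.7-1" is an assumed version string.
    image_uri = sagemaker.image_uris.retrieve("xgboost", region, version="1.7-1")
    estimator = Estimator(
        **build_estimator_kwargs(
            role=role,
            image_uri=image_uri,
            output_path=f"s3://{bucket}/model",  # hypothetical prefix
            use_spot_instances=use_spot,
            checkpoint_s3_uri=f"s3://{bucket}/checkpoints",  # hypothetical prefix
        )
    )
    # Illustrative hyperparameters only.
    estimator.set_hyperparameters(objective="reg:squarederror", num_round=150)
    estimator.fit(
        {
            "train": TrainingInput(f"s3://{bucket}/train", content_type="text/csv"),
            "validation": TrainingInput(
                f"s3://{bucket}/validation", content_type="text/csv"
            ),
        }
    )
    # Real-time endpoint; this call blocks until the endpoint is in service.
    return estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

Keeping the spot settings in one plain function means the flag can be flipped in a single place, and the resulting arguments can be inspected before any job is launched.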

Testing and Cleanup

  1. Testing: Use a separate notebook for testing and verifying endpoint performance.
  2. Endpoint Verification: Ensure the endpoint name matches in the SageMaker console.
  3. Batch Prediction: Handle large datasets by sending data in smaller batches.
  4. Cleanup: Delete the endpoint to stop accruing charges.
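
The batch prediction step might look like the minimal sketch below. It assumes each row is already a CSV string and that the prediction call returns comma-separated values as text; the actual response format depends on the algorithm container and the deserializer in use. The names chunks and predict_in_batches are hypothetical, and predict_fn stands in for whatever sends a payload to the endpoint.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(rows: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of rows, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def predict_in_batches(
    rows: Sequence[str],
    predict_fn: Callable[[str], str],
    batch_size: int = 500,
) -> List[float]:
    """Send rows to the endpoint in small batches and collect the predictions."""
    predictions: List[float] = []
    for batch in chunks(rows, batch_size):
        payload = "\n".join(batch)  # one CSV row per line
        response = predict_fn(payload)  # assumed: comma-separated text
        predictions.extend(float(value) for value in response.split(","))
    return predictions
```

With the SageMaker SDK, predict_fn could wrap a predictor's predict method, and calling the predictor's delete_endpoint method afterwards covers the cleanup step.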

Cost Saving Tips

  1. SageMaker Trial Period: Use the free trial for labs; turn off spot training while on the trial.
  2. Terminate Endpoints: Always delete endpoints after use.
  3. Batch Transform Jobs: Ideal for large datasets; the job provisions its own instances, runs the predictions, and shuts the instances down, so no endpoint is left running.
  4. Stop Notebook Instances: Stop when not in use to avoid charges.
  5. Billing Alerts: Set up billing and budget alerts to monitor charges.

Summary

This lesson walks through training with the SageMaker SDK: configuration, deployment, testing, and cleanup. It emphasizes spot instances and careful resource management, such as deleting endpoints and stopping notebook instances, to keep training costs down.

Incremental Training (Lesson 65)

Incremental Training in SageMaker