ML Implementations and Operations
Table of Contents
1. SageMaker and Docker Containers
1.1. Using Docker
1.2. Amazon SageMaker Containers
1.3. Structure of a training container
1.4. Structure of a Deployment Container
1.5. Structure of your Docker image
1.6. Assembling it all in a Dockerfile
1.7. Environment variables
1.8. Using your own image
1.9. Production Variants
2. SageMaker On The Edge
2.1. SageMaker Neo
2.2. Neo + AWS IoT Greengrass
2.3. SageMaker Security
2.4. General AWS Security
2.5. Protecting your Data at Rest in SageMaker
2.6. Protecting Data in Transit in SageMaker
2.7. SageMaker + VPC
2.8. SageMaker + IAM
2.9. SageMaker Logging and Monitoring
3. Managing SageMaker Resources
3.1. Choosing your instance types
3.2. Managed Spot Training
3.3. Elastic Inference
3.4. Automatic Scaling
3.5. Serverless Inference
3.6. Amazon SageMaker Inference Recommender
3.7. SageMaker and Availability Zones
4. Inference Pipelines
5. More Resources
1. SageMaker and Docker Containers
All models in SageMaker are hosted in Docker containers
Pre-built deep learning
Pre-built scikit-learn and Spark ML
Pre-built TensorFlow, MXNet, Chainer, PyTorch
* Distributed training via Horovod or Parameter Servers
Your own training and inference code! Or extend a pre-built image.
This allows you to use any script or algorithm within SageMaker, regardless of runtime or language
Containers are isolated, and contain all dependencies and resources needed to run
1.1. Using Docker
Docker containers are created from images
Images are built from a Dockerfile
Images are saved in a repository
* Amazon Elastic Container Registry (ECR)
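1.2. Amazon SageMaker Containers
Library for making containers compatible with SageMaker
RUN pip install sagemaker-containers in your Dockerfile
* sagemaker-containers is now deprecated; its successor for training is the SageMaker Training Toolkit (RUN pip install sagemaker-training)
1.3. Structure of a training container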
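A training container reads its inputs from, and writes its outputs to, fixed paths under /opt/ml: hyperparameters in /opt/ml/input/config/hyperparameters.json, one directory per input channel under /opt/ml/input/data/, the trained model in /opt/ml/model, and an optional failure message in /opt/ml/output/failure. The sketch below walks those paths with the standard library only. The `prefix` argument exists so it can be exercised outside a container, and the artifact name `model.bin` is a hypothetical choice.

```python
import json
from pathlib import Path

# Root of the fixed directory layout SageMaker mounts inside a training container
PREFIX = Path("/opt/ml")


def load_training_inputs(prefix=PREFIX):
    """Read the hyperparameters and list the files in every input channel."""
    config_dir = prefix / "input" / "config"
    # SageMaker writes every hyperparameter value as a string in this file
    hyperparameters = json.loads((config_dir / "hyperparameters.json").read_text())
    data_dir = prefix / "input" / "data"
    # Each channel (e.g. "train", "validation") is a subdirectory of input/data
    channels = {
        channel.name: sorted(p.name for p in channel.iterdir())
        for channel in sorted(data_dir.iterdir())
        if channel.is_dir()
    }
    return hyperparameters, channels


def save_model(model_bytes, prefix=PREFIX):
    """Files left in /opt/ml/model are packaged as model.tar.gz and uploaded to S3."""
    model_dir = prefix / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "model.bin").write_bytes(model_bytes)  # hypothetical file name


def report_failure(message, prefix=PREFIX):
    """Text written to /opt/ml/output/failure is surfaced as the job's FailureReason."""
    output_dir = prefix / "output"
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "failure").write_text(message)
```

Passing the prefix in, rather than hard-coding it, keeps the same code usable in the container and in a local test against a temporary directory.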
1.4. Structure of a Deployment Container
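A deployment container is a web server that SageMaker reaches on port 8080: it must answer GET /ping with HTTP 200 once it is healthy and handle inference requests on POST /invocations. The minimal sketch below implements that contract with the standard library's `http.server` instead of the nginx/gunicorn/Flask stack listed in 1.5. The JSON request shape `{"instances": [...]}` and the `predict` function are hypothetical stand-ins for a real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Hypothetical stand-in model: returns the sum of each input row."""
    return [sum(row) for row in rows]


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: SageMaker polls GET /ping and expects 200 when ready
        if self.path == "/ping":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        # Inference: SageMaker forwards each endpoint request to POST /invocations
        if self.path != "/invocations":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            self._send(200, {"predictions": predict(payload["instances"])})
        except (ValueError, KeyError, TypeError) as exc:
            self._send(400, {"error": str(exc)})

    def _send(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port=8080):
    # SageMaker sends traffic to port 8080 inside the container
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```

The container's serve entry point would call `make_server().serve_forever()`; a production image would normally put a WSGI server such as gunicorn behind nginx rather than use `http.server`.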
1.5. Structure of your Docker image
WORKDIR
* nginx.conf
* predictor.py
* serve/
* train/
* wsgi.py
1.6. Assembling it all in a Dockerfile
1.7. Environment variables
SAGEMAKER_PROGRAM
* Run a script inside /opt/ml/code
SAGEMAKER_TRAINING_MODULE
SAGEMAKER_SERVICE_MODULE
SM_MODEL_DIR
SM_CHANNELS / SM_CHANNEL_*
SM_HPS / SM_HP_*
SM_USER_ARGS
…and many more
1.8. Using your own image
    cd dockerfile
    !docker build -t foo .
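    from sagemaker.estimator import Estimator

    # SageMaker Python SDK v2 names (v1 used image_name, train_instance_count, train_instance_type)
    estimator = Estimator(image_uri='foo',
                          role='SageMakerRole',
                          instance_count=1,
                          instance_type='local')  # 'local' runs the container on this machine
    estimator.fit()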
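Inside the container, a training script built on the SageMaker Training Toolkit usually learns where things are from the SM_* variables in 1.7 rather than from hard-coded paths (SAGEMAKER_PROGRAM, typically set with ENV in the Dockerfile, names the script to run). The sketch below collects the common ones into a dict. It assumes SM_CHANNELS, SM_HPS and SM_USER_ARGS hold JSON, as the toolkit sets them, and that each channel has a matching upper-cased SM_CHANNEL_<NAME> variable. The fallback defaults are illustrative.

```python
import json
import os


def read_sagemaker_env(env=None):
    """Gather the training-toolkit environment variables a script typically needs."""
    env = os.environ if env is None else env
    # SM_CHANNELS is a JSON list of channel names, e.g. ["test", "train"]
    channels = json.loads(env.get("SM_CHANNELS", "[]"))
    return {
        "model_dir": env.get("SM_MODEL_DIR", "/opt/ml/model"),
        # One SM_CHANNEL_<NAME> variable per channel, holding its local directory
        "channels": {name: env[f"SM_CHANNEL_{name.upper()}"] for name in channels},
        # SM_HPS is the full hyperparameter dict encoded as JSON
        "hyperparameters": json.loads(env.get("SM_HPS", "{}")),
        # SM_USER_ARGS is a JSON list of the hyperparameters as CLI arguments
        "user_args": json.loads(env.get("SM_USER_ARGS", "[]")),
    }
```

Taking `env` as an optional argument lets the same function read the real environment in a job and a plain dict in a test.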
1.9. Production Variants
You can test out multiple models on live traffic using Production Variants
Variant Weights tell SageMaker how to distribute traffic among them
* For example, you could roll out a new iteration of your model at a 10% variant weight
* Once you're confident in its performance, ramp it up to 100%
This lets you run A/B tests and validate performance in real-world settings
Offline validation doesn't always reflect how a model performs on live traffic
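Each variant receives a share of requests equal to its weight divided by the sum of all weights, so weights of 9 and 1 send roughly 10% of traffic to the new model. The sketch below builds ProductionVariants entries with the field names used in a CreateEndpointConfig request and simulates the weighted routing locally. The model names, instance type and the `route` simulation are assumptions for illustration, not SageMaker's internal router.

```python
import random


def variant(name, model_name, weight, instance_type="ml.m5.large", count=1):
    """One entry of the ProductionVariants list in a CreateEndpointConfig request."""
    return {
        "VariantName": name,
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": count,
        "InitialVariantWeight": weight,
    }


def traffic_shares(variants):
    """Fraction of requests each variant gets: its weight / sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


def route(variants, rng):
    """Simulate choosing a variant for one request, proportionally to the weights."""
    names = [v["VariantName"] for v in variants]
    weights = [v["InitialVariantWeight"] for v in variants]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical rollout: the current model keeps 90% of traffic, the new one gets 10%
VARIANTS = [variant("current", "model-v1", 9), variant("new", "model-v2", 1)]
```

Ramping the new variant toward 100% later does not require a new endpoint config: the UpdateEndpointWeightsAndCapacities API changes the weights on a live endpoint.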
2. SageMaker On The Edge
2.1. SageMaker Neo
Train once, run anywhere
Edge devices
* ARM, Intel, NVIDIA processors
* Embedded in whatever – your car?
Optimizes code for specific devices
TensorFlow, MXNet, PyTorch, ONNX, XGBoost
Consists of a compiler and a runtime
2.2. Neo + AWS IoT Greengrass
Neo-compiled models can be deployed to an HTTPS endpoint
Hosted on C5, M5, M4, P3, or P2 instances
Must be the same instance type used for compilation
OR! You can deploy to IoT Greengrass
This is how you get the model to an actual edge device
Inference at the edge with local data, using a model trained in the cloud
Uses Lambda inference applications
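A Neo compilation job names the framework, the model's input shape and the target to compile for; an endpoint must then run on the same instance family as that target. The sketch below builds a request dict with the field names of the SageMaker CreateCompilationJob API and adds a hypothetical helper, `endpoint_matches_target`, that checks the "same instance type" rule for cloud targets such as `ml_c5`. The job name, ARNs, S3 URIs and the 900-second limit used in practice are placeholders you would replace.

```python
import json


def neo_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                            framework, input_shapes, target_device):
    """Request body for CreateCompilationJob (field names follow the SageMaker API)."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Input name(s) and shape(s) of the model, passed as a JSON string
            "DataInputConfig": json.dumps(input_shapes),
            "Framework": framework,  # e.g. "TENSORFLOW", "PYTORCH", "MXNET", "ONNX", "XGBOOST"
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,  # e.g. "ml_c5" for a cloud endpoint
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


def endpoint_matches_target(target_device, instance_type):
    """Hypothetical check: an ml_<family> target only runs on ml.<family>.* instances."""
    if not target_device.startswith("ml_"):
        return False  # edge targets (Jetson, Raspberry Pi, ...) are not endpoint instances
    family = target_device.replace("_", ".", 1)  # "ml_c5" -> "ml.c5"
    return instance_type.startswith(family + ".")
```

For a Greengrass deployment the target would instead be an edge device type, and the compiled artifact is pulled onto the device rather than hosted on an endpoint.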
2.3. SageMaker Security
2.4. General AWS Security
Use Identity and Access Management (IAM)
Set up user accounts with only the permissions they need
Use MFA
Use SSL/TLS when connecting to anything
Use CloudTrail to log API and user activity
Use encryption
Be careful with PII
2.5. Protecting your Data at Rest in SageMaker
AWS Key Management Service (KMS)
Accepted by notebooks and all SageMaker jobs
* Training, tuning, batch transform, endpoints
* Notebooks and everything under /opt/ml/ and /tmp can be encrypted with a KMS key
S3
Can use encrypted S3 buckets for training data and hosting models
S3 can also use KMS
2.6. Protecting Data in Transit in SageMaker
All traffic supports TLS / SSL
IAM roles are assigned to SageMaker to give it permissions to access resources
Inter-node training communication may be optionally encrypted
Can increase training time and cost with deep learning
AKA inter-container traffic encryption
Enabled via console or API when setting up a training or tuning job
2.7. SageMaker + VPC
Training jobs run in a Virtual Private Cloud (VPC)
You can use a private VPC for even more security
You'll need to set up S3 VPC endpoints
Custom endpoint policies and S3 bucket policies can keep this secure
Notebooks are Internet-enabled by default
This can be a security hole
If you disable it, your VPC needs an interface endpoint (PrivateLink) or a NAT gateway, and must allow outbound connections, for training and hosting to work
Training and inference containers are also Internet-enabled by default
Network isolation is an option, but then the container cannot reach S3 either (SageMaker itself still downloads the input data and uploads the model artifacts)
2.8. SageMaker + IAM
User permissions for:
* CreateTrainingJob
* CreateModel
* CreateEndpointConfig
* CreateTransformJob
* CreateHyperParameterTuningJob
* CreateNotebookInstance
* UpdateNotebookInstance
Predefined policies:
* AmazonSageMakerReadOnly
* AmazonSageMakerFullAccess
* AdministratorAccess
* DataScientist
2.9. SageMaker Logging and Monitoring
CloudWatch can log, monitor and alarm on:
* Invocations and latency of endpoints
* Health of instance nodes (CPU, memory, etc.)
* Ground Truth (active workers, how much they are doing)
CloudTrail records actions from users, roles, and services within SageMaker
Log files delivered to S3 for auditing
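Most of the protections in 2.5 to 2.8 are switched on through specific request fields rather than separate services. The sketch below gathers the security-related fields of a CreateTrainingJob request (KMS keys for output and storage volume, inter-container traffic encryption, VPC placement, network isolation) and builds an IAM policy limited to named SageMaker actions. It is deliberately partial: required fields such as TrainingJobName, AlgorithmSpecification, RoleArn and StoppingCondition are omitted, and the key ARN, subnet IDs, security group IDs, volume size and the wildcard `"Resource": "*"` are placeholders (a real policy should scope resources to specific ARNs).

```python
def secure_training_job_fields(kms_key_id, output_s3_path, subnets, security_group_ids,
                               instance_type="ml.m5.xlarge", instance_count=2,
                               encrypt_inter_node=True, network_isolation=False):
    """Security-related fields of a CreateTrainingJob request (partial on purpose)."""
    return {
        # 2.5: KMS key used to encrypt the model artifacts written to S3 ...
        "OutputDataConfig": {"S3OutputPath": output_s3_path, "KmsKeyId": kms_key_id},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
            # ... and the ML storage volume attached to each training instance
            "VolumeKmsKeyId": kms_key_id,
        },
        # 2.6: encrypt traffic between training nodes (can slow distributed training)
        "EnableInterContainerTrafficEncryption": encrypt_inter_node,
        # 2.7: run inside your own VPC; S3 access then needs an S3 VPC endpoint
        "VpcConfig": {"SecurityGroupIds": security_group_ids, "Subnets": subnets},
        # 2.7: block every outbound network call from the container itself
        "EnableNetworkIsolation": network_isolation,
    }


def least_privilege_policy(actions, resource="*"):
    """IAM policy document allowing only the listed SageMaker actions (2.8)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [f"sagemaker:{action}" for action in actions],
            "Resource": resource,
        }],
    }
```

For example, `least_privilege_policy(["CreateTrainingJob", "CreateModel"])` yields a policy for a user who may train and register models but not create endpoints.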
3. Managing SageMaker Resources
3.1. Choosing your instance types
We covered this under “modeling”, even though it’s an operations concern
In general, algorithms that rely on deep learning will benefit from GPU instances (P2 or P3) for training
Inference is usually less demanding and you can often get away with compute instances there (C4, C5)
GPU instances can be really pricey
3.2. Managed Spot Training
Can use EC2 Spot instances for training
Save up to 90% compared with on-demand instances
Spot instances can be interrupted!
Use checkpoints to S3 so training can resume
Can increase training time as you need to wait for spot instances to become available
3.3. Elastic Inference
Accelerates deep learning inference
At a fraction of the cost of using a GPU instance for inference
EI accelerators may be added alongside a CPU instance
* ml.eia1.medium / large / xlarge
EI accelerators may also be applied to notebooks
Works with TensorFlow, PyTorch, and MXNet pre-built containers
* ONNX may be used to export models to MXNet
Works with custom containers built with EI-enabled TensorFlow, PyTorch, or MXNet
Works with Image Classification and Object Detection built-in algorithms
3.4. Automatic Scaling
You set up a scaling policy to define target metrics, min/max capacity, cooldown periods
Works with CloudWatch
Dynamically adjusts number of instances for a production variant
Load test your configuration before using it!
3.5. Serverless Inference
Introduced in 2022
Specify your container, memory requirement, concurrency requirements
Underlying capacity is automatically provisioned and scaled
Good for infrequent or unpredictable traffic; will scale down to zero when there are no requests
Charged based on usage
Monitor via CloudWatch
* ModelSetupTime, Invocations, MemoryUtilization
3.6. Amazon SageMaker Inference Recommender
Recommends best instance type & configuration for your models
Automates load testing and model tuning
Deploys to optimal inference endpoint
How it works:
* Register your model to the model registry
* Benchmark different endpoint configurations
* Collect & visualize metrics to decide on instance types
* Existing models from zoos may have benchmarks already
Instance Recommendations
* Runs load tests on recommended instance types
* Takes about 45 minutes
Endpoint Recommendations
* Custom load test
* You specify instances, traffic patterns, latency requirements, throughput requirements
* Takes about 2 hours
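Managed Spot Training, automatic scaling and Serverless Inference (3.2, 3.4, 3.5) are each configured through a handful of request fields. The sketch below builds them as plain dicts: the spot fields of a CreateTrainingJob request, a target-tracking policy in the shape of Application Auto Scaling's PutScalingPolicy (min/max capacity are set separately with RegisterScalableTarget), and a serverless ProductionVariants entry. `approx_desired_instances` is a deliberately simplified, hypothetical model of target tracking (scale proportionally, then clamp), not AWS's actual algorithm; cooldowns, the "AllTraffic" variant name and all names and URIs are example values.

```python
import math


def spot_training_fields(checkpoint_s3_uri, max_runtime_s, max_wait_s):
    """Managed Spot Training fields of a CreateTrainingJob request (3.2)."""
    if max_wait_s < max_runtime_s:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        # Checkpoints saved under LocalPath are synced to S3 so training can resume
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri, "LocalPath": "/opt/ml/checkpoints"},
        # MaxWait covers the training time plus time spent waiting for spot capacity
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s,
                              "MaxWaitTimeInSeconds": max_wait_s},
    }


def target_tracking_policy(endpoint, variant, invocations_per_instance,
                           scale_in_cooldown=300, scale_out_cooldown=60):
    """Arguments for Application Auto Scaling PutScalingPolicy on one variant (3.4)."""
    return {
        "PolicyName": f"{endpoint}-{variant}-invocations",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"},
            "ScaleInCooldown": scale_in_cooldown,    # seconds
            "ScaleOutCooldown": scale_out_cooldown,  # seconds
        },
    }


def approx_desired_instances(current, metric_per_instance, target, min_cap, max_cap):
    """Simplified model: scale so the per-instance metric moves toward the target,
    then clamp to the registered min/max capacity."""
    desired = math.ceil(current * metric_per_instance / target)
    return max(min_cap, min(max_cap, desired))


def serverless_variant(model_name, memory_mb, max_concurrency):
    """ProductionVariants entry for a serverless endpoint (3.5)."""
    if memory_mb not in range(1024, 6145, 1024):
        raise ValueError("MemorySizeInMB must be one of 1024, 2048, ..., 6144")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {"MemorySizeInMB": memory_mb, "MaxConcurrency": max_concurrency},
    }
```

With a target of 1,000 invocations per instance, two instances each seeing 1,500 would scale out to three under this simplified model; load testing, as the notes advise, is what tells you whether that target is right.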
3.7. SageMaker and Availability Zones
SageMaker automatically attempts to distribute instances across availability zones
But you need more than one instance for this to work!
Deploy multiple instances for each production endpoint
Configure VPCs with at least two subnets, each in a different AZ
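Both conditions in 3.7 can be checked before deploying. The sketch below is a hypothetical pre-deployment check: it assumes you already have a mapping from subnet ID to Availability Zone (for example from the VPC console or EC2 DescribeSubnets) and reports whether the instance count and subnets allow SageMaker to spread instances across AZs.

```python
def availability_problems(instance_count, subnets, subnet_az):
    """Hypothetical check for multi-AZ readiness.

    subnet_az maps subnet ID -> Availability Zone name; it must be looked up
    separately, this function does not call any AWS API.
    """
    problems = []
    if instance_count < 2:
        problems.append("use at least 2 instances so they can be spread across AZs")
    zones = {subnet_az[subnet] for subnet in subnets}
    if len(zones) < 2:
        problems.append("VPC config needs subnets in at least 2 different AZs")
    return problems
```

An empty list means both conditions hold; SageMaker still only *attempts* the distribution, so this is a necessary rather than sufficient check.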
4. Inference Pipelines
Linear sequence of 2-15 containers
Any combination of pre-trained built-in algorithms or your own algorithms in Docker containers
Combine pre-processing, predictions, post-processing
Spark ML and scikit-learn containers OK
* Spark ML can be run with Glue or EMR
* Serialized into MLeap format
Can handle both real-time inference and batch transforms
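An inference pipeline is registered as a single model whose Containers list runs in order, each container's output feeding the next. The sketch below builds a CreateModel request with that Containers list and simulates the same chaining locally with plain functions. The image URIs, S3 paths and the three stage functions are hypothetical stand-ins for a preprocessing, model and postprocessing container.

```python
MIN_CONTAINERS, MAX_CONTAINERS = 2, 15


def pipeline_model_request(model_name, role_arn, containers):
    """CreateModel request for an inference pipeline.

    containers: list of (image_uri, model_data_url) in the order requests flow.
    """
    if not MIN_CONTAINERS <= len(containers) <= MAX_CONTAINERS:
        raise ValueError("an inference pipeline needs 2-15 containers")
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [{"Image": image, "ModelDataUrl": data} for image, data in containers],
    }


def run_pipeline(stages, request):
    """Local simulation: each stage's output becomes the next stage's input."""
    if not MIN_CONTAINERS <= len(stages) <= MAX_CONTAINERS:
        raise ValueError("an inference pipeline needs 2-15 containers")
    data = request
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical stand-ins for the three containers
def preprocess(rows):
    return [[float(x) for x in row.split(",")] for row in rows]  # CSV text -> features


def predict(features):
    return [sum(f) for f in features]  # toy "model"


def postprocess(scores):
    return ["high" if s > 10 else "low" for s in scores]  # scores -> labels
```

`run_pipeline([preprocess, predict, postprocess], ["1,2,3", "5,6,7"])` mirrors what the endpoint does with a real three-container pipeline; the same model can also back a batch transform job.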
5. More Resources
SageMaker Developer Guide
Amazon’s Exam Guide
Amazon’s learning path