AWS SageMaker gets nine new capabilities
Wed, 9th Dec 2020

Amazon Web Services (AWS) announced nine new capabilities for the Amazon SageMaker machine learning (ML) service during the ML keynote at the 2020 re:Invent conference, which was held virtually.

“Today, we are announcing a set of tools for Amazon SageMaker that makes it much easier for developers to build end-to-end machine learning pipelines to prepare, build, train, explain, inspect, monitor, debug, and run custom machine learning models with greater visibility, explainability, and automation at scale,” says Swami Sivasubramanian, vice president of machine learning at AWS.

The new capabilities are:

Data Wrangler – for developers to prepare data for machine learning. 

Data Wrangler contains over 300 built-in data transformers that can help customers normalise, transform, and combine features without having to write any code, while managing all of the processing infrastructure under the hood.

Feature Store – a purpose-built data store for storing, updating, retrieving, and sharing features. 

Feature Store gives developers and data scientists a central, purpose-built repository where they can name, organise, find, and share sets of features across teams. Because it resides in Amazon SageMaker Studio, close to where machine learning models are run, it provides single-digit millisecond latency for inference.
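As an illustration, a feature group can be created and queried through the SageMaker Python SDK. The sketch below is a minimal, hypothetical example rather than anything from AWS's announcement; the bucket, role, and feature names are placeholders.

```python
import time

import boto3
import pandas as pd
from sagemaker.session import Session
from sagemaker.feature_store.feature_group import FeatureGroup

session = Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

# Example feature data: one row per customer with an engineered feature.
df = pd.DataFrame({
    "customer_id": ["c-001", "c-002"],
    "avg_order_value": [42.5, 17.0],
    "event_time": [1607500000.0, 1607500000.0],
}).astype({"customer_id": "string"})

feature_group = FeatureGroup(name="customers-demo", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)  # infer feature types from the frame
feature_group.create(
    s3_uri="s3://my-bucket/feature-store",   # offline store location (placeholder)
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,                # low-latency store used at inference time
)

# Creation is asynchronous; wait before ingesting records.
while feature_group.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(5)

feature_group.ingest(data_frame=df, max_workers=2, wait=True)

# Single-record lookup from the online store, the low-latency path for inference.
runtime = boto3.client("sagemaker-featurestore-runtime")
record = runtime.get_record(
    FeatureGroupName="customers-demo",
    RecordIdentifierValueAsString="c-001",
)
```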

Pipelines – a purpose-built continuous integration and continuous delivery (CI/CD) service for machine learning

Pipelines logs each step in SageMaker Experiments every time a workflow is run.

This helps developers visualise and compare machine learning model iterations, training parameters, and outcomes. Workflows can be shared and re-used between teams.
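To illustrate, a minimal pipeline with a single training step might be defined with the SageMaker Python SDK as below; the role, bucket paths, and step names are hypothetical placeholders.

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

# A pipeline parameter, so the same definition can be re-run against new data.
input_data = ParameterString(name="InputData", default_value="s3://my-bucket/train/")

estimator = Estimator(
    image_uri=image_uris.retrieve("xgboost", region="us-east-1", version="1.2-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",
)

train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput(s3_data=input_data, content_type="text/csv")},
)

pipeline = Pipeline(
    name="demo-training-pipeline",
    parameters=[input_data],
    steps=[train_step],
)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # each run's steps are logged to SageMaker Experiments
```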

Clarify – providing greater visibility into training data to limit bias

Clarify integrates with Data Wrangler, running a set of algorithms on features during data preparation to identify bias, with visualisations that describe the sources and severity of possible bias. It also integrates with Experiments to make it easier to check trained models for statistical bias and to detail how each feature fed into the model affects its predictions, and with Model Monitor to alert developers if the importance of model features shifts and causes model behaviour to change.
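As a sketch of the pre-training bias check, the SageMaker Python SDK's clarify module can run the analysis as a processing job; the dataset, label, and facet used below are hypothetical examples, not from AWS's announcement.

```python
from sagemaker import Session, clarify

session = Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Tabular training data in S3 with a binary label and a sensitive "age" facet.
data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train/train.csv",
    s3_output_path="s3://my-bucket/clarify-output/",
    label="approved",
    headers=["approved", "age", "income", "tenure"],
    dataset_type="text/csv",
)
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],   # the favourable outcome
    facet_name="age",
    facet_values_or_threshold=[40],  # compare applicants above and below 40
)

# Computes the pre-training bias metrics and writes a report to the output path.
processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    methods="all",
)
```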

Deep profiling for Debugger – monitors machine learning training performance

Deep profiling expands Debugger's scope to monitor and profile the utilisation of system resources, send alerts on problems during training in Studio or via Amazon CloudWatch, and correlate usage to different phases of the training job or to a specific point in time during training. Debugger can also trigger actions based on alerts, works across frameworks (PyTorch, Apache MXNet, and TensorFlow), and collects the necessary system and training metrics automatically without requiring any code changes in training scripts.
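A minimal sketch of enabling deep profiling on a training job with the SageMaker Python SDK might look like the following; the training script, role, and bucket are hypothetical, and nothing in the script itself is changed.

```python
from sagemaker.debugger import FrameworkProfile, ProfilerConfig, ProfilerRule, rule_configs
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

# Collect system metrics every 500 ms plus framework-level profiling for the job.
profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,
    framework_profile_params=FrameworkProfile(),
)

estimator = PyTorch(
    entry_point="train.py",              # hypothetical, unmodified training script
    role=role,
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    profiler_config=profiler_config,
    # Built-in rule that generates a profiling report and can raise alerts.
    rules=[ProfilerRule.sagemaker(rule_configs.ProfilerReport())],
)
estimator.fit("s3://my-bucket/train/")
```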

Distributed Training – trains large, complex models up to two times faster

Distributed Training with SageMaker's Data Parallelism engine scales training jobs from one GPU to hundreds or thousands by automatically splitting data across multiple GPUs, improving training time by up to 40%. The engine also manages communication between the GPUs to keep them optimally synchronised.
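For illustration, the data parallelism engine is switched on through the estimator's distribution setting in the SageMaker Python SDK; the script name, role, and bucket below are placeholders, and the script is assumed to use the smdistributed.dataparallel API.

```python
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

estimator = PyTorch(
    entry_point="train_ddp.py",   # hypothetical script using smdistributed.dataparallel
    role=role,
    framework_version="1.6.0",
    py_version="py3",
    instance_count=2,             # 2 x 8 GPUs = 16 GPUs in the data-parallel job
    instance_type="ml.p3.16xlarge",
    # Enable SageMaker's data parallelism engine for this training job.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit("s3://my-bucket/train/")
```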

Distributed Training with SageMaker's Model Parallelism engine can split large, complex models with billions of parameters across multiple GPUs by automatically profiling and identifying the best way to partition the model. It does this by using graph partitioning algorithms to optimally balance computation and minimise communication between GPUs, requiring minimal code changes and avoiding errors caused by GPU memory constraints.
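A comparable sketch for the model parallelism engine, again with hypothetical names, asks SageMaker to split the model into two partitions; the script is assumed to be instrumented with the smdistributed.modelparallel API.

```python
from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

estimator = PyTorch(
    entry_point="train_mp.py",    # hypothetical script using smdistributed.modelparallel
    role=role,
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type="ml.p3.16xlarge",
    distribution={
        # Split the model across 2 partitions, pipelining 4 microbatches at a time.
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {"partitions": 2, "microbatches": 4},
            }
        },
        "mpi": {"enabled": True, "processes_per_host": 2},
    },
)
estimator.fit("s3://my-bucket/train/")
```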

Edge Manager – machine learning model monitoring and management for edge devices

Edge Manager optimises models to run faster on target devices and provides model management for edge devices, so customers can prepare, run, monitor, and update deployed machine learning models across fleets of devices at the edge.

Customers can cryptographically sign their models, upload prediction data from their devices to SageMaker for monitoring and analysis, and view a dashboard that tracks and visually reports on the operation of the deployed models within the SageMaker console.
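As a rough sketch, device fleets and packaging jobs can be managed through the AWS SDK for Python (boto3); the fleet, model, and compilation job names below are hypothetical, and the packaging step assumes a model that has already been compiled with SageMaker Neo.

```python
import boto3

sm = boto3.client("sagemaker")
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

# Register a fleet of edge devices; captured prediction data lands in S3.
sm.create_device_fleet(
    DeviceFleetName="demo-cameras",
    RoleArn=role,
    OutputConfig={"S3OutputLocation": "s3://my-bucket/edge-output/"},
)

# Package a previously compiled (SageMaker Neo) model for deployment to the fleet.
sm.create_edge_packaging_job(
    EdgePackagingJobName="demo-packaging-job",
    CompilationJobName="demo-neo-compilation-job",  # hypothetical Neo job name
    ModelName="demo-detector",
    ModelVersion="1.0",
    RoleArn=role,
    OutputConfig={"S3OutputLocation": "s3://my-bucket/edge-models/"},
)
```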

JumpStart – a developer portal for pre-trained models and pre-built workflows

JumpStart provides developers with a searchable interface to find pre-trained models, best-in-class solutions, algorithms, and sample notebooks.