[Python] SageMaker for beginners: a collection of reference links

It is hard to find exactly the material I want on SageMaker, AWS's managed machine learning service, so I have collected the links here.

Features of Amazon SageMaker

SageMaker's managed features show their real value when an ML system runs in a production environment. It is a building block for realizing MLOps and MDLC (model development life cycle). Using it for PoC model development alone does not deliver its real benefits.

Official reference

Machine Learning Lens, AWS Well-Architected Framework https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf Guidelines for ML systems built on AWS

Developer guide, SDK reference, etc. https://docs.aws.amazon.com/ja_jp/sagemaker/index.html

GitHub https://github.com/awslabs/amazon-sagemaker-examples/blob/master/README.md

GitHub (Japanese) https://github.com/aws-samples/amazon-sagemaker-examples-jp

Amazon SageMaker Discussion Forums https://forums.aws.amazon.com/forum.jspa?forumID=285

Documents by AWS service https://aws.amazon.com/jp/aws-jp-introduction/aws-jp-webinar-service-cut/#ai-wn

MDLC: Model Development Life Cycle

"Scalable model deployment and automation on Amazon.com" https://aws.amazon.com/jp/blogs/news/aws-aiml-tokyo2/

The session introduced the concept of the model development life cycle (MDLC) that customers should consider when adopting machine learning, together with the workflow for executing it. MDLC is the method actually used by Amazon Consumer Payments and was also presented in a session at re:Invent 2019.

Overview of SageMaker features

Amazon SageMaker Basics https://pages.awscloud.com/rs/112-TZM-766/images/20191128_Amazon%20SageMaker_Basic.pdf

[AWS Black Belt Online Seminar] Amazon SageMaker Advanced Session materials and QA released https://aws.amazon.com/jp/blogs/news/webinar-bb-amazon-sagemaker-advanced-session-2019/?nc1=f_ls

Get the big picture of machine learning with SageMaker and XGBoost https://qiita.com/suzukihi724/items/3792f395fb22cf7fb311

How data is passed to the container

How to pass data when using AWS SageMaker built-in algorithms https://qiita.com/kazuhisa-nagashima/items/21d80271da733d15f4b0

How to pass environment variables from the host OS to the guest OS (container) with Docker (various methods) https://qiita.com/KEINOS/items/518610bc2fdf5999acf2 I wondered why environment variables are used to pass parameters, but using environment variables appears to be the standard way to convey information to a container.
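As a rough illustration of the note above, here is a minimal sketch of a training script that picks up its paths from environment variables. It assumes the container is one of the SageMaker framework containers, which set `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN`; the `--epochs` hyperparameter and the function name `parse_args` are hypothetical and only for illustration.

```python
import argparse
import os


def parse_args(argv=None, environ=None):
    """Read hyperparameters from the command line and paths from the environment."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()

    # Hypothetical hyperparameter, passed to the script as a command-line argument.
    parser.add_argument("--epochs", type=int, default=10)

    # Paths are conveyed to the container through environment variables.
    # The fallbacks are the conventional /opt/ml locations inside the container.
    parser.add_argument("--model-dir", default=env.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument(
        "--train", default=env.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("epochs:", args.epochs)
    print("training data:", args.train)
    print("model output:", args.model_dir)
```

Taking the environment as a parameter keeps the script easy to try locally: pass a plain dict instead of relying on what the container sets.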

Training / inference with your own container

I made a container image that can use LightGBM with Amazon SageMaker https://dev.classmethod.jp/articles/sagemaker-container-image-lightgbm/

Create your own training / inference container image with Amazon SageMaker https://dev.classmethod.jp/articles/sagemaker-container-image-custom/

GitHub (Japanese) https://github.com/aws-samples/amazon-sagemaker-examples-jp/tree/master/workshop/lab_bring-your-own-containers

[Developer's Guide] Using your own inference code in your hosting service https://docs.aws.amazon.com/ja_jp/sagemaker/latest/dg/your-algorithms-inference-code.html

Inference: implementing it as an API

Application development with Amazon SageMaker inference endpoints https://www.slideshare.net/AmazonWebServicesJapan/amazon-sagemaker-122749918

Build a serverless front end for Amazon SageMaker endpoints https://aws.amazon.com/jp/blogs/news/build-a-serverless-frontend-for-an-amazon-sagemaker-endpoint/

Build, test, and deploy Amazon SageMaker inference models to AWS Lambda https://aws.amazon.com/jp/blogs/news/build-test-and-deploy-your-amazon-sagemaker-inference-models-to-aws-lambda/

Logs

20190206 AWS Black Belt Online Seminar Amazon SageMaker Basic Session https://www.slideshare.net/AmazonWebServicesJapan/20190206-aws-black-belt-online-seminar-amazon-sagemaker-basic-session-130777850 Described in the reference section at the end of the slides.

Security

ML Security on AWS https://pages.awscloud.com/rs/112-TZM-766/images/A2-07.pdf

How to meet a "no traffic over the Internet" requirement on AWS: PrivateLink support https://bcblog.sios.jp/aws-privatelink/

How to use AWS PrivateLink, explained https://blog.mmmcorp.co.jp/blog/2017/11/15/aws_privatelink/

How to use AWS PrivateLink and points to note: choosing between PrivateLink and VPC peering https://devlog.arksystems.co.jp/2018/05/11/4896/

Hands-on / Q&A

https://aws.amazon.com/jp/blogs/news/amazon-sagemaker-handson-20190517/

Design and operation

Machine Learning Model Development Flow on AWS SageMaker (PyTorch) https://qiita.com/noko_qii/items/41130f66afbb8e451f23

Pre-processing: SageMaker Processing

I tried using my own container with Amazon SageMaker Processing https://dev.classmethod.jp/articles/amazon-sagemaker-processing-original-container/ Nearly the same as using SKLearnProcessor, except that you prepare a container with the required libraries installed.

Use built-in container: SKLearnProcessor https://docs.aws.amazon.com/ja_jp/sagemaker/latest/dg/use-scikit-learn-processing-container.html

Use your own container: ScriptProcessor

Amazon SageMaker Processing – Fully Managed Data Processing and Model Evaluation https://aws.amazon.com/jp/blogs/news/amazon-sagemaker-processin-fully-managed-data-processing-and-model-evaluation/

Pipeline construction

CI / CD pipeline for ML with Amazon SageMaker https://pages.awscloud.com/rs/112-TZM-766/images/E-3.pdf

Batch inference

Real-time inference

How to combine your own containers in Amazon SageMaker's inference pipeline https://qiita.com/yaiwase/items/79f99d2c38ed66729a47

AWS Step Functions

AWS Step Functions https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/connect-sagemaker.html

Pattern 1: Step Functions -> SageMaker API (uses the SageMaker API integration built into Step Functions)

Create a Step Functions API using API Gateway https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/tutorial-api-gateway.html Use this if you want to accept requests from API Gateway.

AWS Step Functions Data Science SDK for Python https://docs.amazonaws.cn/en_us/step-functions/latest/dg/concepts-python-sdk.html

Introducing "AWS Step Functions Data Science SDK" to easily create a state machine https://dev.classmethod.jp/articles/yoshim_step_functions_datascience_sdk/

The API of SageMaker that can be handled by Step Functions has increased https://dev.classmethod.jp/articles/stepfunctions-sagemaker-api-update/
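For Pattern 1, the state machine calls SageMaker directly through the service integration, with no Lambda in between. The following is a minimal sketch that builds such a definition as a Python dict. The function name, the state name `Train`, the instance type, and all argument values are assumptions for illustration; the `Parameters` block follows the shape of the SageMaker `CreateTrainingJob` request.

```python
import json


def build_training_state_machine(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    """Return an Amazon States Language definition with a single training task."""
    training_params = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Illustrative resource settings; adjust to the actual workload.
        "ResourceConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
    return {
        "StartAt": "Train",
        "States": {
            "Train": {
                "Type": "Task",
                # The ".sync" suffix makes Step Functions wait for the job to finish.
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": training_params,
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    definition = build_training_state_machine(
        "example-job",
        "<training-image-uri>",
        "<execution-role-arn>",
        "s3://<bucket>/train",
        "s3://<bucket>/output",
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON is what would be supplied as the state machine definition; the placeholder values in angle brackets have to be replaced with real ones.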

Pattern 2: Step Functions -> Lambda -> SageMaker API

Running scripts with your own processing container https://docs.aws.amazon.com/ja_jp/sagemaker/latest/dg/processing-container-run-scripts.html

Manage ML workflows serverless with AWS Step Functions: Amazon SageMaker Advent Calendar 2018 https://dev.classmethod.jp/articles/2018advent-calendar-sagemaker-20181210/

Manage machine learning workflows with AWS StepFunctions https://qiita.com/kurakura0916/items/5e89cb86e86d22fdc5d8

Create a batch process that combines Lambda with AWS Step Functions https://dev.classmethod.jp/articles/aws-step-functions-batch-service/
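For Pattern 2, a Lambda function sits between Step Functions and the SageMaker API. Below is a minimal sketch of such a handler that starts a processing job with your own container. The event keys (`job_name`, `image_uri`, `role_arn`, `entrypoint`), the instance settings, and the optional `sagemaker_client` argument are assumptions made for illustration; the request fields follow the SageMaker `CreateProcessingJob` API.

```python
def build_processing_request(event):
    """Translate the (hypothetical) Step Functions input into a CreateProcessingJob request."""
    return {
        "ProcessingJobName": event["job_name"],
        "RoleArn": event["role_arn"],
        "AppSpecification": {
            "ImageUri": event["image_uri"],
            # Command run inside your own container, e.g. ["python3", "script.py"].
            "ContainerEntrypoint": event["entrypoint"],
        },
        # Illustrative resource settings; adjust to the actual workload.
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 10,
            }
        },
    }


def lambda_handler(event, context, sagemaker_client=None):
    """Start a processing job and hand its ARN back to the state machine."""
    if sagemaker_client is None:
        # Imported lazily so the module can be exercised without AWS access.
        import boto3

        sagemaker_client = boto3.client("sagemaker")

    response = sagemaker_client.create_processing_job(**build_processing_request(event))
    return {"ProcessingJobArn": response["ProcessingJobArn"]}
```

The call returns as soon as the job is created, so the state machine would need a later step (for example a wait-and-poll loop) to check whether the job has finished.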

[GitHub] AWS Step Functions Data Science SDK - Hello World https://github.com/awslabs/amazon-sagemaker-examples/blob/master/step-functions-data-science-sdk/hello_world_workflow/hello_world_workflow.ipynb

Automating Amazon SageMaker workflows with AWS Step Functions https://www.youtube.com/watch?v=0kMdOi69tjQ

Orchestrate Machine Learning Workflows with Amazon SageMaker and AWS Step Functions https://www.youtube.com/watch?v=dNb5jVffzPs

20190522 AWS Black Belt Online Seminar AWS Step Functions https://www.slideshare.net/AmazonWebServicesJapan/20190522-aws-black-belt-online-seminar-aws-step-functions

Building an AWS Serverless ML Pipeline with Step Functions https://tech.olx.com/building-an-aws-serverless-ml-pipeline-with-step-functions-b39feed12bab

I tried looping and branching for the first time using Step Functions! https://dev.classmethod.jp/articles/first-aws-step-functions/ Step Functions + Lambda hands-on

AWS Lambda

Build a CI / CD pipeline in one go using AWS Lambda application creation https://qiita.com/shonansurvivors/items/b223fbb362aed3c1c536

AWS Glue

AWS Glue concepts https://docs.aws.amazon.com/ja_jp/glue/latest/dg/components-key-concepts.html

How to use Glue https://qiita.com/pioho07/items/32f76a16cbf49f9f712f

What I did to handle 5TB / day data with AWS Glue (Overview) https://future-architect.github.io/articles/20180828/ The comparison between EMR and Glue is very helpful.

MLOps

Automate building machine learning models with Lambda and SageMaker https://medium.com/@yuyasugano/lambda%E3%81%A8%E4%B8%8B%E3%81%92%E3%83%A1%E3%83%BC%E3%82%AB%E3%83%BC%E3%81%A7%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%A2%E3%83%87%E3%83%AB%E3%81%AE%E6%A7%8B%E7%AF%89%E3%82%92%E8%87%AA%E5%8B%95%E5%8C%96-73161d316c0e

An event-driven machine learning platform orchestrated with Step Functions and AWS Batch https://www.slideshare.net/yuyamada777/step-functionsaws-batch

Articles that I referred to when studying around Machine Learning Platform / MLOps (as of December 2018) https://qiita.com/noko_qii/items/f31901817dbed86f2b25

Hyperparameter tuning

How hyperparameter tuning works https://docs.aws.amazon.com/ja_jp/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html

Understanding the Optuna (TPE) Algorithm: Part 1 https://qiita.com/nabenabe0928/items/708d221dbccebf31f01c

GitHub (Japanese) https://github.com/aws-samples/amazon-sagemaker-examples-jp/blob/master/hpo_pytorch_mnist/pytorch_mnist.ipynb

Optuna hyperparameter optimization framework https://www.slideshare.net/pfi/pydatatokyo-meetup-21-optuna Define-by-Run: Define the search space when the objective function is executed

Use Optuna with SageMaker https://aws.amazon.com/jp/blogs/machine-learning/implementing-hyperparameter-optimization-with-optuna-on-amazon-sagemaker/

SageMaker HPO: the search space is decided first.
Optuna: parameters are determined inside a for loop, so the search space is determined dynamically.
・ Multiple trials run in one training job.
・ After each trial, the parameters are written to Aurora. The next parameters are decided by looking at the past parameters.

pytorch-simple https://github.com/aws-samples/amazon-sagemaker-optuna-hpo-blog/blob/master/examples/pytorch_simple/src/pytorch_simple.py

・ objective(trial): one trial (experiment).
・ The number of layers is tuned on line 46: n_layers = trial.suggest_int("n_layers", 1, 3)
・ The number of units in each layer is tuned on line 51: out_features = trial.suggest_int("n_units_l{}".format(i), 4, 128)
・ Optuna uses an RDS back end here, so RDS / Aurora is needed.
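To make the define-by-run idea concrete, here is a toy sketch using only the standard library. `ToyTrial` and `random_search` are hypothetical stand-ins written for this note, not Optuna's API or implementation: they sample at random and keep everything in memory, whereas Optuna's TPE sampler consults past trials stored in the database. The two `suggest_int` calls mirror the ones quoted above, and the score is a placeholder, not a real training result.

```python
import random


class ToyTrial:
    """Toy stand-in for a trial: records each parameter at the moment it is requested."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}

    def suggest_int(self, name, low, high):
        value = self._rng.randint(low, high)  # both ends inclusive
        self.params[name] = value
        return value


def objective(trial):
    # The search space is not declared up front: how many "n_units_l{i}"
    # parameters exist depends on the value drawn for "n_layers".
    n_layers = trial.suggest_int("n_layers", 1, 3)
    units = [trial.suggest_int("n_units_l{}".format(i), 4, 128) for i in range(n_layers)]
    # Placeholder score standing in for validation accuracy (higher is better).
    return -abs(sum(units) - 128)


def random_search(objective_fn, n_trials, seed=0):
    """Run several trials in one process and keep the best (score, params) pair."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        trial = ToyTrial(rng)
        score = objective_fn(trial)
        if best is None or score > best[0]:
            best = (score, dict(trial.params))
    return best


if __name__ == "__main__":
    score, params = random_search(objective, n_trials=20)
    print(score, params)
```

The loop in `random_search` corresponds to the "multiple trials run in one training job" point: each pass creates a new trial, and the set of parameters it ends up with can differ from trial to trial.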

AutoML

Can't that machine learning process be automated? https://qiita.com/Hironsan/items/30fe09c85da8a28ebd63

Visualization: Amazon QuickSight

What is Amazon QuickSight? https://docs.aws.amazon.com/ja_jp/quicksight/latest/user/welcome.html

AWS Manga Episode 8: Visualize All Data! (1/8) https://aws.amazon.com/jp/campaigns/manga/vol8-1/ There is hands-on material at the bottom

10 Visualizations to Try with Amazon QuickSight Using Sample Data https://aws.amazon.com/jp/blogs/news/10-visualizations-to-try-in-amazon-quicksight-with-sample-data/

[Official] Using ML Insights https://docs.aws.amazon.com/ja_jp/quicksight/latest/user/making-data-driven-decisions-with-ml-in-quicksight.html Random Cut Forest is used for both anomaly detection and forecasting.

Visualizing Amazon SageMaker machine learning predictions with Amazon QuickSight https://aws.amazon.com/jp/blogs/machine-learning/making-machine-learning-predictions-in-amazon-quicksight-and-amazon-sagemaker/ Integration between SageMaker and QuickSight

Running existing R code

Use a custom container.

Use your own algorithms and models with Amazon SageMaker https://docs.aws.amazon.com/ja_jp/sagemaker/latest/dg/your-algorithms.html

GitHub https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/r_bring_your_own

For Kubernetes / Kubeflow users

Introducing Amazon SageMaker Operators for Kubernetes https://aws.amazon.com/jp/blogs/news/introducing-amazon-sagemaker-operators-for-kubernetes/ Can be used from EKS. It can also be used with plain Kubernetes.

Amazon SageMaker Components for Kubeflow Pipelines https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-sagemaker-components-kubeflow-pipelines/ SageMaker can be incorporated into Kubeflow

Distributed processing

Distributed training and workflow for large-scale data realized by Amazon SageMaker | AWS Summit Tokyo 2019 https://www.youtube.com/watch?v=NUnIiYD-PEU&t=714s
・ AWS CodeCommit
・ EFS: data sharing between instances / users
・ File access / pipe mode
・ Parameters for large data of 100 GB
・ HPO
・ Amazon Elastic Inference
・ Batch transform
・ Data preparation best practices (Glue / Athena)
・ Pack the dataset into a RecordIO / TFRecord file
・ Use batch transform if you don't need online prediction
・ Use an inference pipeline instead of multiple endpoints

Reinforcement learning

"Common problems in actual machine learning operations and solutions / case studies using AWS" | AWS Summit Tokyo 2019 https://www.youtube.com/watch?v=a3smIzBC6BQ

How to get started with machine learning

[Beginner] Introduction to AWS Machine Learning Services | AWS Summit Tokyo 2019 https://www.youtube.com/watch?v=1gC46ODyudE

Container (ECS / EKS / ECR / Fargate)

https://www.youtube.com/watch?v=L4bLDNRSYC8

SageMaker overview

【AWS Black Belt Online Seminar】Amazon SageMaker Advanced Session https://www.youtube.com/watch?v=G-s67PmTCjo&t=2496s

AI service

【AWS Black Belt Online Seminar】AWS AI Language Services https://www.youtube.com/watch?v=Q0Ety9Z7oWM

【AWS Black Belt Online Seminar】AWS AI Services https://www.youtube.com/watch?v=xvUyKjuv-Z4&t=1183s Rekognition inference architecture available

Amazon API Gateway

【AWS Black Belt Online Seminar】Amazon API Gateway https://www.youtube.com/watch?v=EpEETIox03s&list=RDCMUCnjKWUK2t5QJYfeqqilhJhQ&index=8
