"There's no R kernel (or Julia, or the latest TensorFlow, and so on) in Amazon SageMaker Studio!" If that has been your complaint, here is some good news. In this article, I'll show you how to customize your Amazon SageMaker Studio notebook environment. Amazon SageMaker Studio builds its notebook environments from Docker images, so once you register your own Docker image with Amazon SageMaker Studio, you can use that image in your notebooks.
For details on how to set up Amazon SageMaker Studio, refer to this article.
Now let's create a Docker image. This time, I will borrow a sample from the sagemaker-studio-custom-image-samples repository and build an image for using TensorFlow 2.3.
This time, we will use AWS Cloud9 as the environment for building the Docker image. Of course, if you already have Docker and the AWS CLI set up on your PC, you can use that instead. From the AWS console, open the Cloud9 console, click Create environment, and follow the steps to create your environment.
A new browser tab will be created, and after a minute or two you will see the AWS Cloud9 screen.
Run the commands in the area outlined in red in the image below. The area is small at first, but you can resize it by dragging its border with the mouse.
First of all, increase the storage size of the Cloud9 instance, because you may run out of space when building a Docker image. Save the following script in /home/ec2-user/environment with a name such as resize.sh and run it as bash resize.sh 30 (the script uses bash-specific syntax, so run it with bash). The argument specifies the size, in GiB, that the storage should be expanded to. The script was borrowed from here.
resize.sh
#!/bin/bash
# Specify the desired volume size in GiB as a command-line argument. If not specified, default to 20 GiB.
SIZE=${1:-20}
# Get the ID of the environment host Amazon EC2 instance.
INSTANCEID=$(curl http://169.254.169.254/latest/meta-data/instance-id)
# Get the ID of the Amazon EBS volume associated with the instance.
VOLUMEID=$(aws ec2 describe-instances \
  --instance-id $INSTANCEID \
  --query "Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId" \
  --output text)
# Resize the EBS volume.
aws ec2 modify-volume --volume-id $VOLUMEID --size $SIZE
# Wait for the resize to finish.
while [ \
  "$(aws ec2 describe-volumes-modifications \
    --volume-id $VOLUMEID \
    --filters Name=modification-state,Values="optimizing","completed" \
    --query "length(VolumesModifications)" \
    --output text)" != "1" ]; do
  sleep 1
done
# Check if we're on an NVMe filesystem
if [ "$(readlink -f /dev/xvda)" = "/dev/xvda" ]
then
  # Rewrite the partition table so that the partition takes up all the space that it can.
  sudo growpart /dev/xvda 1
  # Expand the size of the file system.
  # Check if we're on AL2
  STR=$(cat /etc/os-release)
  SUB="VERSION_ID=\"2\""
  if [[ "$STR" == *"$SUB"* ]]
  then
    sudo xfs_growfs -d /
  else
    sudo resize2fs /dev/xvda1
  fi
else
  # Rewrite the partition table so that the partition takes up all the space that it can.
  sudo growpart /dev/nvme0n1 1
  # Expand the size of the file system.
  # Check if we're on AL2
  STR=$(cat /etc/os-release)
  SUB="VERSION_ID=\"2\""
  if [[ "$STR" == *"$SUB"* ]]
  then
    sudo xfs_growfs -d /
  else
    sudo resize2fs /dev/nvme0n1p1
  fi
fi
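After the script finishes, it is worth confirming that the root filesystem actually grew. The following is a minimal sketch, assuming a POSIX-compatible df and awk; the function name root_fs_size_gib is only illustrative and is not part of the borrowed script.

```shell
# Print the total size of the root filesystem in whole GiB.
# `df -P -k` prints POSIX-format output in 1024-byte blocks; the second
# line holds the numbers for "/", and column 2 is the total size.
root_fs_size_gib() {
  df -P -k / | awk 'NR == 2 { printf "%d\n", $2 / 1024 / 1024 }'
}

echo "Root filesystem: $(root_fs_size_gib) GiB"
```

The reported value should be close to the number you passed to resize.sh, usually slightly lower because of rounding and filesystem overhead.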
Here we will create an Amazon ECR repository for pushing Docker images. After replacing REGION with the name of the region you want to use, execute the following command in Cloud9.
REGION=<aws-region>
aws --region ${REGION} ecr create-repository \
--repository-name smstudio-custom
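If you rerun these steps, create-repository fails because the repository already exists. One way around that, sketched here under the assumption that your credentials are allowed to call ecr describe-repositories, is to check first and create only when needed. The function name ensure_ecr_repo is hypothetical.

```shell
# Create the ECR repository only if it does not exist yet.
# Usage: ensure_ecr_repo <region> <repository-name>
# Note: any describe-repositories failure (including missing permissions)
# is treated as "not found", in which case create-repository is attempted.
ensure_ecr_repo() {
  local region="$1" repo="$2"
  if aws --region "$region" ecr describe-repositories \
       --repository-names "$repo" >/dev/null 2>&1; then
    echo "exists"
  else
    aws --region "$region" ecr create-repository \
      --repository-name "$repo" >/dev/null && echo "created"
  fi
}
```

For the repository used in this article, you would call it as ensure_ecr_repo "${REGION}" smstudio-custom.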
From here, I will create an image. First, clone the sample code from the GitHub repository. Run the following command in Cloud9.
git clone https://github.com/aws-samples/sagemaker-studio-custom-image-samples.git
This time we will use the sample that builds a TensorFlow 2.3 image, so change the current directory with the following command.
cd sagemaker-studio-custom-image-samples/examples/tf23-image
Next, build the Docker image. Replace ACCOUNT_ID with your own AWS account ID, then run the following commands. This time, the Cloud9 instance type is the default t2.micro, so building a large image takes a long time. This image build took about 1-2 minutes. If you want to build a large image, choose a higher-spec instance type.
# Modify these as required. The Docker registry endpoint can be tuned based on your current region from https://docs.aws.amazon.com/general/latest/gr/ecr.html#ecr-docker-endpoints
ACCOUNT_ID=<account-id>
# Build the image
IMAGE_NAME=custom-tf23
aws --region ${REGION} ecr get-login-password | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom
docker build . -t ${IMAGE_NAME} -t ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}
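Typing the account ID and the registry address by hand is an easy place to make mistakes. The helpers below are a rough sketch of how they could be derived instead; the function names are hypothetical, and the URI format assumes a standard commercial region (see the endpoint list referenced in the comment above for other cases).

```shell
# Look up the account ID of the current credentials instead of typing it.
detect_account_id() {
  aws sts get-caller-identity --query Account --output text
}

# AWS account IDs are exactly 12 digits; reject anything else early.
is_valid_account_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -eq 12 ]
}

# Build the full ECR image URI used by docker build and docker push.
# Usage: ecr_image_uri <account-id> <region> <repository> <tag>
ecr_image_uri() {
  echo "$1.dkr.ecr.$2.amazonaws.com/$3:$4"
}
```

For example, ACCOUNT_ID=$(detect_account_id) followed by is_valid_account_id "${ACCOUNT_ID}" catches an empty or malformed value before docker build runs.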
See here (https://github.com/aws-samples/sagemaker-studio-custom-image-samples/blob/main/DEVELOPMENT.md) for how to test the built image locally. It also describes how to check the UID and GID to be used in later settings. Then run the following command to push the built image to Amazon ECR.
docker push ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}
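Before moving on, you may want to confirm that the tag really arrived in the repository. This is a small sketch using ecr describe-images; the function name image_is_pushed is hypothetical.

```shell
# Succeeds only if the given tag exists in the repository.
# Usage: image_is_pushed <region> <repository> <tag>
image_is_pushed() {
  aws --region "$1" ecr describe-images \
    --repository-name "$2" \
    --image-ids imageTag="$3" >/dev/null 2>&1
}
```

A call such as image_is_pushed "${REGION}" smstudio-custom "${IMAGE_NAME}" && echo "pushed" prints pushed only when the image is present.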
Make the image you pushed to Amazon ECR available to Amazon SageMaker. Set ROLE_ARN to the ARN of an IAM role for SageMaker that has the AmazonSageMakerFullAccess policy attached, then run the following commands. You can also use the IAM role that you created when you created your Amazon SageMaker Studio domain. If you want to create a new IAM role, see here (https://docs.aws.amazon.com/ja_jp/IAM/latest/UserGuide/id_roles_create_for-service.html#roles-creatingrole-service-console). When creating the IAM role, select SageMaker as the use case.
# Role in your account to be used for the SageMaker Image
ROLE_ARN=<role-arn>
aws --region ${REGION} sagemaker create-image \
--image-name ${IMAGE_NAME} \
--role-arn ${ROLE_ARN}
aws --region ${REGION} sagemaker create-image-version \
--image-name ${IMAGE_NAME} \
--base-image "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}"
# Verify the image-version is created successfully. Do NOT proceed if image-version is in CREATE_FAILED state or in any other state apart from CREATED.
aws --region ${REGION} sagemaker describe-image-version --image-name ${IMAGE_NAME}
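Instead of rerunning describe-image-version by hand until the status changes, the check can be wrapped in a polling loop. The sketch below assumes the ImageVersionStatus field of the describe-image-version response; the function name wait_for_image_version and the POLL_SECONDS variable are hypothetical.

```shell
# Poll until the latest version of the image leaves the CREATING state.
# Usage: wait_for_image_version <region> <image-name> [max-attempts]
# Prints the final status; returns 0 only when the status is CREATED.
wait_for_image_version() {
  local region="$1" image="$2" attempts="${3:-30}" status=""
  while [ "$attempts" -gt 0 ]; do
    status=$(aws --region "$region" sagemaker describe-image-version \
      --image-name "$image" \
      --query ImageVersionStatus --output text)
    case "$status" in
      CREATED) echo "$status"; return 0 ;;
      CREATING) sleep "${POLL_SECONDS:-5}" ;;
      *) echo "$status"; return 1 ;;
    esac
    attempts=$((attempts - 1))
  done
  echo "TIMEOUT"
  return 1
}
```

Running wait_for_image_version "${REGION}" "${IMAGE_NAME}" || exit 1 stops a setup script before the later steps if the image version did not reach CREATED.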
Next, create an AppImageConfig. The config file app-image-config-input.json defines the kernel name and mount directory for using the created image in a notebook. For KernelSpecs/Name, select one of the kernels displayed when you start the created image locally and run jupyter-kernelspec list. Please refer to here for the detailed procedure.
aws --region ${REGION} sagemaker create-app-image-config --cli-input-json file://app-image-config-input.json
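To make the shape of this config file concrete, here is an illustrative version written to a separate file so that the repository's own app-image-config-input.json is left untouched. The field names follow the CreateAppImageConfig API, but the specific values (kernel name python3, display name, mount path, and root UID/GID) are assumptions for illustration; use the values you confirmed for your own image.

```shell
# Write an illustrative AppImageConfig definition.
# All values below are example assumptions, not taken from the sample repo.
cat > app-image-config-example.json <<'EOF'
{
  "AppImageConfigName": "custom-tf23",
  "KernelGatewayImageConfig": {
    "KernelSpecs": [
      {
        "Name": "python3",
        "DisplayName": "Python 3 (TensorFlow 2.3)"
      }
    ],
    "FileSystemConfig": {
      "MountPath": "/root/data",
      "DefaultUid": 0,
      "DefaultGid": 0
    }
  }
}
EOF
```

KernelSpecs/Name must match a kernel that jupyter-kernelspec list reports inside the image, and DefaultUid and DefaultGid should match the user the image runs as.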
Attach the image registered with Amazon SageMaker, together with its AppImageConfig, to the Amazon SageMaker Studio domain. Set DomainId in update-domain-input.json to your domain ID (shown as Studio ID in the AWS console), and then execute the following command.
aws --region ${REGION} sagemaker update-domain --cli-input-json file://update-domain-input.json
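For reference, an input file for update-domain might look like the sketch below. The structure follows the UpdateDomain API, while the file name update-domain-example.json and the placeholder domain ID are hypothetical; the image and config names assume you kept custom-tf23 throughout.

```shell
# Hypothetical placeholder; replace with your own domain ID, which can be
# listed with: aws --region ${REGION} sagemaker list-domains
DOMAIN_ID="${DOMAIN_ID:-d-xxxxxxxxxxxx}"

# Write an illustrative update-domain input file that attaches the image
# and its AppImageConfig to the domain's default user settings.
cat > update-domain-example.json <<EOF
{
  "DomainId": "${DOMAIN_ID}",
  "DefaultUserSettings": {
    "KernelGatewayAppSettings": {
      "CustomImages": [
        {
          "ImageName": "custom-tf23",
          "AppImageConfigName": "custom-tf23"
        }
      ]
    }
  }
}
EOF
```

ImageName must match the name passed to create-image, and AppImageConfigName must match the name used in the AppImageConfig.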
You can now use the image you created in Amazon SageMaker Studio. If you click Amazon SageMaker Studio in the menu on the left side of the Amazon SageMaker console to display the SageMaker Studio control panel, you can see that the image has been added under "Custom image attached to domain" at the bottom.
After attaching a new image, shut down the Jupyter server once with File > Shut Down from the Amazon SageMaker Studio menu, and then start Amazon SageMaker Studio again. If you skip this step, the new kernel will not appear in the kernel list.
Launch Amazon SageMaker Studio from the SageMaker Studio control panel. If you want to create a new notebook, go to Notebooks and compute resources at the bottom of the Launcher, select the image you created from the Select a SageMaker Image pull-down menu (yellow arrow in the image), and click the Notebook button (red arrow in the image).
If you want to select an image in an already created notebook, click the kernel name area (yellow arrow in the image) at the top of the notebook and select the desired kernel from the pull-down menu.
In this article, I showed you how to use your own custom image as a notebook kernel in Amazon SageMaker Studio. This feature makes it easy to set up the environment you want to use.