[PYTHON] Cloud Datalab Overview

Cloud Datalab looks promising but hasn't had its moment in the sun. In time for Google Cloud NEXT, it moved from beta to GA on March 8, 2017, and v1.0 was released.

It should be pretty impressive, but it hasn't received much attention yet, so I'll try to convey Datalab's appeal bit by bit (probably over about three articles). First, an overview of Cloud Datalab.

What is Cloud Datalab?

- An interactive environment for data analysis, visualization, and machine learning on GCP.
- It is built on Jupyter, which is popular with data analysts, so existing Jupyter users can switch over smoothly.
- Because it runs on GCP, it is integrated with BigQuery, GCS, and Cloud ML Engine, so you can work with large datasets seamlessly.
- Datalab itself is published on GitHub, packaged as a Docker image intended to run on GCE.

Pricing

- Datalab itself costs nothing.
- However, Datalab is meant to run on GCE, so you pay for the GCE resources you use.
- You also pay for any other GCP components you use, such as BigQuery or GCS.
- By default, a disk is created for persistence and data is also backed up to GCS, so those costs are incurred by default as well.

Getting started

This largely follows Cloud Datalab's Quick Start.

Installation

Assuming the Google Cloud SDK is already installed, add the datalab command as an extra component:

$ gcloud components install datalab

Setup

If you set a default project and zone, you don't have to pass them as options to every command, which makes things easier. As mentioned above, Datalab runs on GCE, so these are GCE-related settings.

$ gcloud config set core/project ${PROJECT_ID}
$ gcloud config set compute/zone ${ZONE}
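
As a minimal sketch, the two settings above can be wrapped in a small function that refuses to run when a value is missing. The names `run` and `configure_gcloud` and the `DRY_RUN` switch are assumptions made for illustration, not part of the Cloud SDK; with `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: set the default project and zone for gcloud.
# Usage: configure_gcloud PROJECT_ID ZONE
configure_gcloud() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: configure_gcloud PROJECT_ID ZONE" >&2
    return 1
  fi
  run gcloud config set core/project "$1"
  run gcloud config set compute/zone "$2"
}

# Placeholder values, not real resources.
configure_gcloud "my-project" "us-central1-a"
```

Running it with `DRY_RUN=0` would execute the same two `gcloud config set` commands for real.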

Launch an instance for Datalab and connect to Datalab

$ datalab create ${INSTANCE_NAME}

This launches an instance for Datalab, creates and configures the network it needs, opens a browser, and connects to Datalab. It's easy.

[Screenshot: datalab.png]

Anyone who has used Jupyter will know what to do on this screen.
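
To come back later to an instance created this way, the `datalab` CLI also has `list`, `connect`, and `stop` subcommands. The sketch below is a hedged illustration: the helper names `run`, `reconnect_datalab`, and `pause_datalab` and the `DRY_RUN` switch are assumptions of this example, and by default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: show existing instances, then reconnect to one of them.
reconnect_datalab() {
  if [ -z "${1:-}" ]; then
    echo "usage: reconnect_datalab INSTANCE_NAME" >&2
    return 1
  fi
  run datalab list
  run datalab connect "$1"
}

# Hypothetical helper: stop the instance without deleting it.
pause_datalab() {
  if [ -z "${1:-}" ]; then
    echo "usage: pause_datalab INSTANCE_NAME" >&2
    return 1
  fi
  run datalab stop "$1"
}

# Placeholder instance name, not a real resource.
reconnect_datalab "my-datalab"
pause_datalab "my-datalab"
```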

Sign in

Sign in with your account from the browser. This is needed because Datalab uses a service account to access other GCP services. It's done from the GUI, so it's easy.

Then analyze as you like

If you are familiar with Jupyter, you can start analyzing right away. Even if you are not, a helpful README is included; following it should be enough to understand how to use notebooks and how Datalab works with BigQuery and GCS. It's easy to find, because it is on the page you see when Datalab starts.

When you are finished

One good thing about the cloud is that you can use it as much as you need and then throw it away. Let's clean up.

$ datalab delete ${INSTANCE_NAME}

Cleaning up is easy too. However, this only deletes the instance. If you want to stop being charged entirely, don't forget to delete the default disk (the notebooks themselves live on it) and the GCS backup.

Impressions

- It's nice to be able to set up a Jupyter environment this easily.
- Integration with other cloud services seems like a better deal than running Jupyter on its own.
- There appear to be advantages specific to the cloud, so it seems worth digging a little deeper.
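
A hedged sketch of a fuller cleanup follows. It assumes the `--delete-disk` flag of `datalab delete` is available in your SDK version, and it only lists remaining disks and buckets rather than guessing their names. The helper names `run` and `cleanup_datalab` and the `DRY_RUN` switch are assumptions of this example; by default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: delete the instance together with its persistent disk,
# then list what is left so leftovers can be reviewed by hand.
cleanup_datalab() {
  if [ -z "${1:-}" ]; then
    echo "usage: cleanup_datalab INSTANCE_NAME" >&2
    return 1
  fi
  run datalab delete --delete-disk "$1"
  run gcloud compute disks list   # any disks still billed?
  run gsutil ls                   # any backup buckets still present?
}

# Placeholder instance name, not a real resource.
cleanup_datalab "my-datalab"
```

Anything the two listing commands still show can then be removed manually with `gcloud compute disks delete` or `gsutil rm -r` once you have confirmed it is no longer needed.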

Next time, we'll take a closer look at Datalab itself.
