[PYTHON] Start data science on the cloud

This article is the Day 1 entry of the Cloud Analytics Advent Calendar.

This calendar covers analysis, machine learning, AI, and related topics under the theme of analytics on the cloud. To kick it off, this first entry sets up the analysis environment. The environment below is free for 30 days, so please try it out as you follow the calendar. It is also worth a look if you are currently building a data scientist team.

Today, I will give an overview of the environment we will use and create a first notebook.

Data Science Experience

DataScienceExperience is a data science platform on the cloud provided by IBM. It offers a complete set of the tools needed for data science, including Jupyter Notebook, along with team development features for promoting data science in the enterprise.

Execution environment

DataScienceExperience supports three languages:

Python 2.x/3.x
Scala 2.1x
R

Code in Python, Scala, and R is executed in a distributed fashion on Spark (notebooks start with a SparkContext already created as sc). Behind the scenes, the notebook is connected to Spark on Bluemix. By default only two executors appear to be available, but the number of executors can be increased.
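To see what the pre-created sc in a notebook is actually connected to, a small helper can be run in a cell. This is a minimal sketch: describe_spark_context is a hypothetical name, and it only reads standard PySpark SparkContext attributes (version, master, appName, defaultParallelism) plus the spark.executor.instances setting, which may simply be unset (None) on a given service.

```python
def describe_spark_context(sc):
    """Summarize the SparkContext a notebook is attached to (read-only)."""
    conf = sc.getConf()  # a copy of the context's SparkConf
    return {
        "spark_version": sc.version,
        "master": sc.master,
        "app_name": sc.appName,
        "default_parallelism": sc.defaultParallelism,
        # None when the service did not set this key explicitly
        "executor_instances": conf.get("spark.executor.instances"),
    }

# In a DataScienceExperience notebook cell, where `sc` already exists:
#   print(describe_spark_context(sc))
#   sc.parallelize(range(1, 101)).map(lambda x: x * x).sum()  # 338350, mapped on the executors
```

The function takes the context as an argument instead of reaching for a global, so the same code works whether sc was created by the platform or by hand.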

Interface

Jupyter Notebook and RStudio are currently available.

The Jupyter Notebook interface is shown below.

Screenshot 2016-12-05 10.08.22.png

RStudio is shown below.

Screenshot 2016-12-05 10.10.00.png

These are the same Jupyter Notebook and RStudio interfaces you already use.

DataSource

How you get your data matters when you start data science. DataScienceExperience comes with 5 GB of Object Storage for free. It can also connect to the various Bluemix storage services through the GUI, and it works especially well with Cloudant (CouchDB) and dashDB. The connection creation screen is shown below.

Screenshot 2016-12-05 10.20.22.png

Other sources such as S3 and Impala can also be used as data sources, although you need to supply their connection information.

Screenshot 2016-12-05 10.20.38.png
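Object Storage on Bluemix is OpenStack Swift based (an assumption worth verifying against your own service), so code that reads a file from it typically authenticates with Keystone v3 and then fetches the object from the Swift endpoint listed in the token's service catalog. The sketch below illustrates that flow; it assumes a Python 3 kernel, all function names are hypothetical, and the credential keys (auth_url, userId, password, projectId, region) are assumed names modeled on a typical service-credentials JSON. It is not the platform's own helper.

```python
import json
import urllib.request


def keystone_v3_password_auth(user_id, password, project_id):
    """Body for POST {auth_url}/v3/auth/tokens using Keystone v3 password auth."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"id": user_id, "password": password}},
            },
            # Scope the token to the project so the catalog lists its Swift endpoint.
            "scope": {"project": {"id": project_id}},
        }
    }


def object_store_url(token_body, region, interface="public"):
    """Pick the Swift (object-store) endpoint for a region from the token's catalog."""
    for service in token_body["token"]["catalog"]:
        if service["type"] == "object-store":
            for endpoint in service["endpoints"]:
                if endpoint["region"] == region and endpoint["interface"] == interface:
                    return endpoint["url"]
    raise LookupError("no %s object-store endpoint in region %r" % (interface, region))


def fetch_object(creds, container, filename):
    """Download one object as bytes. Needs network access and real credentials.

    The keys read from `creds` (auth_url, userId, password, projectId, region)
    are assumed names; adjust them to match your service credentials.
    """
    body = keystone_v3_password_auth(creds["userId"], creds["password"], creds["projectId"])
    auth_req = urllib.request.Request(
        creds["auth_url"] + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(auth_req) as resp:
        token = resp.headers["X-Subject-Token"]  # Keystone returns the token in this header
        base_url = object_store_url(json.loads(resp.read().decode("utf-8")), creds["region"])
    obj_req = urllib.request.Request(
        "%s/%s/%s" % (base_url, container, filename),
        headers={"X-Auth-Token": token},
    )
    with urllib.request.urlopen(obj_req) as resp:
        return resp.read()
```

Wrapping the returned bytes, for example io.StringIO(fetch_object(creds, "mycontainer", "data.csv").decode("utf-8")), gives a file-like object that the csv module can read; creds, the container name, and the file name here are placeholders. Keeping the request-building and catalog-parsing steps as pure functions makes them easy to check without network access.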

Team development

In DataScienceExperience, you create a project and then create notebooks inside it. By adding other users to the project, you can easily share notebooks and data sources with them.

The collaborator editing screen is shown below.

Screenshot 2016-12-05 10.23.49.png

You can assign roles such as Admin, Editor, and Viewer.

Notebooks and data sources can also be shared for collaborative editing.

Screenshot 2016-12-05 10.26.12.png

Creating a Project

First, create a project.

The image below shows some projects that have already been created; here we will create a new one. Click the create project button at the upper right to go to the project creation screen.

Screenshot 2016-12-05 10.32.41.png

The image below shows the project creation screen.

Screenshot 2016-12-05 10.37.49.png

About the Spark Service and Object Storage fields: here you select the Spark Service and Object Storage that the project can connect to. You only need to create a Spark Service the first time. For Object Storage, you can choose either the instance created together with the Spark Service or an Object Storage instance you have already created on Bluemix.

You have now created a brand new project!

Screenshot 2016-12-05 10.42.06.png

Creating a notebook and executing simple code

Next, we will create a notebook and run some code. Click the add notebooks button on the project screen created earlier to go to the notebook creation screen.

Screenshot 2016-12-05 10.51.04.png

You can choose Spark 2.0 or Spark 1.6. Here, Python 2 and Spark 1.6 are selected.
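Because the Spark version determines the entry point for DataFrames and SQL (SparkSession only exists from Spark 2.0; on 1.6 you use SQLContext), it can help to confirm the kernel's versions before going further. This is a minimal sketch with hypothetical function names; it avoids Python 3-only syntax so it should run on the Python 2 kernel selected here as well as on Python 3.

```python
import sys


def parse_spark_version(version):
    """Turn a version string such as '1.6.0' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def sql_entry_point(version):
    """Name the DataFrame/SQL entry point suited to a Spark version."""
    # SparkSession was introduced in Spark 2.0; on 1.6 use SQLContext(sc) instead.
    return "SparkSession" if parse_spark_version(version) >= (2, 0) else "SQLContext"


def kernel_summary(sc):
    """Report the Python and Spark versions of the running notebook kernel."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "spark": sc.version,
        "sql_entry_point": sql_entry_point(sc.version),
    }

# In a notebook cell (`sc` is pre-created):
#   print(kernel_summary(sc))
#   Spark 1.6: from pyspark.sql import SQLContext; sqlContext = SQLContext(sc)
#   Spark 2.0: from pyspark.sql import SparkSession; spark = SparkSession.builder.getOrCreate()
```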

About the notebook name: there currently seems to be a bug where the preview does not work correctly if the Name field is entered in Japanese. I have reported the issue, so it should be fixed, but for now enter the name in English.

You now have a brand new notebook!

Screenshot 2016-12-05 10.54.54.png

Let's try running the Python code!

hallo = "Hallo Data Scientist!"
print(hallo)

Paste the code above into a cell of the new notebook and click the run button. The code runs and its result is displayed.

Screenshot 2016-12-05 10.57.39.png

You can also run a cell by pressing Shift + Enter.

You are now ready for data science! In the following entries, we will look at analysis using notebooks, Object Storage, and other data sources.