[PYTHON] Start data science on the cloud

This article is the Day 1 entry of the Cloud Analytics Advent Calendar.

This calendar covers analysis, machine learning, AI, and related topics under the theme of Analytics on the Cloud. To kick things off, we first prepare the analysis environment. The environment introduced below is free for 30 days, so please try it out as you follow the calendar. It is also worth a look if you are currently setting up a Data Scientist team.

Today, I will give an overview of the environment to be used and create the first notebook.

Data Science Experience

DataScienceExperience is a data science platform on the Cloud provided by IBM. It offers a complete set of the tools required for Data Science, including Jupyter Notebook, together with team development functions for promoting data science in the enterprise.

Execution environment

DataScienceExperience currently provides two interfaces: Jupyter Notebook and RStudio.

Below is the Jupyter Notebook.

[Screenshot: Jupyter Notebook]

Below is RStudio.

[Screenshot: RStudio]

Both look and behave the same as the Jupyter Notebook and RStudio that you usually use.

Data Source

How you get your data is important when you start Data Science. DataScienceExperience comes with 5GB of Object Storage for free. In addition, it can connect to the storage services on Bluemix through a GUI, and it works particularly well with Cloudant (CouchDB) and dashDB. Below is the connection creation screen.

[Screenshot: connection creation screen]

Other sources such as S3 and Impala require you to enter connection information, but they can also be used as Data Sources.

[Screenshot: connection settings for other Data Sources]

Team development

In DataScienceExperience, you create a project and then create notebooks inside it. By adding other users to your project, you can easily share your Notebooks and Data Sources with them.

The following is the Collaborator edit screen.

[Screenshot: Collaborator edit screen]

You can set Admin, Viewer, Editor, etc.

Notebooks and Data Sources can also be shared for collaborative editing.

[Screenshot: sharing Notebooks and Data Sources]

Creating a Project

First, create a project.

In the image below, some projects have already been created, but here we will create a new one. Click the create project button in the upper right to go to the project creation screen.

[Screenshot: project list]

The image below is the project creation screen.

[Screenshot: project creation screen]

About the Spark Service and Object Storage fields: here you select the Spark Service and Object Storage that the Project can connect to. You need to create a Spark Service only the first time. For Object Storage, you can select either the one that comes with the Spark Service when you create it, or an Object Storage instance on Bluemix.

You have now created a brand new project!

[Screenshot: the newly created project]

Creating a notebook and executing simple code

Next, we will create a notebook and execute some code. Click the add notebooks button on the project screen you created earlier to go to the Notebook creation screen.

[Screenshot: Notebook creation screen]

You can select Spark 2.0 or 1.6 as the Spark version. Here, Python 2 and Spark 1.6 are selected.
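Once the notebook is open, it can be handy to confirm from a cell which versions you actually got. The following is only a minimal sketch: `describe_runtime` is a hypothetical helper name, and it assumes (without confirmation from this article) that a Spark notebook exposes a SparkContext in a variable called `sc`. If no such variable exists, only the Python version is reported.

```python
import sys


def describe_runtime(spark_context=None):
    """Return a short description of the kernel's Python (and Spark) version."""
    parts = ["Python {}.{}".format(sys.version_info[0], sys.version_info[1])]
    if spark_context is not None:
        # SparkContext objects expose the Spark version as the `version` attribute.
        parts.append("Spark {}".format(spark_context.version))
    return ", ".join(parts)


# Assumption: the notebook may define `sc`; fall back to None when it does not.
print(describe_runtime(globals().get("sc")))
```

The same cell works under both Python 2 and Python 3, so it can be reused whichever kernel you pick.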

About the name of the notebook: there currently seems to be a bug where the preview does not work properly when the Name field is entered in Japanese. I have reported the issue, so I expect it will be fixed, but for now let's enter the name in English.

You now have a brand new Notebook!

[Screenshot: the newly created Notebook]

Let's try running the Python code!

hallo = "Hallo Data Scientist!"
print(hallo)

Paste the above code into a cell of the Notebook you created and press the execute button. The code is executed and the result is displayed below the cell.

[Screenshot: execution result]

You can execute cells by pressing Shift + Enter.
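If you want one more thing to try, note that variables defined in one cell stay available in later cells, because they live in the notebook's kernel. A small sketch of that idea follows; the names `greeting` and `shout` are made up for illustration, and the two "cells" are shown together in one block.

```python
# Cell 1: define a variable. It stays in the kernel after the cell finishes.
greeting = "Hello Data Scientist!"


# Cell 2: define a function and reuse the variable from the earlier cell.
def shout(text):
    """Return the text in upper case."""
    return text.upper()


print(shout(greeting))  # prints HELLO DATA SCIENTIST!
```

Running the cells one at a time with Shift + Enter gives the same result as running them together.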

Now you are ready for Data Science! In the following articles, we will look at analysis processing using Notebooks, Object Storage, and other Data Sources.
