[PYTHON] Machine learning with Jupyter Notebook in OCI Always Free environment (2019/12/17)

This article is from December 17th Article of "Oracle Cloud Infrastructure Advent Calendar 2019" It is written as 2019-12-17).

What you are doing in this article

--Create VCN / Compute instance / Autonomous Database (ADW) in Always Free environment of Oracle Cloud Infrastructure --Set up a Jupyter Notebook environment on your Compute instance --Machine learning using data from Autonomous Database (decision tree / random forest)

Environment construction procedure

  1. Create a virtual cloud network

  2. Create a Compute instance

  3. Create Autonmous Database (ADW this time)![Image01.jpg](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/5a0120bc-05c4-6f3a- e99a-3e873a044f1e.jpeg)

  4. Set up a Jupyter Notebook environment on your Compute instance

  5. Connect to Autonomous Database from Jupyter Notebook and perform machine learning

1. Create a virtual cloud network

Create a Virtual Cloud Network (VCN) in the OCI Web Console Virtual networking quickstart has been added since December 5, 2019 You can easily create a VCN with four input items: "VCN name", "VCN CIDR", "public subnet CIDR", and "private subnet CIDR". image01.jpg (When creating a VCN using Quickstart, if you create it in Japanese display, the subnet name will be something like "Public subnet- " It is recommended to change the display language to English)

2. Create a Compute instance

Create a Compute instance with the image of Oracle Linux 7 --Click "View Shape, Network and Storage Options" and in the networking configuration, 1. Select the virtual cloud network and public subnet created in image02.JPG --In addition, check "Assign public IP address" and create image03.JPG

3. Create an Autonmous Database (ADW this time)

Create a workload type from Create Autonomous Database by selecting Data Warehouse image04.JPG

Create a working user using SQL Developer Web etc.

CREATE USER scott IDENTIFIED BY password; 

GRANT DWROLE TO scott;
GRANT UNLIMITED TABLESPACE TO scott;

GRANT
    SELECT ANY TABLE,
    UPDATE ANY TABLE,
    DELETE ANY TABLE,
    INSERT ANY TABLE
TO scott;

4. Set up a Jupyter Notebook environment on your Compute instance

Connect to the Compute instance (Oracle Linux) created in 2. with ssh (opc user) and install Jupyter Notebook

4-1. Install Python 3 4-1-1. EPEL repository-Installation of Python3.6

sudo yum install -y oracle-epel-release-el7 oracle-release-el7
sudo yum install -y python36
sudo yum install -y libSM.x86_64
sudo yum install -y libXext.x86_64
sudo yum install -y libXrender.x86_64

4-1-2. mlevn (Python virtual environment setup and activation)

python3.6 -m venv mlenv 
source mlenv/bin/activate

​ 4-1-3. Install the library used for machine learning (library for executing this procedure)

pip3 install --upgrade pip
pip3 install pandas
pip3 install seaborn
pip3 install sklearn

4-1-4. Install Jupyter

python3 -m pip install jupyter

4-1-5. Oracle Instant Client Installation and Configuration

sudo yum -y install oraclelinux-developer-release-el7
sudo yum -y install python-cx_Oracle
pip3 install cx_Oracle
sudo sh -c "echo /usr/lib/oracle/18.3/client64/lib > /etc/ld.so.conf.d/oracle-instantclient.conf"
sudo ldconfig

4-1-6. Download Autonomous Database Client Credentials (Wallet) From the details screen of the Autonomous Database created in 3., click "DB Connection", and from the "Database Connection" dialog, click "Download Wallet" to download the Zip file. image05.JPG Transfer the downloaded Zip file to a Compute instance (such as scp) and unzip it (note the unzipped directory Example: / home / opc / wallet) Edit the sqlnet.ora file in the unzipped directory Change "~ / network / admin" specified in DIRECTORY on the "WALLET_LOCATION" line to the unzipped directory

sqlnet.ora


WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/home/opc/wallet")))
SSL_SERVER_DN_MATCH=yes

4-1-7. Run the notebook Set environment variables (LD_LIBRARY_PATH is the path of the Instance Client, TNS_ADMIN is the directory where the wallet was unzipped)

export LD_LIBRARY_PATH=/usr/lib/oracle/18.3/client64/lib:$LD_LIBRARY_PATH
export TNS_ADMIN=/home/opc/wallet
source mlenv/bin/activate
jupyter notebook --ip=0.0.0.0 &

Make a note of the token key that appears when you run the jupyter notebook

    To access the notebook, open this file in a browser:
        file:///home/opc/.local/share/jupyter/runtime/nbserver-2210-open.html
    Or copy and paste one of these URLs:
        http://compute01:8888/?token=0e701120a4e7319ae8b970ac069fbfee53e7b07f072cc2dc
     or http://127.0.0.1:8888/?token=0e701120a4e7319ae8b970ac069fbfee53e7b07f072cc2dc     ```

5.Connect to Autonomous Database from Jupyter Notebook and perform machine learning

5-1.Establish tunneling in another ssh session from client PC

$ ssh –i <private_key> opc@<public_IP_address> -L 8000:localhost:8888​


5-2.Open a web browser on your local machine and access the URL below
http://localhost:8000 
5-3.When prompted for a token, enter the token key that appears when you run jupyter Notebook
![image06.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/bbf7a0a4-64f7-a9a5-0a5f-1ed216b319c3.jpeg)

![image07.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/b1949454-7e7c-4cd5-4d61-fc0e5c6bfb37.jpeg)

5-4.Connect to Autonomous Database from Jupyter Notebook
Autonomous Database username, password and tnsnames in the unzipped wallet.Check the connection service name from ora and specify

Example of connecting to Autonomous Database and checking DB version

import cx_Oracle con = cx_Oracle.connect('scott/@orcl_medium') print(con.version) con.close()


Example of connecting to Autonomous Database and checking the number of SALES tables in sh schema

import cx_Oracle con = cx_Oracle.connect('scott/@orcl_medium') cur = con.cursor() # opens cursor for statements to use cur.execute('select count(*) from sh.sales') for result in cur: # prints all data print(result) con.close()


CUST by gender from the CUSTOMERS table_Compare VALID

import pandas.io.sql as psql import cx_Oracle import seaborn as sns %matplotlib inline

con = cx_Oracle.connect('scott/@orcl_medium')

sql = 'select CUST_GENDER,CUST_VALID from sh.customers' df = psql.read_sql(sql,con) sns.countplot('CUST_GENDER',hue='CUST_VALID',data=df)

Column names are written in uppercase

con.close()


![image08.JPG](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/108635/b0fd0ce8-514a-d38e-c19e-ca4228e4cd5a.jpeg)



####Machine learning
scikit-Library tree and scikit in learn-Using the random forest class fier in learn ensemble
CUST in the CUSTOMERS table_Create a machine learning model that predicts VALID

-Get features and objective variables with SELECT statement and store in DataFrame

sql = 'select cust_id,cust_year_of_birth,CUST_GENDER,cust_postal_code,cust_valid from sh.customers' df = psql.read_sql(sql,con)


-Missing value handling and categorical variables(letter)Conversion process

Missing value processing

df['CUST_YEAR_OF_BIRTH'] = df['CUST_YEAR_OF_BIRTH'].fillna(df['CUST_YEAR_OF_BIRTH'].median()) df['CUST_POSTAL_CODE'] = df['CUST_POSTAL_CODE'].fillna(df['CUST_POSTAL_CODE'].median())

Conversion of categorical variables

df['CUST_VALID']=df['CUST_VALID'].apply(lambda x:1 if x == 'I' else 0) df['CUST_GENDER']=df['CUST_GENDER'].apply(lambda x:1 if x == 'M' else 0)


-Divided into training data and test data(This time, 90% is used as learning data)

X = df.loc[:, ["CUST_YEAR_OF_BIRTH","CUST_GENDER","CUST_POSTAL_CODE"]] Y = df.loc[:, "CUST_VALID"] (train_X, test_X ,train_y, test_y) = train_test_split(X, Y, test_size = 0.1, random_state = 666)


-Decision tree model creation, verification with test data, confirmation of accuracy

clf = DecisionTreeClassifier(random_state=0) clf = clf.fit (train_X, train_y) #Create a model! pred = clf.predict(test_X) print("Prediction Score: {}".format(clf.score(test_X, test_y)))


-Random forest model creation and validation with test data

from sklearn.ensemble import RandomForestClassifier clf = RandomForestClassifier(random_state=0) clf = clf.fit(train_X, train_y) pred = clf.predict(test_X)


-Whole machine learning code


#### **`ml4py.py`**
```py

import pandas.io.sql as psql
import cx_Oracle
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (roc_curve, auc, accuracy_score)
from sklearn.metrics import classification_report
con = cx_Oracle.connect('scott/<password>@orcl_medium')

sql = 'select cust_id,cust_year_of_birth,CUST_GENDER,cust_postal_code,cust_valid from sh.customers'
df = psql.read_sql(sql,con)
# Missing value processing
df['CUST_YEAR_OF_BIRTH'] = df['CUST_YEAR_OF_BIRTH'].fillna(df['CUST_YEAR_OF_BIRTH'].median())
df['CUST_POSTAL_CODE'] = df['CUST_POSTAL_CODE'].fillna(df['CUST_POSTAL_CODE'].median())

# Conversion of categorical variables
df['CUST_VALID']=df['CUST_VALID'].apply(lambda x:1 if x == 'I' else 0)
df['CUST_GENDER']=df['CUST_GENDER'].apply(lambda x:1 if x == 'M' else 0)

X = df.loc[:, ["CUST_YEAR_OF_BIRTH","CUST_GENDER","CUST_POSTAL_CODE"]]
Y = df.loc[:, "CUST_VALID"]
(train_X, test_X ,train_y, test_y) = train_test_split(X, Y, test_size = 0.1, random_state = 666)

clf = DecisionTreeClassifier(random_state=0)
 clf = clf.fit (train_X, train_y) #Create a model!
pred = clf.predict(test_X)
fpr, tpr, thresholds = roc_curve(test_y, pred, pos_label=1)
auc(fpr, tpr)
accuracy_score(pred, test_y)
classification_report(pred, test_y, labels=None)
print("Prediction Score: {}".format(clf.score(test_X, test_y)))
 print ("Decision tree model accuracy: {: .6f}". format (accuracy_score (pred, test_y)))

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=0)
clf = clf.fit(train_X, train_y)
pred = clf.predict(test_X)
fpr, tpr, thresholds = roc_curve(test_y, pred, pos_label=1)
auc(fpr, tpr)
 print ("Random forest model accuracy: {: .6f}". format (accuracy_score (pred, test_y)))
con.close() 

##Summary and for the future Always Free environment(Compute instance and Autonomous DB)Machine learning with Jupyter Notebook using(Decision tree / random forest)Was able to be carried out.

Dated December 5, 2019The ability to analyze machine learning, geographic data, and graph data in Oracle Database is no longer a paid option and is now available in Standard Edition. Oracle Machine Learning API for Python iscoming soonBecause it means We would like to be able to compare when Oralce Machine Learning for Python is released.

###Reference information

Recommended Posts

Machine learning with Jupyter Notebook in OCI Always Free environment (2019/12/17)
Learn machine learning anytime, anywhere in an on-demand Jupyter Notebook environment
Reflect the virtual environment created with Miniconda in Jupyter notebook
How to quickly create a machine learning environment using Jupyter Notebook with UbuntuServer 16.04 LTS
How to build Anaconda virtual environment used in Azure Machine Learning and link with Jupyter
How to quickly create a machine learning environment using Jupyter Notebook with UbuntuServer 16.04 LTS with anaconda
Create an arbitrary machine learning environment with GCP + Docker + Jupyter Lab
Build AI / machine learning environment with Python
How to quickly create a machine learning environment using Jupyter Notebook on macOS Sierra with anaconda
I wanted to use jupyter notebook with docker in pip environment (opticspy)
Analytical environment construction with Docker (jupyter notebook + PostgreSQL)
Browser specification of Jupyter Notebook in Windows environment
Try using conda virtual environment with Jupyter Notebook
Error running Jupyter Notebook in Anaconda virtual environment
Try using Jupyter Notebook of Azure Machine Learning
Build a Python machine learning environment with a container
Virtual environment construction with Docker + Flask (Python) + Jupyter notebook
Build an interactive environment for machine learning in Python
Drawing a tree structure with D3.js in Jupyter Notebook
Build a machine learning application development environment with Python
Create a machine learning environment from scratch with Winsows 10
[Machine learning] Start Spark with iPython Notebook and try MLlib
Rebuilding an environment for machine learning with Miniconda (Windows version)
Machine learning learned with Pokemon
Using Graphviz with Jupyter Notebook
Machine learning in Delemas (practice)
Display HTML in Jupyter notebook
Build a machine learning scikit-learn environment with VirtualBox and Ubuntu
Use pip with Jupyter Notebook
Multiprocessing error in Jupyter Notebook
Machine learning with Python! Preparation
Install Jupiter Notebook on Windows with pip in proxy environment
Use Cython with Jupyter Notebook
Play with Jupyter Notebook (IPython Notebook)
Machine learning Minesweeper with PyTorch
I want to use a virtual environment with jupyter notebook!
Machine learning environment construction macbook 2021
Build a machine learning environment
Used in machine learning EDA
Write charts in real time with Matplotlib on Jupyter notebook
Machine learning environment settings based on Python 3 on Mac (coexistence with Python 2)
Beginning with Python machine learning
Switch virtual environment with jupyter
Try machine learning with Kaggle
python3.8 venv environment jupyter notebook
Git management of Jupyter notebook (ipynb) differences in easy-to-read with JupyterLab
How about Anaconda for building a machine learning environment in Python?
Building a Windows 7 environment for getting started with machine learning with Python
Perform morphological analysis in the machine learning environment launched by GCE
How to use jupyter notebook without polluting your environment with Docker
Build a comfortable psychological experiment / analysis environment with PsychoPy + Jupyter Notebook
Jupyter Notebook 6.0.2 cannot be installed in the Python 2.7 environment created in Anaconda
Build jupyter notebook environment with Visual Studio Code (VS Code) Mac version
Visualize decision trees with jupyter notebook
Make a sound with Jupyter notebook
I tried machine learning with liblinear
Machine learning with python (1) Overall classification
Automate routine tasks in machine learning
Easy Jupyter environment construction with Cloud9
Generate Jupyter notebook ".ipynb" in Python
Classification and regression in machine learning