Cloud Pak for Data object operation example in Python (WML client, project_lib)

Introduction

When you build a model in a Notebook (Jupyter Notebook) within an analytics project on Cloud Pak for Data (hereafter CP4D), two libraries are available for importing data, saving models, and deploying the models you create: watson-machine-learning-client-V4 (hereafter "WML client") [^1] and project_lib [^2]. Both are included by default in the standard Python environment of CP4D Notebooks. This article shows in detail how to use these libraries.

[^1]: For details, see the WML client reference guide watson-machine-learning-client (V4) and the CP4D v2.5 product documentation, [Deploy using the Python client](https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_2.5.0/wsj/wmls/wmls-deploy-python.html). Note that the WML client reference guide may be updated from time to time.

[^2]: For more information, see the CP4D v2.5 product manual, [Using project-lib for Python](https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_2.5.0/wsj/analyze-data/project-lib-python.html).

Because the WML client authenticates by specifying a URL, it also works in Python environments outside CP4D. This means you can use it to manipulate models and deployments in CP4D from external batch programs as well.

(Verified versions)

You can check the versions with the pip command in a Notebook.

python


!pip show watson-machine-learning-client-V4

output


Name: watson-machine-learning-client-V4
Version: 1.0.64
Summary: Watson Machine Learning API Client
Home-page: http://wml-api-pyclient-v4.mybluemix.net
Author: IBM
Author-email: [email protected], [email protected], [email protected]
License: BSD
Location: /opt/conda/envs/Python-3.6-WMLCE/lib/python3.6/site-packages
Requires: urllib3, pandas, tabulate, requests, lomond, tqdm, ibm-cos-sdk, certifi
Required-by: 

python


!pip show project_lib

output


Name: project-lib
Version: 1.7.1
Summary: programmatic interface for accessing project assets in IBM Watson Studio
Home-page: https://github.ibm.com/ax/project-lib-python
Author: IBM Watson Studio - Notebooks Team
Author-email: None
License: UNKNOWN
Location: /opt/conda/envs/Python-3.6-WMLCE/lib/python3.6/site-packages
Requires: requests
Required-by: 

List of main operations on CP4D and libraries used

You can create and save data assets, models, functions, deployments, and more, and you can also run the deployments you create.

Operations on analytical projects

Data asset operations mainly use project_lib, while model-related operations use the WML client.

| Main operation | Library to use |
| --- | --- |
| Read data from data assets[^3] | project_lib, or read directly with pandas.read_csv('/project_data/data_asset/file name') |
| Write file data to data assets[^4] | project_lib |
| List data assets | WML client |
| Save models | WML client |
| List models | WML client |
| Save functions | WML client |
| List functions | WML client |

[^3]: To load data, click the data button (labeled 0100) at the top right of the Notebook screen, then click the relevant data asset name > Insert to code > pandas DataFrame; the code is inserted automatically into a cell in the Notebook. By default, pandas.read_csv code appears to be inserted for files, and project_lib code for DB tables.

[^4]: This is also possible with the WML client, but the file is stored in a different area from ordinary data assets, and we have confirmed that the file name becomes invalid when downloaded. We therefore do not recommend saving to data assets with the WML client, and this article does not cover how to do it.

Operations on deployment space (analytical deployment)

All operations use the WML client.

| Main operation | Library to use |
| --- | --- |
| Write file data to data assets | WML client |
| List data assets | WML client |
| Save models | WML client |
| List models | WML client |
| Save functions[^5] | WML client |
| List functions[^5] | WML client |
| Create deployments | WML client |
| List deployments | WML client |
| Run deployments | WML client |

[^5]: Functions are labeled "features" on the deployment space screen. The Japanese translation is not unified, which is confusing.

WML client import and initialization

Import WML client

from watson_machine_learning_client import WatsonMachineLearningAPIClient

WML client initialization (authentication)

Initialize the WML client with the connection destination and authentication information. There are two ways to get authentication information.

  1. Use the value of the OS environment variable USER_ACCESS_TOKEN
  2. Use CP4D username and password

Method 1 can be used from a Notebook on CP4D. To use the WML client from an environment outside CP4D, use method 2.

In case of method 1



import os
token = os.environ['USER_ACCESS_TOKEN']
url = "https://cp4d.host.name.com"
wml_credentials = {
    "token" : token,
    "instance_id" : "openshift",
    "url": url,
    "version": "3.0.0"
}
client = WatsonMachineLearningAPIClient(wml_credentials)

In case of method 2


#Specify the username and password of the CP4D user actually used for authentication
url = "https://cp4d.host.name.com"
wml_credentials = {
   "username":"xxxxxxxx",
   "password": "xxxxxxxx",
   "instance_id": "openshift",
   "url" : url,
   "version": "3.0.0"
}
client = WatsonMachineLearningAPIClient(wml_credentials)

Switching between analysis project and deployment space

Set whether subsequent operations target the analysis project (default_project) or the deployment space (default_space). The initial target is the analysis project. ***Whenever you change the operation target, be sure to perform this switching operation (a common pitfall).***

How to find each ID

For the ID of the analysis project, use the one contained in the OS environment variable PROJECT_ID.

Set the ID of the analysis project


project_id = os.environ['PROJECT_ID']

For the ID of the deployment space, check "Space GUID" under "Settings" of the deployment space on the CP4D screen in advance, or use the GUID displayed by client.repository.list_spaces(), as shown below.

Find out the ID of the deployment space


client.repository.list_spaces()

output


------------------------------------  --------------------  ------------------------
GUID                                  NAME                  CREATED
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  DepSpaceName          2020-05-25T09:13:04.919Z
------------------------------------  --------------------  ------------------------

Set the ID of the deployment space


space_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Switching the operation target

Switch the operation target to the analysis project


client.set.default_project(project_id)

Switch the operation target to the deployment space


client.set.default_space(space_id)

Manipulating data assets

List of data assets (analysis project)

Use the WML client.

#Switch to an analysis project (only if you need to switch)
client.set.default_project(project_id)

#View a list of data assets
client.data_assets.list()

List data assets (deployment space)

Use the WML client.

#Switch to deployment space (only if you need to switch)
client.set.default_space(space_id)

#View a list of data assets
client.data_assets.list()

Reading data from data assets (analysis project)

Click the data button (labeled 0100) at the top right of the Notebook screen, then click the relevant data asset name > Insert to code > pandas DataFrame; the code is automatically inserted into a cell in the notebook. Using this is the easiest approach.

For files such as CSV, code that reads the data with pandas.read_csv is inserted automatically. The X in df_data_X is incremented automatically each time you repeat the insert operation.

Inserted code (for files)


import pandas as pd
df_data_1 = pd.read_csv('/project_data/data_asset/filename.csv')
df_data_1.head()

The product manual also contains example code that reads file data using project_lib:

Reading files using project_lib


from project_lib import Project
project = Project.access()

my_file = project.get_file("filename.csv")

my_file.seek(0)
import pandas as pd
df = pd.read_csv(my_file)

In the case of a DB table, the "Insert to code" operation above inserts code that uses project_lib. The cell starts with `# @hidden_cell`, so you can choose not to include it when sharing your notebook. [^6]

[^6]: CP4D v2.5 product manual, [Hide sensitive code cells in a notebook](https://www.ibm.com/support/knowledgecenter/en/SSQNUZ_2.5.0/wsj/analyze-data/hide_code.html)

Inserted code (example for Db2 table SCHEMANAME.TBL1)


# @hidden_cell
# This connection object is used to access your data and contains your credentials.
# You might want to remove those credentials before you share your notebook.

from project_lib import Project
project = Project.access()
TBL1_credentials = project.get_connected_data(name="TBL1")

import jaydebeapi, pandas as pd
TBL1_connection = jaydebeapi.connect('com.ibm.db2.jcc.DB2Driver',
    '{}://{}:{}/{}:user={};password={};'.format('jdbc:db2',
    TBL1_credentials['host'],
    TBL1_credentials.get('port', '50000'),
    TBL1_credentials['database'],
    TBL1_credentials['username'],
    TBL1_credentials['password']))

query = 'SELECT * FROM SCHEMANAME.TBL1'
data_df_1 = pd.read_sql(query, con=TBL1_connection)
data_df_1.head()

# You can close the database connection with the following code.
# TBL1_connection.close()
# To learn more about the jaydebeapi package, please read the documentation: https://pypi.org/project/JayDeBeApi/

Saving data in data assets (analysis project)

This is how to save a pandas DataFrame as a CSV file. Use project_lib.

from project_lib import Project
project = Project.access()

project.save_data("filename.csv", df_data_1.to_csv(),overwrite=True)

Saving data in data assets (deployment space)

Similarly, this is how to save a CSV file to the deployment space. Data assets in the deployment space are used as input data when a deployment is run as a batch job. Use the WML client.

#First write the pandas DataFrame out as a CSV file. By default it is stored under /home/wsuser/work
df_data_1.to_csv("filename.csv")

#Switch to deployment space (only if you need to switch)
client.set.default_space(space_id)

#Save as a data asset
asset_details = client.data_assets.create(name="filename.csv",file_path="/home/wsuser/work/filename.csv")

The ID and href of the saved data asset are included in asset_details, the return value of create. They are used when the deployment is run as a batch job in the deployment space.

#Check the return value of create (meta information)
asset_details

output


{'metadata': {'space_id': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
  'guid': 'yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy',
  'href': '/v2/assets/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz?space_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
  'asset_type': 'data_asset',
  'created_at': '2020-05-25T09:23:06Z',
  'last_updated_at': '2020-05-25T09:23:06Z'},
 'entity': {'data_asset': {'mime_type': 'text/csv'}}}

Extract them as follows.

Getting meta information from the return value asset_details


asset_id = client.data_assets.get_uid(asset_details)
asset_href = client.data_assets.get_href(asset_details)

Getting meta information from the return value asset_details (another way)


asset_id = asset_details['metadata']['guid']
asset_href = asset_details['metadata']['href']
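
For reference, a stored data asset can also be downloaded back to a local file. The following is a minimal sketch, assuming the data_assets.download helper of this WML client version; the output file name is hypothetical.


#Sketch: download the stored data asset back to a local file (output file name is hypothetical)
client.data_assets.download(asset_id, "downloaded_filename.csv")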

Model manipulation

Create a model as a preparation

As an example, we will create a scikit-learn random forest model using the Iris sample data.

#Load Iris sample data
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['iris_type'] = iris.target_names[iris.target]

#Create a model in a random forest
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X = df.drop('iris_type', axis=1)
y = df['iris_type']
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=0)

clf = RandomForestClassifier(max_depth=2, random_state=0, n_estimators=10)
model = clf.fit(X_train, y_train)
    
#Check the accuracy of the model
from sklearn.metrics import confusion_matrix, accuracy_score
y_test_predicted = model.predict(X_test)    
print("confusion_matrix:")
print(confusion_matrix(y_test,y_test_predicted))
print("accuracy:", accuracy_score(y_test,y_test_predicted))

The `model` above is the trained model.

Saving the model (analysis project)

Saving the model to the analysis project is possible, though it is not required for deployment. Use the WML client.

#Switch to an analysis project (only if you need to switch)
client.set.default_project(project_id)

#Describe model meta information
model_name = "sample_iris_model"
meta_props={
    client.repository.ModelMetaNames.NAME: model_name,
    client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.22-py3.6",
    client.repository.ModelMetaNames.TYPE: "scikit-learn_0.22",
    client.repository.ModelMetaNames.INPUT_DATA_SCHEMA:{
        "id":"iris model",
        "fields":[
            {'name': 'sepal length (cm)', 'type': 'double'},
            {'name': 'sepal width (cm)', 'type': 'double'},
            {'name': 'petal length (cm)', 'type': 'double'},
            {'name': 'petal width (cm)', 'type': 'double'}
        ]
    },
    client.repository.ModelMetaNames.OUTPUT_DATA_SCHEMA: {
        "id":"iris model",
        "fields": [
            {'name': 'iris_type', 'type': 'string','metadata': {'modeling_role': 'prediction'}}
        ]
    }
}

#Save the model. The return value contains the metadata of the created model
model_artifact = client.repository.store_model(model, meta_props=meta_props, training_data=X, training_target=y)

Supplement: Meta information to include in the model

Specifying INPUT_DATA_SCHEMA and OUTPUT_DATA_SCHEMA in the meta_props attached to the model is not mandatory, but it is required if you want to run a test in form format on the deployment details screen after deployment. ***The schema specified here becomes the input format of the form (a common pitfall).***

How to check the RUNTIME_UID values that can be specified
The values displayed in the GUID column below can be specified as RUNTIME_UID in meta_props.

python


# https://wml-api-pyclient-dev-v4.mybluemix.net/#runtimes
client.runtimes.list(limit=200)

output (for CP4D v2.5)


    --------------------------  --------------------------  ------------------------  --------
    GUID                        NAME                        CREATED                   PLATFORM
    do_12.10                    do_12.10                    2020-05-03T08:35:16.679Z  do
    do_12.9                     do_12.9                     2020-05-03T08:35:16.648Z  do
    pmml_4.3                    pmml_4.3                    2020-05-03T08:35:16.618Z  pmml
    pmml_4.2.1                  pmml_4.2.1                  2020-05-03T08:35:16.590Z  pmml
    pmml_4.2                    pmml_4.2                    2020-05-03T08:35:16.565Z  pmml
    pmml_4.1                    pmml_4.1                    2020-05-03T08:35:16.537Z  pmml
    pmml_4.0                    pmml_4.0                    2020-05-03T08:35:16.510Z  pmml
    pmml_3.2                    pmml_3.2                    2020-05-03T08:35:16.478Z  pmml
    pmml_3.1                    pmml_3.1                    2020-05-03T08:35:16.450Z  pmml
    pmml_3.0                    pmml_3.0                    2020-05-03T08:35:16.422Z  pmml
    ai-function_0.1-py3.6       ai-function_0.1-py3.6       2020-05-03T08:35:16.378Z  python
    ai-function_0.1-py3         ai-function_0.1-py3         2020-05-03T08:35:16.350Z  python
    hybrid_0.2                  hybrid_0.2                  2020-05-03T08:35:16.322Z  hybrid
    hybrid_0.1                  hybrid_0.1                  2020-05-03T08:35:16.291Z  hybrid
    xgboost_0.90-py3.6          xgboost_0.90-py3.6          2020-05-03T08:35:16.261Z  python
    xgboost_0.82-py3.6          xgboost_0.82-py3.6          2020-05-03T08:35:16.235Z  python
    xgboost_0.82-py3            xgboost_0.82-py3            2020-05-03T08:35:16.204Z  python
    xgboost_0.80-py3.6          xgboost_0.80-py3.6          2020-05-03T08:35:16.173Z  python
    xgboost_0.80-py3            xgboost_0.80-py3            2020-05-03T08:35:16.140Z  python
    xgboost_0.6-py3             xgboost_0.6-py3             2020-05-03T08:35:16.111Z  python
    spss-modeler_18.2           spss-modeler_18.2           2020-05-03T08:35:16.083Z  spss
    spss-modeler_18.1           spss-modeler_18.1           2020-05-03T08:35:16.057Z  spss
    spss-modeler_17.1           spss-modeler_17.1           2020-05-03T08:35:16.029Z  spss
    scikit-learn_0.22-py3.6     scikit-learn_0.22-py3.6     2020-05-03T08:35:16.002Z  python
    scikit-learn_0.20-py3.6     scikit-learn_0.20-py3.6     2020-05-03T08:35:15.965Z  python
    scikit-learn_0.20-py3       scikit-learn_0.20-py3       2020-05-03T08:35:15.939Z  python
    scikit-learn_0.19-py3.6     scikit-learn_0.19-py3.6     2020-05-03T08:35:15.912Z  python
    scikit-learn_0.19-py3       scikit-learn_0.19-py3       2020-05-03T08:35:15.876Z  python
    scikit-learn_0.17-py3       scikit-learn_0.17-py3       2020-05-03T08:35:15.846Z  python
    spark-mllib_2.4             spark-mllib_2.4             2020-05-03T08:35:15.816Z  spark
    spark-mllib_2.3             spark-mllib_2.3             2020-05-03T08:35:15.788Z  spark
    spark-mllib_2.2             spark-mllib_2.2             2020-05-03T08:35:15.759Z  spark
    tensorflow_1.15-py3.6       tensorflow_1.15-py3.6       2020-05-03T08:35:15.731Z  python
    tensorflow_1.14-py3.6       tensorflow_1.14-py3.6       2020-05-03T08:35:15.705Z  python
    tensorflow_1.13-py3.6       tensorflow_1.13-py3.6       2020-05-03T08:35:15.678Z  python
    tensorflow_1.11-py3.6       tensorflow_1.11-py3.6       2020-05-03T08:35:15.646Z  python
    tensorflow_1.13-py3         tensorflow_1.13-py3         2020-05-03T08:35:15.619Z  python
    tensorflow_1.13-py2         tensorflow_1.13-py2         2020-05-03T08:35:15.591Z  python
    tensorflow_0.11-horovod     tensorflow_0.11-horovod     2020-05-03T08:35:15.562Z  native
    tensorflow_1.11-py3         tensorflow_1.11-py3         2020-05-03T08:35:15.533Z  python
    tensorflow_1.10-py3         tensorflow_1.10-py3         2020-05-03T08:35:15.494Z  python
    tensorflow_1.10-py2         tensorflow_1.10-py2         2020-05-03T08:35:15.467Z  python
    tensorflow_1.9-py3          tensorflow_1.9-py3          2020-05-03T08:35:15.435Z  python
    tensorflow_1.9-py2          tensorflow_1.9-py2          2020-05-03T08:35:15.409Z  python
    tensorflow_1.8-py3          tensorflow_1.8-py3          2020-05-03T08:35:15.383Z  python
    tensorflow_1.8-py2          tensorflow_1.8-py2          2020-05-03T08:35:15.356Z  python
    tensorflow_1.7-py3          tensorflow_1.7-py3          2020-05-03T08:35:15.326Z  python
    tensorflow_1.7-py2          tensorflow_1.7-py2          2020-05-03T08:35:15.297Z  python
    tensorflow_1.6-py3          tensorflow_1.6-py3          2020-05-03T08:35:15.270Z  python
    tensorflow_1.6-py2          tensorflow_1.6-py2          2020-05-03T08:35:15.243Z  python
    tensorflow_1.5-py2-ddl      tensorflow_1.5-py2-ddl      2020-05-03T08:35:15.209Z  python
    tensorflow_1.5-py3-horovod  tensorflow_1.5-py3-horovod  2020-05-03T08:35:15.181Z  python
    tensorflow_1.5-py3.6        tensorflow_1.5-py3.6        2020-05-03T08:35:15.142Z  python
    tensorflow_1.5-py3          tensorflow_1.5-py3          2020-05-03T08:35:15.109Z  python
    tensorflow_1.5-py2          tensorflow_1.5-py2          2020-05-03T08:35:15.079Z  python
    tensorflow_1.4-py2-ddl      tensorflow_1.4-py2-ddl      2020-05-03T08:35:15.048Z  python
    tensorflow_1.4-py3-horovod  tensorflow_1.4-py3-horovod  2020-05-03T08:35:15.019Z  python
    tensorflow_1.4-py3          tensorflow_1.4-py3          2020-05-03T08:35:14.987Z  python
    tensorflow_1.4-py2          tensorflow_1.4-py2          2020-05-03T08:35:14.945Z  python
    tensorflow_1.3-py2-ddl      tensorflow_1.3-py2-ddl      2020-05-03T08:35:14.886Z  python
    tensorflow_1.3-py3          tensorflow_1.3-py3          2020-05-03T08:35:14.856Z  python
    tensorflow_1.3-py2          tensorflow_1.3-py2          2020-05-03T08:35:14.829Z  python
    tensorflow_1.2-py3          tensorflow_1.2-py3          2020-05-03T08:35:14.799Z  python
    tensorflow_1.2-py2          tensorflow_1.2-py2          2020-05-03T08:35:14.771Z  python
    pytorch-onnx_1.2-py3.6      pytorch-onnx_1.2-py3.6      2020-05-03T08:35:14.742Z  python
    pytorch-onnx_1.1-py3.6      pytorch-onnx_1.1-py3.6      2020-05-03T08:35:14.712Z  python
    pytorch-onnx_1.0-py3        pytorch-onnx_1.0-py3        2020-05-03T08:35:14.682Z  python
    pytorch-onnx_1.2-py3.6-edt  pytorch-onnx_1.2-py3.6-edt  2020-05-03T08:35:14.650Z  python
    pytorch-onnx_1.1-py3.6-edt  pytorch-onnx_1.1-py3.6-edt  2020-05-03T08:35:14.619Z  python
    pytorch_1.1-py3.6           pytorch_1.1-py3.6           2020-05-03T08:35:14.590Z  python
    pytorch_1.1-py3             pytorch_1.1-py3             2020-05-03T08:35:14.556Z  python
    pytorch_1.0-py3             pytorch_1.0-py3             2020-05-03T08:35:14.525Z  python
    pytorch_1.0-py2             pytorch_1.0-py2             2020-05-03T08:35:14.495Z  python
    pytorch_0.4-py3-horovod     pytorch_0.4-py3-horovod     2020-05-03T08:35:14.470Z  python
    pytorch_0.4-py3             pytorch_0.4-py3             2020-05-03T08:35:14.434Z  python
    pytorch_0.4-py2             pytorch_0.4-py2             2020-05-03T08:35:14.405Z  python
    pytorch_0.3-py3             pytorch_0.3-py3             2020-05-03T08:35:14.375Z  python
    pytorch_0.3-py2             pytorch_0.3-py2             2020-05-03T08:35:14.349Z  python
    torch_lua52                 torch_lua52                 2020-05-03T08:35:14.322Z  lua
    torch_luajit                torch_luajit                2020-05-03T08:35:14.295Z  lua
    caffe-ibm_1.0-py3           caffe-ibm_1.0-py3           2020-05-03T08:35:14.265Z  python
    caffe-ibm_1.0-py2           caffe-ibm_1.0-py2           2020-05-03T08:35:14.235Z  python
    caffe_1.0-py3               caffe_1.0-py3               2020-05-03T08:35:14.210Z  python
    caffe_1.0-py2               caffe_1.0-py2               2020-05-03T08:35:14.180Z  python
    caffe_frcnn                 caffe_frcnn                 2020-05-03T08:35:14.147Z  Python
    caffe_1.0-ddl               caffe_1.0-ddl               2020-05-03T08:35:14.117Z  native
    caffe2_0.8                  caffe2_0.8                  2020-05-03T08:35:14.088Z  Python
    darknet_0                   darknet_0                   2020-05-03T08:35:14.059Z  native
    theano_1.0                  theano_1.0                  2020-05-03T08:35:14.032Z  Python
    mxnet_1.2-py2               mxnet_1.2-py2               2020-05-03T08:35:14.002Z  python
    mxnet_1.1-py2               mxnet_1.1-py2               2020-05-03T08:35:13.960Z  python
    --------------------------  --------------------------  ------------------------  --------

Other meta information can also be included in meta_props. In general it is recommended to add as much as possible, because it records the conditions under which the model was created.

Meta information that can be described in meta_props
client.repository.ModelMetaNames.get()

output


['CUSTOM',
 'DESCRIPTION',
 'DOMAIN',
 'HYPER_PARAMETERS',
 'IMPORT',
 'INPUT_DATA_SCHEMA',
 'LABEL_FIELD',
 'METRICS',
 'MODEL_DEFINITION_UID',
 'NAME',
 'OUTPUT_DATA_SCHEMA',
 'PIPELINE_UID',
 'RUNTIME_UID',
 'SIZE',
 'SOFTWARE_SPEC_UID',
 'SPACE_UID',
 'TAGS',
 'TRAINING_DATA_REFERENCES',
 'TRAINING_LIB_UID',
 'TRANSFORMED_LABEL_FIELD',
 'TYPE']
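
As an example, here is a minimal sketch that adds two of the optional fields above to meta_props before calling store_model. Treating DESCRIPTION and LABEL_FIELD as plain strings is an assumption about the expected format.


#Sketch: add optional meta information before store_model (string values are an assumption)
meta_props[client.repository.ModelMetaNames.DESCRIPTION] = "Random forest model for the Iris sample data"
meta_props[client.repository.ModelMetaNames.LABEL_FIELD] = "iris_type"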

Save model (deployment space)

Use the WML client to save the model to the deployment space. Alternatively, you can save the model to the analysis project as described above and then click the model's "Promote" on the CP4D screen, which copies the model from the analysis project into the deployment space.

#Switch to deployment space (only if you need to switch)
client.set.default_space(space_id)

#Describe model meta information
model_name = "sample_iris_model"
meta_props={
    client.repository.ModelMetaNames.NAME: model_name,
    client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.22-py3.6",
    client.repository.ModelMetaNames.TYPE: "scikit-learn_0.22",
    client.repository.ModelMetaNames.INPUT_DATA_SCHEMA:{
        "id":"iris model",
        "fields":[
            {'name': 'sepal length (cm)', 'type': 'double'},
            {'name': 'sepal width (cm)', 'type': 'double'},
            {'name': 'petal length (cm)', 'type': 'double'},
            {'name': 'petal width (cm)', 'type': 'double'}
        ]
    },
    client.repository.ModelMetaNames.OUTPUT_DATA_SCHEMA: {
        "id":"iris model",
        "fields": [
            {'name': 'iris_type', 'type': 'string','metadata': {'modeling_role': 'prediction'}}
        ]
    }
}

#Save the model. The return value contains the metadata of the created model
model_artifact = client.repository.store_model(model, meta_props=meta_props, training_data=X, training_target=y)

The meta information to include in meta_props is the same as in ["Supplement: Meta information to include in the model"](#supplement-meta-information-to-include-in-the-model) above, so refer to that section.

The ID of the saved model is contained in the return value model_artifact. You will need the ID when you create the deployment. Extract the ID as shown below.

Getting the ID from the return value


model_id = client.repository.get_model_uid(model_artifact)

Getting the ID from the return value (another method)


model_id = model_artifact['metadata']['guid']

List of models (analysis project)

#Switch to an analysis project (only if you need to switch)
client.set.default_project(project_id)

#Show list of models
client.repository.list_models()

List models (deployment space)

Use the WML client.

#Switch to deployment space (only if you need to switch)
client.set.default_space(space_id)

#Show list of models
client.repository.list_models()

Deployment operations (deployment space only)

Use the WML client. There are two deployment types, Online and Batch. Pass the ID of the model to be deployed to create.

Creating a deployment (Online type)

Online type deployment


dep_name = "sample_iris_online"
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: dep_name,
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

deployment_details = client.deployments.create(model_id, meta_props=meta_props)

Deployment takes a little under a minute; if you get the following output, the deployment succeeded.

output


#######################################################################################

Synchronous deployment creation for uid: 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' started

#######################################################################################
    
initializing
ready
        
------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy'
------------------------------------------------------------------------------------------------

The ID of the created deployment can be retrieved from the return value as follows.

#ID of ONLINE type deployment
dep_id_online = deployment_details['metadata']['guid']
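
As with other objects, a helper method can likely be used instead; a sketch assuming client.deployments.get_uid is available in this WML client version:


#Sketch: another way, assuming deployments.get_uid exists in this version
dep_id_online = client.deployments.get_uid(deployment_details)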

Create deployment (Batch type)

Batch type deployment


dep_name = "sample_iris_batch"
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: dep_name,
    client.deployments.ConfigurationMetaNames.BATCH: {},
    client.deployments.ConfigurationMetaNames.COMPUTE: {
        "name": "S",
        "nodes": 1
    }
}

deployment_details = client.deployments.create(model_id, meta_props=meta_props)

If "Successfully" is displayed, the deployment is successful. The ID of the created deployment can be retrieved from the return value as follows.

#ID of BATCH type deployment
dep_id_batch = deployment_details['metadata']['guid']

List deployments

This also uses the WML client.

#View a list of deployments
client.deployments.list()

Execution of deployment (Online type)

To run an Online deployment, you create input data (in JSON format) for scoring, send it to the deployment via REST, and receive the prediction result. First, create sample input data.

Generate sample input data for scoring execution


# sample data for scoring (setosa)
scoring_x = pd.DataFrame(
    data = [[5.1,3.5,1.4,0.2]],
    columns=['sepal length (cm)','sepal width (cm)','petal length (cm)','petal width (cm)']
)

values = scoring_x.values.tolist()
fields = scoring_x.columns.values.tolist()
scoring_payload = {client.deployments.ScoringMetaNames.INPUT_DATA: [{'fields': fields, 'values': values}]}
scoring_payload

output


{'input_data': [{'fields': ['sepal length (cm)',
    'sepal width (cm)',
    'petal length (cm)',
    'petal width (cm)'],
   'values': [[5.1, 3.5, 1.4, 0.2]]}]}

There are two ways to invoke an Online deployment: the WML client and requests.

Performing Online scoring with the WML client


prediction = client.deployments.score(dep_id_online, scoring_payload)
prediction

output


{'predictions': [{'fields': ['prediction', 'probability'],
   'values': [[0, [0.8131726303900102, 0.18682736960998966]]]}]}

An example using requests can be copied and pasted from the code snippet on the deployment details screen in CP4D. mltoken is an API authentication token; you can reuse the `token` obtained from the OS environment variable USER_ACCESS_TOKEN in [WML client initialization (authentication)](#wml-client-initialization-authentication) at the beginning of this article as is. When running from an environment outside CP4D, obtain a token in advance as described in the CP4D product manual, [Getting a bearer token](https://www.ibm.com/support/knowledgecenter/ja/SSQNUZ_2.5.0/wsj/analyze-data/ml-authentication-local.html).

import urllib3, requests, json

# token = "XXXXXXXXXXXXXXXXXX"
# url = "https://cp4d.host.name.com"
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token}
dep_url = url + "/v4/deployments/" + dep_id_online + "/predictions"

response = requests.post(dep_url, json=scoring_payload, headers=header)
prediction = json.loads(response.text)
prediction

output


{'predictions': [{'fields': ['prediction', 'probability'],
   'values': [['setosa', [0.9939393939393939, 0.006060606060606061, 0.0]]]}]}

If your CP4D domain uses a self-signed certificate and requests.post fails certificate verification, you can temporarily work around this with the `verify=False` option of requests.post, as shown below. Use it at your own risk.
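
#Temporary workaround for self-signed certificates (use at your own risk)
response = requests.post(dep_url, json=scoring_payload, headers=header, verify=False)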

Execution of deployment (Batch type)

To run a Batch deployment, register the input CSV file as a data asset in the deployment space in advance, and specify the data asset's href.

Preparation of input data


#As a sample, write the first 5 rows of the Iris training data X to a CSV file
X.head(5).to_csv("iris_test.csv")

#Switch to deployment space (only if you need to switch)
client.set.default_space(space_id)

#Registration to data assets
asset_details = client.data_assets.create(name="iris_test.csv",file_path="/home/wsuser/work/iris_test.csv")
asset_href = client.data_assets.get_href(asset_details)

Batch scoring execution


#Create meta information for execution jobs
job_payload_ref = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "location": {
            "href": asset_href
        },
        "type": "data_asset",
        "connection": {}
    }],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "location": {
            "name": "iris_test_out_{}.csv".format(dep_id_batch),
            "description": "testing csv file"
        },
        "type": "data_asset",
        "connection": {}
    }
}

#Run the batch job (execution starts when create_job is called)
job = client.deployments.create_job(deployment_id=dep_id_batch, meta_props=job_payload_ref)
job_id = client.deployments.get_job_uid(job)

You can check the status of the job with the following code. If you embed this in a program, it is a good idea to loop until the state becomes completed, as sketched after the output below.

#Check the status of batch execution jobs
client.deployments.get_job_status(job_id)

output


#While the job is running
{'state': 'queued', 'running_at': '', 'completed_at': ''}

#When execution is completed
{'state': 'completed',
 'running_at': '2020-05-28T05:43:22.287357Z',
 'completed_at': '2020-05-28T05:43:22.315966Z'}
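
A minimal sketch of such a polling loop follows. The terminal state names other than 'completed' ('failed', 'canceled') are assumptions based on typical job states.


import time

#Poll the job status until it reaches a terminal state
while True:
    status = client.deployments.get_job_status(job_id)
    if status.get('state') in ('completed', 'failed', 'canceled'):
        break
    time.sleep(5)  #wait 5 seconds between polls

print(status)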

That's all. You can also save and deploy Python functions; I may add that here or cover it in another article if I get the chance.


(Added June 1, 2020) The following Git repository contains sample notebooks for models and deployments that can be used with CP4D v3.0: https://github.ibm.com/GREGORM/CPDv3DeployML
