[PYTHON] Verify NLC accuracy with Watson Studio's Jupyter Notebook

Introduction

Watson's NLC (Natural Language Classifier) API has been offered since the early days of the Watson API service, so it has been around for a long time (well, about 3 or 4 years). At first it only had an API interface. A beta UI tool was added along the way, and later an improved UI tool was provided as a feature of Watson Studio, which is still available today. A guide to the Studio UI tool is given in another article, Learning NLC with Watson Studio. That tool has a simple test function, so you can run tests from the UI, but only one text at a time: you cannot send a whole set of test data at once and measure accuracy. I created a simple tool to fill this gap, and I will introduce it here.

Features

First, let's take a quick look at what this tool can do. All the materials used here are available on GitHub, so you can try it in your own environment. Unfortunately, NLC has no Lite plan, but on the Standard plan creating the instance itself is free, and the usage below is free every month, so this sample should fit within the free tier.

1 Natural Language Classifier free per month
1,000 API calls free per month
4 training events free per month

Training

This article assumes that you have already trained a model. It doesn't matter whether you used the Studio features or called the API directly. In this sample, the model was trained on the following Wikipedia text for three genres (classes): "Japanese history (j-hist)", "Japanese geography (j-geo)", and "science".

[Screenshot: training text from Wikipedia]

Validation data

The verification data is the following Excel file.

[Screenshot: verification data in Excel]

The first row of the Excel sheet holds the column names. The column named **text** contains the text sent to NLC, and **class** contains the correct class name for that text. Other columns are not used in the test, so they may or may not be present. This sample Excel has a key column **text_id** for convenience, but such a column is not required.
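The column rules above can be sketched as a small filter. This is a minimal illustration, not the notebook's actual code: it assumes each sheet row arrives as a dict (for example, the records from pandas' `DataFrame.to_dict(orient="records")`), and the name `extract_test_cases` is hypothetical.

```python
# Hypothetical helper: keeps only the two columns the test needs.
REQUIRED_COLUMNS = ("text", "class")


def extract_test_cases(rows):
    """Return (text, expected_class) pairs; any other column (e.g. text_id) is ignored."""
    cases = []
    # Row 1 of the sheet holds the column names, so data starts at sheet row 2.
    for sheet_row, row in enumerate(rows, start=2):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"sheet row {sheet_row}: missing column(s) {missing}")
        cases.append((str(row["text"]), str(row["class"])))
    return cases
```

With pandas installed, `pd.read_excel("nlc-test-sample.xlsx").to_dict(orient="records")` would produce rows in this shape; empty cells would come through as NaN, which this sketch does not filter out.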

Output data

The test results are also written to Excel. A sample is shown below. (Its appearance was adjusted afterwards for readability; the actual output is plainer.)

[Screenshot: sample output Excel]

The first three columns are the test input data itself. They are followed by the results of classifying the text with the model, output as pairs of **class name** and **confidence**. The number of classes to output is set by a variable in the notebook; in this example, n = 3.
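As a rough sketch of how such output columns could be produced, assuming the NLC v1 classify response shape (a `classes` list of entries with `class_name` and `confidence`), the top n classes of one response can be flattened into alternating class/confidence values. The function names and the `class_1`/`confidence_1` column names are hypothetical, not taken from the notebook.

```python
def top_n_columns(response, n=3):
    """Flatten the top-n classes of one classify response into
    [class_1, confidence_1, class_2, confidence_2, ...]."""
    ranked = sorted(response["classes"], key=lambda c: c["confidence"], reverse=True)[:n]
    values = []
    for entry in ranked:
        values.extend([entry["class_name"], entry["confidence"]])
    # Pad when the classifier has fewer than n classes so every row has the same width.
    values.extend([None, None] * (n - len(ranked)))
    return values


def output_header(n=3):
    """Hypothetical column names matching top_n_columns()."""
    header = []
    for i in range(1, n + 1):
        header.extend([f"class_{i}", f"confidence_{i}"])
    return header
```

Sorting explicitly costs little and keeps the sketch correct even if a response were not already ordered by confidence.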

Confusion matrix

That covers the Excel output. As a bonus, the notebook also displays the following confusion matrix, which shows which classes are classified accurately and to what degree.

[Screenshot: confusion matrix displayed in the notebook]
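For reference, a confusion matrix like this can be computed from the correct classes and the predicted top classes. The sketch below is plain Python and is not necessarily how the notebook builds its matrix; rows are correct classes, columns are predicted classes, and per-class recall reads the diagonal against each row total.

```python
def confusion_matrix(expected, predicted, labels=None):
    """Return (labels, matrix); matrix[i][j] counts items whose correct class is
    labels[i] and whose predicted top class is labels[j]."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if labels is None:
        labels = sorted(set(expected) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for exp, pred in zip(expected, predicted):
        matrix[index[exp]][index[pred]] += 1
    return labels, matrix


def accuracy(expected, predicted):
    """Fraction of items whose predicted top class equals the correct class."""
    hits = sum(exp == pred for exp, pred in zip(expected, predicted))
    return hits / len(expected)


def per_class_recall(labels, matrix):
    """Share of each correct class that was predicted as that class."""
    recall = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        recall[label] = matrix[i][i] / total if total else 0.0
    return recall
```

If scikit-learn is available, `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=...)` produces the same row/column layout.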

Environment

I will explain the environment required to run the sample.

Cloud environment

You must have an account on IBM Cloud and have NLC and Watson Studio available as services. Unfortunately, for NLC, you need a credit card account. Please refer to the link below for the specific procedure.

From IBM cloud user registration to Jupyter Notebook usage

Easy Jupyter Notebook in the Cloud-Procedures for using Jupyter Notebook in IBM Cloud-

Upgrade to a credit card account

Procedure for upgrading to IBM Cloud (formerly Bluemix) credit account

Additional service registration procedure

The following article describes how to associate Spark and Watson ML services with Studio; you can associate NLC in the same way. (The service group is "**AI**", the same as Watson ML.)

Register additional services in Watson Studio

Necessary materials

The materials needed to run the sample are as follows.

| File name | Purpose | Links |
|---|---|---|
| nlc-test-tool-v1.ipynb | Accuracy verification script (main body) | For download / For code review |
| nlc-test-sample.xlsx | Sample Excel for verification | For download |
| nlc-test-sample-output.xlsx | Sample verification result Excel | For download |
| nlc-train.csv | Sample CSV for training | For download / For data review |

Procedure

Now, I will explain the procedure for actually running the sample application.

Training the NLC model

This sample application uses a model trained with the nlc-train.csv above as training data. Training with Watson Studio is explained in another article, Learning NLC with Watson Studio, so please refer to it. After training, check the Model ID of your model on the asset management screen of Watson Studio. Copy it and save it in a text editor or similar, as you will need it later.

[Screenshot: Model ID on the Watson Studio asset management screen]

Check NLC credentials

From https://cloud.ibm.com/services, view the list of IBM Cloud services and click the NLC link.

[Screenshot: IBM Cloud service list]

The screen will look like the one below. In order, ① click **Service Credentials**, ② click **Display credential information**, and ③ click the **clipboard icon** to copy the credentials.

[Screenshot: NLC service credentials]

Paste the clipboard credentials into a text editor and save them. We will use this information later.
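The copied credentials are a JSON object. As a minimal sketch, assuming it contains the `apikey` and `url` fields that IBM Cloud service credentials normally include, the two values needed to call NLC can be pulled out like this (`load_nlc_credentials` is a hypothetical helper, not part of the notebook):

```python
import json


def load_nlc_credentials(text):
    """Return (apikey, url) from the credentials JSON copied from IBM Cloud."""
    creds = json.loads(text)
    try:
        # Strip a trailing slash so paths can be appended to the URL safely.
        return creds["apikey"], creds["url"].rstrip("/")
    except KeyError as err:
        raise ValueError(f"credentials JSON is missing {err}") from None
```

The other fields (such as the IAM descriptions) are not needed for classification calls.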

Uploading the Excel file for the tool

Upload the verification Excel file nlc-test-sample.xlsx to the cloud. The procedure is as follows:

[Screenshot: uploading the Excel file, step 1]

[Screenshot: uploading the Excel file, step 2]

[Screenshot: uploading the Excel file, step 3]

If the upload is successful, you should see nlc-test-sample.xlsx in **Data Assets**, as shown in the following figure.

[Screenshot: nlc-test-sample.xlsx in Data Assets]

Importing Notebook into Watson Studio

Load the previously downloaded nlc-test-tool-v1.ipynb into Watson Studio's Jupyter Notebook. For the loading procedure, refer to Easy Jupyter Notebook in the Cloud-Procedures for using Jupyter Notebook in IBM Cloud-.

Setting COS credentials

The loaded Jupyter Notebook needs to be modified in several places to match your environment. First, set up your COS credentials. Right after loading, the notebook should look like the figure below; click the "**+**" icon at the top of the screen to insert a cell.

[Screenshot: notebook right after loading]

With the screen in the state shown below, ① click the **file** icon at the top of the screen, ② click nlc-test-sample.xlsx, uploaded earlier, in the file list, and ③ click **Insert Credentials** in the menu that appears.

[Screenshot: inserting credentials from the file list]

The empty cell you inserted earlier should now contain something like the figure below. Copy the lines from **IAM_SERVICE_ID** through **BUCKET** to the clipboard.

[Screenshot: inserted COS credentials]

Paste the copied lines into the "**COS Credentials**" cell below it, replacing the original dummy entries. In the end, it should look like this. (Note that only the last line, FILE: infile, keeps its original content.)

[Screenshot: completed COS Credentials cell]

When you are done, delete the cell you added for this work. (Select the cell and click the scissors icon.)
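For context, credentials of this kind are typically used to create an IBM COS client and read or write objects in the bucket. The sketch below is an assumption-laden illustration, not the notebook's code: it assumes the key names shown in the inserted cell (`IBM_API_KEY_ID`, `IBM_AUTH_ENDPOINT`, `ENDPOINT`, `BUCKET`, `FILE`), assumes `ENDPOINT` includes the `https://` scheme, and assumes the output file name is the input name plus a hypothetical `-output` suffix. `ibm_boto3` (from ibm-cos-sdk) is imported only inside the functions that need it.

```python
import os

REQUIRED_COS_KEYS = ("IBM_API_KEY_ID", "IBM_AUTH_ENDPOINT", "ENDPOINT", "BUCKET", "FILE")


def check_cos_credentials(creds):
    """Fail early if the COS Credentials cell still lacks a required entry."""
    missing = [key for key in REQUIRED_COS_KEYS if not creds.get(key)]
    if missing:
        raise ValueError(f"COS credentials are missing: {', '.join(missing)}")
    return creds


def output_key(input_key, suffix="-output"):
    """Hypothetical naming rule: nlc-test-sample.xlsx -> nlc-test-sample-output.xlsx."""
    stem, ext = os.path.splitext(input_key)
    return f"{stem}{suffix}{ext}"


def make_cos_client(creds):
    """Create an IBM COS client (requires the ibm-cos-sdk package)."""
    import ibm_boto3
    from ibm_botocore.client import Config

    return ibm_boto3.client(
        service_name="s3",
        ibm_api_key_id=creds["IBM_API_KEY_ID"],
        ibm_auth_endpoint=creds["IBM_AUTH_ENDPOINT"],
        config=Config(signature_version="oauth"),
        endpoint_url=creds["ENDPOINT"],  # assumed to include https://
    )


def read_object(creds):
    """Download the object named by creds['FILE'] as bytes."""
    client = make_cos_client(creds)
    return client.get_object(Bucket=creds["BUCKET"], Key=creds["FILE"])["Body"].read()


def write_object(creds, key, data):
    """Upload bytes (for example, an Excel file built in memory) back to the bucket."""
    client = make_cos_client(creds)
    client.put_object(Bucket=creds["BUCKET"], Key=key, Body=data)
```

With pandas, the downloaded bytes could be read via `pd.read_excel(io.BytesIO(data))`; keeping a fixed output name means the same object is overwritten on each run.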

Setting model_id

Next, set model_id. Paste the Model ID you saved earlier into the model_id line of the "**Variable Definition**" cell, enclosing the string in single quotes.

[Screenshot: model_id in the Variable Definition cell]

Setting NLC credentials

Finally, set the NLC credentials. Paste the NLC credentials you saved earlier into the "**NLC Credentials**" cell, then delete the extra brackets and the first dummy line so that it looks like the figure below.

[Screenshot: completed NLC Credentials cell]
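With the credentials and model_id in place, each test text becomes one classify call. The sketch below uses only the standard library and assumes the NLC v1 REST endpoint `POST {url}/v1/classifiers/{classifier_id}/classify` with a JSON body `{"text": ...}`, basic authentication with the user name `apikey`, and that the Studio Model ID is the classifier ID. It is an illustration, not necessarily how the notebook calls NLC (the notebook may use the Watson SDK instead).

```python
import base64
import json
import urllib.parse
import urllib.request


def build_classify_request(url, apikey, model_id, text):
    """Build (but do not send) a classify request for one text."""
    endpoint = f"{url.rstrip('/')}/v1/classifiers/{urllib.parse.quote(model_id, safe='')}/classify"
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def classify(url, apikey, model_id, text, timeout=30):
    """Send the request and return the parsed JSON response (one API call)."""
    request = build_classify_request(url, apikey, model_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending makes it easy to check the URL and headers without spending any of the free API calls.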

Run Notebook

Thank you for your hard work; this completes all the preparation. In Jupyter Notebook, pressing Shift + Enter runs the selected cell and moves the selection to the next cell. Select the first cell and keep pressing Shift + Enter; the notebook should evaluate the model with the test data, generate the Excel file, display the confusion matrix, and so on. If an error occurs in any cell, check the error message to identify the problem.

If no error is displayed and the result looks like the figure below, the tool ran successfully.

[Screenshot: successful notebook run]
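Conceptually, the run boils down to a loop over the test cases. As a simplified sketch (hypothetical names, not the notebook's code), the classifier is passed in as a function so the loop can be tried without NLC; `classify_fn(text)` is assumed to return a dict shaped like an NLC classify response. Each test row costs one classify call, so a run counts against the 1,000 free API calls per month.

```python
def evaluate(cases, classify_fn, n=3):
    """Classify every (text, expected_class) case; return output rows and accuracy.

    Each output row is [text, expected, class_1, confidence_1, ...] for the top n classes.
    """
    rows = []
    correct = 0
    for text, expected in cases:
        response = classify_fn(text)
        ranked = sorted(response["classes"], key=lambda c: c["confidence"], reverse=True)
        predicted = ranked[0]["class_name"] if ranked else None
        correct += predicted == expected
        row = [text, expected]
        for entry in ranked[:n]:
            row.extend([entry["class_name"], entry["confidence"]])
        rows.append(row)
    accuracy = correct / len(cases) if cases else 0.0
    return rows, accuracy
```

This simplified version keeps only text and class as input columns; the real output also carries any other input columns, such as text_id.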

Getting the output Excel

The notebook above goes as far as writing the generated Excel back to COS, but downloading it from the Studio screen takes a few more steps, which I explain here. (If the output Excel keeps the same file name, these steps are unnecessary from the second run onward.)

First, select **Add to project** -> **Data** on the project management screen (the same procedure as in the Excel upload step).

Switch the tab at the top of the screen to "File"; the newly generated nlc-test-sample-output.xlsx should appear in the list. (If it doesn't, try closing and reopening the project.)

[Screenshot: file list containing nlc-test-sample-output.xlsx]

On the screen below, ① select the check box of this file, ② click the icon with vertically aligned dots, and ③ select "**Add as data asset**" from the menu.

[Screenshot: Add as data asset menu]

The output Excel then also appears in the **Data asset** list, as shown below. Click the icon under Actions and select **Download** from the menu to download the output Excel file.

[Screenshot: output Excel in the Data asset list]

The raw output Excel, before any formatting, looks like this.

[Screenshot: raw output Excel]

Bonus

At some point, Studio apparently gained the ability to open an Excel file with only one sheet as a Data Asset. Below are the results of opening, from Studio, the Excel files registered as Data Assets in the procedure above.

[Screenshot: Excel opened as a Data Asset (1)]

[Screenshot: Excel opened as a Data Asset (2)]
