Python code for reading CSV data from DSX object storage

This is a memo on how to read files in object storage from Python code in a Notebook on Data Science Experience (DSX, https://datascience.ibm.com/), one of the core services of the Watson Data Platform.

The code reads data in Bluemix PaaS object storage from a Python program in a DSX Notebook. Be careful not to confuse the two services: the target here is the Object Storage service of Bluemix PaaS, not IBM Cloud Object Storage on Bluemix Infrastructure. The code was written with the following scenario in mind: data sent from a server or IoT device is stored temporarily in Bluemix PaaS object storage, read from Python code on DSX, and then used for scientific computation.

Figure 1: Concept of this Python code running on a Data Science Experience Notebook

How to use

To use it, log in to DSX, select Project -> Default Project (or any project name) -> Add Notebooks, and create a notebook. Then copy and paste the code below, replace the object storage credentials with your own, and you are ready to go. How to obtain the credentials is described later.

The following call fetches a file from object storage and loads it into variables in your Python code. It is convenient to store the loaded data in a NumPy array. The first argument is the credentials; note that the user ID and password differ for each container. The second argument is the object (file) name. The container name is taken from the credentials, so it is not passed explicitly.

result,status,Label,Data = Read_CSV_from_ObjectStorage(credentials_1, filename)

The first return value, result, is True on success and False on failure. The second return value, status, contains the HTTP status code: 200 on success, or an error code in the 400 range if authentication fails. The third return value, Label, is the list of column labels from the header line of the CSV file. The fourth return value, Data, holds the data rows; every value is converted to float and returned as a list of rows.
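As a minimal sketch of the header and float conversion described above, the following standalone function splits CSV text into a label list and float rows. The name parse_csv_text and the sample string are illustrative assumptions, not part of the original code, and the sketch uses the standard csv module instead of splitting on commas by hand.

```python
import csv
import io


def parse_csv_text(text):
    """Split CSV text into (labels, rows); every data value becomes a float.

    Hypothetical helper for illustration; not part of the original code.
    """
    reader = csv.reader(io.StringIO(text))
    labels = []
    rows = []
    for fields in reader:
        if not fields:      # skip blank lines
            continue
        if not labels:      # the first non-empty line is the header
            labels = fields
        else:
            rows.append([float(v) for v in fields])
    return labels, rows


# Made-up sample data (assumption)
sample = "time,value\n0,1.5\n1,2.25\n"
labels, rows = parse_csv_text(sample)
print(labels)  # ['time', 'value']
print(rows)    # [[0.0, 1.5], [1.0, 2.25]]
```

A non-numeric data cell raises ValueError here, just as float() would in the full reading code.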

The complete reading code follows. Copy it into a DSX Notebook and change the necessary parts. The code adapts automatically to the number of columns in the CSV file.


%matplotlib inline
from io import BytesIO  
import requests  
import numpy as np
import matplotlib.pyplot as plt
import json

# Object storage credentials <- Replace with your own (how to obtain them is described later)
credentials_1 = {
 'auth_url':'https://identity.open.softlayer.com',
 'project':'object_storage_bc6cdc85_586e_4581_8a09_8f01f7bdf3ed',
 'project_id':'2a9de4c1d50944a49f1a46dd53394158',
 'region':'dallas',
 'user_id':'********************************',
 'domain_id':'fb119f3e1bc0469dad2b253b317ec7ea',
 'domain_name':'952993',
 'username':'***********************************************',
 'password':"********************",
 'container':'DefaultProjecttakarajpibmcom',
 'tenantId':'undefined',
 'filename':'testdata_for_dsx.csv'
}

# Read from object storage
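def Read_CSV_from_ObjectStorage(credentials, fileName):
   """Read a CSV object from Bluemix Object Storage V3.

   Returns (result, status, labels, rows): labels is the header line as a
   list and rows is a list of float lists, both None on failure."""

   # Keystone v3 token endpoint, built from the auth_url in the credentials
   url1 = ''.join([credentials['auth_url'], '/v3/auth/tokens'])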
   data = {'auth': {'identity': {'methods': ['password'],
           'password': {'user': {'name': credentials['username'],'domain': {'id': credentials['domain_id']},
           'password': credentials['password']}}}}}
   headers1 = {'Content-Type': 'application/json'}
   resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
   
   # Exit on authentication error (the token request returns 201 Created on success)
   if resp1.status_code != 201:
       return False, resp1.status_code, None, None
   
   resp1_body = resp1.json()
   url2 = None
   for e1 in resp1_body['token']['catalog']:
       if e1['type'] == 'object-store':
           for e2 in e1['endpoints']:
               if e2['interface'] == 'public' and e2['region'] == credentials['region']:
                   url2 = ''.join([e2['url'], '/', credentials['container'], '/', fileName])
   # No public endpoint found for the region: fail without an HTTP status
   if url2 is None:
       return False, None, None, None
   s_subject_token = resp1.headers['x-subject-token']
   headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'text/csv'}
   resp2 = requests.get(url=url2, headers=headers2)
   if resp2.status_code != 200:
       return False, resp2.status_code, None, None
   
   # Store the result in lists
   tempArray = resp2.text.splitlines()  # split into lines
   csvLabel = []  # labels from the first line of the CSV
   csvFloat = []  # data part of the CSV (second line onward)
   lineNo = 0     # line counter

   for row in tempArray:
       if len(row) > 0:
           c = row.split(",")
           if lineNo == 0:
               csvLabel = c
           else:
               a = []
               for i in range(0,len(c)):
                   a.append(float(c[i]))
               csvFloat.append(a)                    
       lineNo = lineNo + 1
   return True, resp2.status_code,csvLabel,csvFloat


# Sample main
filename = 'testDataSet.csv'  # <- Set the object name of the CSV file you want to read

result,status,Label,Data = Read_CSV_from_ObjectStorage(credentials_1, filename)
if result:
   a = np.array(Data)  # NumPy 2D array (shape depends on the number of CSV columns)

   # Draw the graph
   x = a[:, 0]  # extract the first column
   y = a[:, 1]  # extract the second column
   plt.plot(x, y)
   plt.show()

else:
   print("ERROR {}".format(status))
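The two steps of the function above that need no network access, building the authentication request body and picking the object-store endpoint out of the token catalog, can be sketched as pure functions. This makes them easy to check in isolation. The function names, the example.com URLs, and the catalog contents are assumptions made up for illustration; only the dictionary shapes follow the code above.

```python
def build_auth_payload(credentials):
    """Build the password-auth request body from a credentials dict."""
    return {
        'auth': {
            'identity': {
                'methods': ['password'],
                'password': {
                    'user': {
                        'name': credentials['username'],
                        'domain': {'id': credentials['domain_id']},
                        'password': credentials['password'],
                    }
                },
            }
        }
    }


def find_object_store_url(catalog, region):
    """Return the public object-store endpoint URL for a region, or None."""
    for service in catalog:
        if service['type'] != 'object-store':
            continue
        for endpoint in service['endpoints']:
            if endpoint['interface'] == 'public' and endpoint['region'] == region:
                return endpoint['url']
    return None


# Made-up catalog and credentials in the shape the code above expects
fake_catalog = [
    {'type': 'identity', 'endpoints': []},
    {'type': 'object-store', 'endpoints': [
        {'interface': 'internal', 'region': 'dallas',
         'url': 'https://internal.example.com/v1/AUTH_x'},
        {'interface': 'public', 'region': 'dallas',
         'url': 'https://public.example.com/v1/AUTH_x'},
    ]},
]
fake_credentials = {'username': 'user', 'domain_id': 'dom', 'password': 'secret'}

payload = build_auth_payload(fake_credentials)
base_url = find_object_store_url(fake_catalog, 'dallas')
# The object URL is endpoint + container + object name
object_url = '/'.join([base_url, 'my_container', 'testDataSet.csv'])
print(object_url)
```

Returning None when no endpoint matches lets the caller report a clear failure instead of hitting an undefined variable later.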

How to get object storage credentials

CSV file registration

First, register the CSV file in the DSX object storage. DSX projects and object storage containers correspond one to one, so a Notebook cannot access the containers of other projects. Register the CSV file in the container associated with the project you are currently using. Select Project -> Project Name on the menu bar to open the screen that lists Notebooks and Data Assets. Click + Add Data Assets under Data Assets, and a panel appears at the right edge. Drag and drop the file into the dashed area labeled Drop file here to upload it. Then check the checkbox in front of the file name. The file should now appear in Data Assets.

Get credentials

Next, create a Notebook or open the Notebook you are developing in edit mode; click the pen icon to open it in edit mode. Then click the icon shown in Screenshot 2017-04-20 13.13.27.png. In the display that appears, click the downward triangle to open a further menu.

Click Insert Credentials at the bottom of this list to insert your credentials into the Notebook. Edit them as needed and you are ready.
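After inserting and editing the credentials, a quick sanity check can catch a missing or empty entry before any request is sent. The sketch below is an assumption-laden illustration: the helper names and the list of required keys are hypothetical, chosen from the keys that appear in the sample credentials in this article.

```python
# Keys assumed to be required, based on the sample credentials (assumption)
REQUIRED_KEYS = ('auth_url', 'region', 'username', 'password',
                 'domain_id', 'container')


def missing_credential_keys(credentials, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the dict."""
    return [key for key in required if not credentials.get(key)]


def masked(credentials, secret_keys=('password', 'user_id', 'username')):
    """Return a copy that is safe to print, with secret values replaced."""
    return {k: ('***' if k in secret_keys else v)
            for k, v in credentials.items()}


# Made-up example with an empty password and no container entry
example = {
    'auth_url': 'https://identity.open.softlayer.com',
    'region': 'dallas',
    'username': 'user',
    'password': '',
    'domain_id': 'dom',
}
missing = missing_credential_keys(example)
print(missing)          # ['password', 'container']
print(masked(example))
```

Printing the masked copy rather than the raw dict avoids leaving the password in saved Notebook output.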

Code execution result

Running this code reads the data and displays a graph corresponding to the data in the CSV file.


Reference information

This code is adapted from the JSON-reading code in the article Working with Object Storage in Data Science Experience - Python Edition (https://datascience.ibm.com/blog/working-with-object-storage-in-data-science-experience-python-edition/).
