Use Azure ML Python SDK 4: Write output to Blob storage-Part 2

Contents of this article

The content this time is almost the same as the previous article, Using Azure ML Python SDK 3: Writing output to Blob storage - Part 1. The difference is that the input is a folder instead of a single file. Since that alone is not very interesting, I will also attach a sample that actually passes some packages to the container.

What you need

Since this time a folder, rather than a specific file, is specified as the input, it is assumed that there are multiple files under work2/input/ in the figure below. Other than that, the setup is unchanged: a remote virtual machine and a Jupyter Notebook.

- Remote virtual machine (hereinafter "compute cluster", following Azure ML terminology)
- Jupyter Notebook

(Figure: Azureml2-2.png)

To check the Python SDK version:

```python
import azureml.core
print("SDK version:", azureml.core.VERSION)
```

The folder structure of the Notebook remains the same.

(Figure: Azureml4-2.png)

script2.2.py collects the names of the files saved under work2/input/ into a CSV file and saves it to work2/output/output1/. The subfolder output1 exists only to demonstrate creating a folder from within script2.2.py.

The procedure in HelloWorld2.2.ipynb is the same as last time and is as follows.

(Figure: Azureml5-2.png)

Procedure

We will continue to follow the steps as before.

  1. Load the packages
    Load the required packages.

    
    ```python
    from azureml.core import Workspace, Experiment, Dataset, Datastore, ScriptRunConfig, Environment
    from azureml.data import OutputFileDatasetConfig
    from azureml.core.compute import ComputeTarget
    from azureml.core.compute_target import ComputeTargetException
    from azureml.core.conda_dependencies import CondaDependencies

    workspace = Workspace.from_config()
    ```
    
  2. Specifying the compute cluster
    Specify the compute cluster.

    
    ```python
    aml_compute_target = "demo-cpucluster1"  # <== The name of the cluster being used
    try:
        aml_compute = ComputeTarget(workspace, aml_compute_target)
        print("found existing compute target.")
    except ComputeTargetException:
        print("no compute target with the specified name found")
    ```
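    If demo-cpucluster1 does not exist yet, you could provision it on the spot instead of only printing a message. The following is a minimal sketch using the standard AmlCompute pattern; the VM size and node counts are placeholder values, not requirements of this walkthrough:

    ```python
    from azureml.core.compute import AmlCompute

    # Provision a small CPU cluster when the lookup above fails (placeholder sizing)
    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                                min_nodes=0,
                                                                max_nodes=4)
    aml_compute = ComputeTarget.create(workspace, aml_compute_target, provisioning_config)
    aml_compute.wait_for_completion(show_output=True)
    ```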
    
  3. Specify input and output folders
    demostore is the name of the datastore registered in the Azure ML workspace. The file path inside the datastore's blob container is passed to the Dataset class.

    ```python
    ds = Datastore(workspace, 'demostore')
    input_data = Dataset.File.from_files(ds.path('work2/input/')).as_named_input('input_ds').as_mount()

    output = OutputFileDatasetConfig(destination=(ds, 'work2/output'))
    ```
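    OutputFileDatasetConfig mounts the destination by default, so anything the script writes to the output folder lands in work2/output of the datastore. If you would rather have the files written locally in the container and copied up at the end of the run, the SDK also provides an upload mode; a small variation, shown here only as a sketch:

    ```python
    # Same destination, but uploaded at the end of the run instead of mounted
    output_upload = OutputFileDatasetConfig(destination=(ds, 'work2/output')).as_upload(overwrite=True)
    ```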
    
  4. Specifying the container environment
    As mentioned at the beginning, some packages are actually specified this time. They are listed only to illustrate how packages are specified and have no particular significance for the sample.

    ```python
    myenv = Environment("myenv")

    myenv.docker.enabled = True
    myenv.python.conda_dependencies = CondaDependencies.create(pip_packages=[
        'azureml-defaults',
        'opencv-python-headless',
        'numpy',
        'pandas',
        'tensorflow',
        'matplotlib',
        'Pillow'
    ])
    ```
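    An environment assembled this way can also be registered to the workspace, so that later experiments reuse the definition (and its cached image) instead of rebuilding it. A minimal sketch of that optional step:

    ```python
    # Register the environment under its name, then fetch it again elsewhere
    myenv.register(workspace=workspace)
    restored_env = Environment.get(workspace=workspace, name="myenv")
    ```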
    
  5. Specifying the executable file name
    In source_directory, specify the folder that contains the set of scripts to be executed remotely. In script, specify the name of the script file that serves as the entry point for remote execution. During remote execution, all files and subdirectories in source_directory are passed to the container, so be careful not to place unnecessary files there. input_data is passed with the argument name datadir, and output with the argument name output. The compute cluster name is specified in compute_target, and myenv, the Environment instance, is passed in environment.

    
    ```python
    src = ScriptRunConfig(source_directory='script_folder2',
                          script='script2.2.py',
                          arguments=['--datadir', input_data, '--output', output],
                          compute_target=aml_compute,
                          environment=myenv)
    ```
    
  6. Run the experiment
    Run the script.

    
    ```python
    exp = Experiment(workspace, 'work-test')
    run = exp.submit(config=src)
    ```
    

This cell completes asynchronously, so if you want to wait for the run to finish, execute the following statement.

```python
%%time
run.wait_for_completion(show_output=True)
```
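Once the run has finished, you can confirm that the script really wrote its CSV to Blob storage. One way, sketched here with the same datastore object as in step 3, is to point a FileDataset at the output folder:

```python
# List what the run wrote under work2/output/output1/ in the datastore
result_ds = Dataset.File.from_files(ds.path('work2/output/output1/'))
print(result_ds.to_path())  # expect something like ['/outfile.csv']
result_ds.download(target_path='.', overwrite=True)  # fetch a local copy if needed
```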
  7. script2.2.py
    The contents of the script executed remotely. As mentioned above, it collects the names of the files saved under work2/input/ into a data frame and saves it to work2/output/output1/ as outfile.csv. The subfolder output1/ is created inside this script.

        
    ```python
    import argparse
    import os
    import cv2
    import numpy as np
    import pandas as pd
    import tensorflow as tf
    import PIL
    import matplotlib

    parser = argparse.ArgumentParser()
    parser.add_argument('--datadir', type=str, help="data directory")
    parser.add_argument('--output', type=str, help="output")
    args = parser.parse_args()

    print("Argument 1: %s" % args.datadir)
    print("Argument 2: %s" % args.output)

    # Print the package versions to confirm they were installed in the container
    print("cv2: %s" % cv2.__version__)
    print("numpy: %s" % np.__version__)
    print("pandas: %s" % pd.__version__)
    print("tensorflow: %s" % tf.__version__)
    print("matplotlib: %s" % matplotlib.__version__)
    print("PIL: %s" % PIL.__version__)  # PIL.PILLOW_VERSION was removed in Pillow 7

    # Collect the names of the files directly under the input folder
    rows = []
    for i, fname in enumerate(next(os.walk(args.datadir))[2], start=1):
        print('processing', fname)
        rows.append({'num': i, 'file name': fname})
    file_dict_df = pd.DataFrame(rows)  # DataFrame.append was removed in pandas 2.0

    # Create the output1/ subfolder, then write the CSV into it
    os.makedirs(os.path.join(args.output, 'output1'), exist_ok=True)
    outfname = os.path.join(args.output, 'output1', 'outfile.csv')
    file_dict_df.to_csv(outfname, index=False, encoding='shift-JIS')
    ```
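    If you want to check the script's logic without submitting a run, you could exercise it locally, since --datadir and --output are plain folder paths from the script's point of view. A minimal smoke test, assuming the listed packages are installed locally and using throwaway temp folders as stand-ins for the mounted blob paths:

    ```python
    import os
    import pathlib
    import subprocess
    import tempfile

    # Throwaway folders that play the role of the mounted input/output paths
    indir = tempfile.mkdtemp()
    outdir = tempfile.mkdtemp()
    for name in ('a.txt', 'b.txt', 'c.txt'):
        pathlib.Path(indir, name).write_text('dummy')

    # Invoke the script exactly as the container would
    subprocess.run(['python', 'script_folder2/script2.2.py',
                    '--datadir', indir, '--output', outdir], check=True)
    print(open(os.path.join(outdir, 'output1', 'outfile.csv')).read())
    ```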
    

In conclusion

What did you think? Parts 1 through 4 of this series have walked through the basic operation of the Azure ML Python SDK. Next time, I would like to introduce pipelines.

Reference material

- Use Azure ML Python SDK 1: Use dataset as input - Part 1
- [Using Azure ML Python SDK 2: Using dataset as input - Part 2](https://qiita.com/notanaha/items/30d57590c92b03bc953c)
- [Using Azure ML Python SDK 3: Using dataset as input - Part 1](https://qiita.com/notanaha/items/d22ba02b9cc903d281b6)
- Azure/MachineLearningNotebooks
