Use Azure ML Python SDK 3: Write output to Blob storage - Part 1

What this article covers

In Use Azure ML Python SDK 1: Use dataset as input - Part 1 and [Use Azure ML Python SDK 2: Use dataset as input - Part 2](https://qiita.com/notanaha/items/30d57590c92b03bc953c), I described how to handle input data. For output I used the outputs folder provided by default, but this time I will write the output to an arbitrary Blob storage location.

Items used this time

The items that appear this time are:

- A CSV file (assumed to be located in Azure Blob Storage)
- In the figure below, only HelloWorld.txt in the work folder exists at the start; output.csv in work-out is the file copied there as a result of running this script
- A remote virtual machine (hereafter "compute cluster", following Azure ML terminology)

(Figure: Azureml1-3.png)

To check the Python SDK version:


import azureml.core
print("SDK version:", azureml.core.VERSION)

The descriptions below assume the following notebook folder structure.

(Figure: Azureml2-3.png)

As before, script2.py is simple: it reads a file from Blob storage and writes it to the work-out folder. Likewise, the role of HelloWorld2.0.ipynb is to send script2.py to the compute cluster for execution.

The procedure in HelloWorld2.0.ipynb is as follows. The output folder is specified in step ③.

(Figure: Azureml3-3.png)

Procedure

Let's take a look at the steps in order.

  1. Load the packages
    First, load the required packages.

    
    from azureml.core import Workspace, Experiment, Dataset, Datastore, ScriptRunConfig, Environment
    from azureml.data import OutputFileDatasetConfig
    from azureml.core.compute import ComputeTarget
    from azureml.core.compute_target import ComputeTargetException
    from azureml.core.conda_dependencies import CondaDependencies
    
    workspace = Workspace.from_config()
    
  2. Specify the compute cluster
    You can also create remote compute resources with the Python SDK, but here I created the compute cluster in Azure ML Studio beforehand to keep the overall picture easier to follow. A sketch of provisioning it from the SDK follows the lookup code below.

    
    aml_compute_target = "demo-cpucluster"  # <== The name of the cluster being used
    try:
        aml_compute = ComputeTarget(workspace, aml_compute_target)
        print("found existing compute target.")
    except ComputeTargetException:
        print("no compute target with the specified name found")
    
  3. Specify the CSV file path and output folder
    demostore is the name of the datastore registered in the Azure ML workspace. The file path inside the datastore's blob container is passed to the Dataset class. Unlike last time, the file is passed by name with File.from_files(). Tabular.from_delimited_files() is used for tabular data such as CSV files, whereas File.from_files() can pass arbitrary files and folders; see the comparison sketch after the code below.

    
    ds = Datastore(workspace, 'demostore')
    input_data = Dataset.File.from_files(ds.path('work/HelloWorld.txt')).as_named_input('input_ds').as_mount()
    
    output = OutputFileDatasetConfig(destination=(ds, 'work_out'))
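
    For comparison, a minimal sketch of the tabular variant mentioned above. The path work/sample.csv is a hypothetical example, not a file used in this article:

    ```python
    # Hypothetical tabular counterpart: Tabular.from_delimited_files() parses CSV content
    tabular_ds = Dataset.Tabular.from_delimited_files(path=ds.path('work/sample.csv'))
    df = tabular_ds.to_pandas_dataframe()  # materialize the parsed table as a DataFrame
    ```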
    
  4. Specify the container environment
    As mentioned above, this time we use Environment() instead of RunConfiguration(). With RunConfiguration(), the compute cluster was specified at this point; with Environment() it is not specified here, and instead it is specified in ScriptRunConfig() below. Only pip_packages is used here, but conda_packages can be specified in the same way as with RunConfiguration(); see the sketch after the code below.

    myenv = Environment("myenv")
    
    myenv.docker.enabled = True
    myenv.python.conda_dependencies = CondaDependencies.create(pip_packages=['azureml-defaults'])
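
    A minimal sketch of the variation mentioned above, combining conda and pip packages; the scikit-learn entry is an illustrative assumption:

    ```python
    # Hypothetical mix of conda and pip dependencies in one environment
    myenv.python.conda_dependencies = CondaDependencies.create(
        conda_packages=['scikit-learn'],
        pip_packages=['azureml-defaults'])
    ```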
    
  5. Specify the executable file name
    In source_directory, specify the folder containing the set of scripts to be executed remotely, and in script, specify the name of the script file that is the entry point for remote execution. In remote execution, all files and subdirectories in source_directory are passed to the container, so be careful not to place unnecessary files there. The previous articles introduced the Azure ML Python SDK-specific way of passing datasets; this time the script receives the values given in arguments using argparse. input_data is passed under the argument name datadir, and output under the argument name output. The compute cluster name is specified in compute_target, and myenv, the instantiated Environment, is passed in environment.

    
    src = ScriptRunConfig(source_directory='script_folder2', 
                          script='script2.py', 
                          arguments =['--datadir', input_data, '--output', output],
                          compute_target=aml_compute,
                          environment=myenv)
    
  6. Run the experiment
    Run the script.

    
    exp = Experiment(workspace, 'InOutSample')
    run = exp.submit(config=src)
    

This cell completes asynchronously, so if you want to wait for the run to finish, execute the following statement.

```python
%%time
run.wait_for_completion(show_output=True)
```
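
Once the run has finished, a couple of standard Run methods can confirm the result; a minimal sketch:

```python
# Inspect the finished run (methods of azureml.core.Run)
print(run.get_status())      # e.g. 'Completed'
print(run.get_portal_url())  # link to the run in Azure ML Studio
```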
  7. script2.py
    These are the contents of the script executed remotely. The datadir and output arguments are parsed with argparse. args.datadir receives the full path of the input file, while args.output receives only the folder name, so os.path.join is used to append the file name output.csv. A local smoke-test sketch follows the script.

        
    import argparse
    import os
    
    print("*********************************************************")
    print("*************          Hello World!         *************")
    print("*********************************************************")
    
    parser = argparse.ArgumentParser()
    parser.add_argument('--datadir', type=str, help="data directory")
    parser.add_argument('--output', type=str, help="output")
    args = parser.parse_args()
    
    print("Argument 1: %s" % args.datadir)
    print("Argument 2: %s" % args.output)
    
    with open(args.datadir, 'r') as f:
        content = f.read()
        with open(os.path.join(args.output, 'output.csv'), 'w') as fw:
            fw.write(content)
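
    To sanity-check the script without a compute cluster, you can also run it locally; a minimal sketch, assuming an input file and the script folder exist locally (the paths are illustrative):

    ```python
    # Hypothetical local smoke test for script2.py; paths are illustrative
    import os
    import subprocess

    os.makedirs('work_out', exist_ok=True)
    subprocess.run(['python', 'script_folder2/script2.py',
                    '--datadir', 'HelloWorld.txt',
                    '--output', 'work_out'],
                   check=True)
    ```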
    

In conclusion

This time I showed how to write output to an arbitrary Blob storage location. Also, unlike last time, File.from_files() was used to specify the input file and Environment() was used to specify the container environment. Next time, I will introduce a variation that specifies a folder with File.from_files().

Reference material

- Azure/MachineLearningNotebooks: scriptrun-with-data-input-output
- Use Azure ML Python SDK 1: Use dataset as input - Part 1
- [Use Azure ML Python SDK 2: Use dataset as input - Part 2](https://qiita.com/notanaha/items/30d57590c92b03bc953c)
- [Use Azure ML Python SDK 4: Write output to Blob storage - Part 2](https://qiita.com/notanaha/items/655290670a83f2a00fdc)
- azureml.data.OutputFileDatasetConfig class - Microsoft Docs
- azureml.core.Environment class - Microsoft Docs
