I tried to summarize useful knowledge when developing / operating machine learning services [Python x Azure]

Introduction

Hello, I'm active with a creator-named "umi-mori", is MORISHIGE. The targets and overview of this article are as follows:

--Target / Target --Researchers involved in machine learning --People engaged in software using machine learning (engineers / designers / project managers / marketers / sales / consultants / managers, etc.) --Overview / Abstract --Introduction of knowledge that you need to know when executing / operating machine learning on a cloud service called Azure.

Also, although the author of this article (I) is qualified for Azure Fundamental (Certificate), there are still many things that I have not learned enough. Therefore, if you have any knowledge such as "There is such a service!" Or "There are also such advantages and disadvantages!", I would appreciate it if you could share it in the comments.

Reasons for Python x Azure

In recent years, the number of technology options has increased, so it has become important for research and development to select technologies properly. Therefore, I would like to first explain "Why use Python?" And "Why use Microsoft Azure?".

Why use Python?

I think there are three main reasons to use Python.

1. Many active developers

In terms of the number of active developers in 2019, it recorded 8.2M in the world, surpassing Java to become the second place. It is probably the most popular language in the field of Machine Learning (ML).

Choosing a language that many people are familiar with is one of the important factors to consider when developing a team. Another feature is that there are many documents (technical documents) because there are many Python users.

[Reference] Machine learning makes Python a programming language that surpasses Java and has the second largest number of developers https://ited.tech/artificial-intelligence/machine-learning-ai-made-python-second-most-used-language-over-java/

2. Abundance of machine learning libraries

A library is a useful function or function that has already been put together. In addition to the standard library, Python is very rich in libraries because machine learning researchers and developers publish external libraries as Open Source Software (OSS). I am. The abundance of libraries leads to a reduction in man-hours, which is a great advantage in both research and service development.

[Reference] 7 recommended Python libraries that are strong in machine learning and artificial intelligence development https://proengineer.internous.co.jp/content/columnfeature/13907

3. Small amount of code

Finally, the amount of code is small. There are trade-offs (slow execution speed, etc.) in this regard, but it leads to the advantages of ease of learning and reduction of man-hours for beginners.

[Reference] Compare the difference in description method with Python / C language https://algorithm.joho.info/programming/python/c-language-kijutsu-hikaku-chigai/

Why use Microsoft Azure?

I think there are three main reasons to use Microsoft Azure.

1. Superiority in CAPEX

CAPEX is an abbreviation for Capital Expenditure, which means capital expenditure. Usually, when you build a computer server in the laboratory or build an on-premises server to operate software services, the capital expenditure will be large.

If you purchase a high-priced machine at the timing required for research, various procedures such as phase estimation, examination, and depreciation may be involved. However, in the case of the cloud, CAPEX is not required and it can be executed immediately in a high-spec environment.

[Reference] Definition of CAPEX and OPEX in corporate activities https://www.clouderp.jp/blog/what-is-capex-opex

2. Storage elasticity

For software services that use machine learning, the elasticity of storage is also considered to be one of the major merits. If the design is such that the amount of learning data increases, adding it to on-premises requires labor and money in terms of maintenance and operation.

However, with the cloud, you can add storage at any time in just a few minutes.

[Reference] Basic knowledge of cloud storage https://2sq7d632aduy7flhh6iaxnby-wpengine.netdna-ssl.com/jp/wp-content/uploads/sites/2/2017/08/CDN-Broc-WP-vol01-r2-web.pdf

3. High compliance

The above two were the merits of using the cloud. So what are the advantages over other cloud services such as AWS (Amazon Web Service) and GCP (Google Cloud Platform)?

One of the notable merits is "high cloud compliance". Cloud Compliance is a regulatory standard for cloud use in cloud industry guidelines and domestic and international laws. One of the features of Azure is that it meets more than 90 standards regarding management and protection policies for personal information and compliance with laws and regulations. Therefore, we recommend using Azure for research and services that handle sensitive information such as personal information, biometric information, and medical information.

Also, as an aside, Microsoft services such as Microsoft 365 (Office 365), Dynamics 365, Github, VS Code, etc. High convenience (so-called vertical integration model) due to the cooperation of.

[Reference] Azure compliance https://azure.microsoft.com/ja-jp/overview/trusted-cloud/compliance/

Overall view of the service

First, as a whole, we classified the services as follows.

A. Services for developers B. Data infrastructure services C. Distributed processing analysis service D. Execution environment service E. Service utilizing AI / ML F. Deploy (public) service

And each service name is summarized as the following "preserved version summary image (figure)". abstract.png The more specific features and knowledge to be held are summarized below.

A. Services for developers

Introducing useful services for people (developers and data scientists) who actually develop models for machine learning.

Service name Overview
1. Azure Machine Learning A service that is the basis for executing machine learning on the cloud.
2. Azure Machine Learning Studio A web portal service for data scientist developers who can build models with GUI.
3. Azure Notebooks AservicethatallowsyoutoquicklyexecutecodesuchasPythononthecloud(*2021)/01/Scheduledtoendat21).

A-1. Azure Machine Learning --A service that is the basis for executing machine learning on the cloud. --Although there are still issues with GUI in fields such as deep learning (DNN) and computer vision, basic models such as regression and clustering can be implemented with GUI. Moreover, it is possible to produce graphs that compare the prediction accuracy of each model. -Of course, it can be implemented in Python (Tensorflow etc. can be imported and coded in Jupyter Notebook etc.). -Can be installed in Azure DevOps and Power BI.

Azure Machine Learning-Official Site https://azure.microsoft.com/ja-jp/services/machine-learning/

A-2. Azure Machine Learning Studio --A web portal service for data scientist developers who can build models with GUI.

Azure Machine Learning Studio-Official Site https://docs.microsoft.com/ja-jp/azure/machine-learning/overview-what-is-machine-learning-studio

A-3. Azure Notebooks --A service that allows you to quickly execute code such as Python on the cloud. --The merit is that there is no need to build an environment. --Free service. --There is a limit of 4 GB of memory and 1 GB of data (as of December 2020). --Public preview (like a trial period) ends on January 21, 2021.

Azure Notebooks-Official Site https://notebooks.azure.com/

[For python beginners] I tried writing python on Microsoft Azure Notebooks. https://qiita.com/Catetin0310/items/2f07e10f056e88e25f5b

How to use GPU in Jupyter Notebook? https://azureai.devpost.com/forum_topics/33099-how-to-use-gpu-in-jupyter-notebook

B. Data infrastructure services

Introducing data infrastructure services that are indispensable for building machine learning services.

Service name Overview
1. Azure Synapse Analytics(Azure SQL Data Warehouse) Data integration/Warehouse/Integrated analytics services such as data analytics.
2. Azure Data Lake A data lake service for storing large amounts of raw data.
3. Azure Data Factory Serverless data integration service.
4. Azure Open Datasets A service that allows you to access open data that is open to the public.

B-1. Azure SQL Data Warehouse --Integrated analysis services such as data integration / warehouse / data analysis. --An evolved version of Azure SQL Data Warehouse. --Choose from serverless and dedicated resources. --It is possible to manage / analyze multiple databases at once. --Can be executed for terabyte / petabyte data. --Service equivalent to AWS Redshift and GCP BigQuery. --Automatic scaling according to the load is also possible. --Data is stored in Premium Storage (using SSD), so it is expensive but fast processing.

Azure Synapse Analytics-Official Site https://azure.microsoft.com/ja-jp/services/synapse-analytics/

Azure SQL Data Warehouse Changed to Azure Synapse Analytics-Official Site https://azure.microsoft.com/ja-jp/blog/azure-sql-data-warehouse-is-now-azure-synapse-analytics/

Explanation of Azure SQL Data Warehouse [Introduction from the series Azure service Ichikara] https://azure-recipe.kc-cloud.jp/2017/12/sql_data_warehouse_2017adcal/

B-2. Azure Data Lake --A data lake service for storing large amounts of raw data. --Petabyte-class files / Can store and analyze billions of objects.

Azure Data Lake-Official Site https://azure.microsoft.com/ja-jp/solutions/data-lake/

B-3. Azure Data Factory --Serverless data integration service. --Can be linked with Azure Synapse Analytics.

Azure Data Factory-Official Site https://azure.microsoft.com/ja-jp/services/data-factory/

B-4. Azure Open Datasets --A service that allows you to access open data that is open to the public. --Access to weather / satellite images / socioeconomic data / public holidays.

Azure Open Datasets-Official Site https://azure.microsoft.com/ja-jp/services/open-datasets/

C. Distributed processing analysis service

Introducing a convenient service for analyzing the collected data.

Service name Overview
1. Azure HDInsight A service that provides a distributed processing framework for large-scale data.
2. Azure Databricks An Apache Spark-based analytics service that is simpler to use than Azure HDInsight.

C-1. Azure HDInsight --A service that provides a distributed processing framework for large-scale data. --Supports frameworks such as Apache Hadoop, Spark, Kafka.

Azure HDInsight-Official Site https://azure.microsoft.com/ja-jp/services/hdinsight/

C-2. Azure Databricks --Apache Spark based analytics service. --Simpler to use than Azure HDInsight. --Collaboration function / automatic scaling function etc. are also available.

Azure Databricks-Official Site https://azure.microsoft.com/ja-jp/services/databricks/

D. Execution environment service

Since there are several execution environment options such as virtual machines (VMs), we will introduce the services of the execution environment.

Service name Overview
1. Azure Virtual Machine The most common virtual machine service.
2. Azure DevTest Labs A service for quickly building virtual machines for developers.
3. Azure Lab Services A virtual machine service used for hackathons and education.

D-1. Azure Virtual Machine --The most common virtual machine service. --Costs vary depending on CPU and GPU specifications.

Azure Virtual Machine Series-Official Site https://azure.microsoft.com/ja-jp/pricing/details/virtual-machines/series/

D-2. Azure DevTest Labs --A service for quickly building virtual machines (VMs) and PaaS resources for developers that are separate from the Production environment. --Since the specifications and number of VMs are also limited, it is possible to omit the approval process and build. --Since it will be temporarily lent to the developer, the instance will be automatically deleted after the rent is completed. --Azure DevTest Labs corresponds to the "test environment", and the "production environment" uses a virtual machine scale set with high load flexibility.

Azure DevTest Labs-Official Site https://docs.microsoft.com/ja-jp/azure/devtest-labs/devtest-lab-overview

D-3. Azure Lab Services --A virtual machine service used for hackathons and education. --Students and participants can access the lab created by the instructor and organizer. ――Scaling according to the participants is also possible.

Azure Lab Services-Official Site https://azure.microsoft.com/ja-jp/services/lab-services/

E. Service utilizing AI / ML

Introducing services that utilize Artificial Intelligence (AI) / Machine Learning (ML).

Service name Overview
1. Azure Cognitive Services letter/image/A service that recognizes voice and the like.
2. Azure Bot Service A bot service that answers questions on behalf of humans.
3. Azure Cognitive Search(Azure Search) A cloud search service for mobile and web developers utilizing AI.

E-1. Azure Cognitive Services --A service that recognizes characters / images / sounds. --Easy to install because a learned environment is provided.

Azure Cognitive Services-Official Site https://azure.microsoft.com/ja-jp/services/cognitive-services/

E-2. Azure Bot Service --A bot service that answers questions on behalf of humans. --Text and voice support is possible. --Can be introduced into services such as customer support.

Azure Bot Service-Official Site https://azure.microsoft.com/ja-jp/services/bot-service/

E-3. Azure Cognitive Search(Azure Search) --Cloud search service for mobile and web developers using AI. --Available for natural language stack / visual / language / audio.

Azure Cognitive Search-Official Site https://azure.microsoft.com/ja-jp/services/search/

F. Deploy (public) service

Introducing the services you should know when publishing a machine learning model built with Python or GUI as a web service or API (Application Programming Interface). Here, we will also consider web frameworks using Python called Flask and Django, and web services built by combining Python with other server-side languages ​​such as NodeJS.

You may want to refer to the article "Choose an Azure compute service for your application" (https://docs.microsoft.com/ja-jp/azure/architecture/guide/technology-choices/compute-decision-tree) and choose a service to deploy depending on how much you want to customize your machine.

Service name Overview
1. Azure Virtual Machine The most common virtual machine service.
2. Azure App Services A service that allows you to build and deploy web applications and APIs.
3. Azure Functions Event-driven serverless computing platform service.

F-1. Azure Virtual Machine --See [D-1. Azure Virtual Machine](#D-1. Azure Virtual Machine).

F-2. Azure App Services --A service that allows you to build and deploy web applications and APIs. --Although the degree of freedom is lower than that of VM, the minimum necessary to operate the service can be implemented with this service.

Azure App Services-Official Site https://azure.microsoft.com/ja-jp/services/app-service/

F-3. Azure Functions --Event-driven serverless computing platform service. --You can easily deploy the function you want to execute. Can be implemented in various languages ​​(.NET, node.js, Java, etc.).

Azure Functions-Official Site https://azure.microsoft.com/ja-jp/services/functions/

in conclusion

How was that?

We hope this will be an opportunity for people with or without experience in data science or software development using Azure to increase withdrawals and lead to a systematic understanding.

Microsoft is a fast-growing company in Azure / Teams, and I'm looking forward to future developments. If you want to know more about Microsoft hardware and software, please refer to the following articles.

[Reference] 2020 Microsoft product summary! https://note.com/umi_mori/n/n0abe3126e989

Then, please read to the end, thank you. If you find it helpful, I would be grateful if you could follow LGTM / Stock / Twitter!

Next time, it will be an article by @ m-naoki!

References

-Passing Measures Microsoft Certification AZ-900: Microsoft Azure Fundamentals Text & Questions

Recommended Posts

I tried to summarize useful knowledge when developing / operating machine learning services [Python x Azure]
I tried to build an environment for machine learning with Python (Mac OS X)
[Machine learning] I tried to summarize the theory of Adaboost
I installed Python 3.5.1 to study machine learning
Python3 standard input I tried to summarize
[Introduction to Docker] I tried to summarize various Docker knowledge obtained by studying (Windows / Python)
I tried to move machine learning (ObjectDetection) with TouchDesigner
When I tried to introduce python3 to atom, I got stuck
I tried to summarize how to use matplotlib of python
I tried to compress the image using machine learning
I tried to summarize the string operations of Python
I tried to make a real-time sound source separation mock with Python machine learning
I tried machine learning to convert sentences into XX style
Mayungo's Python Learning Episode 3: I tried to print numbers with print
I tried to summarize SparseMatrix
[Qiita API] [Statistics • Machine learning] I tried to summarize and analyze the articles posted so far.
I tried to verify the speaker identification by the Speaker Recognition API of Azure Cognitive Services with Python. # 1
I tried to verify the speaker identification by the Speaker Recognition API of Azure Cognitive Services with Python. # 2
Mayungo's Python Learning Episode 4: I tried to see what happens when numbers are treated as letters
I tried to summarize everyone's remarks on slack with wordcloud (Python)
[Machine learning] I tried to do something like passing an image
I tried to touch Python (installation)
I tried machine learning with liblinear
[Python] I tried to summarize the set type (set) in an easy-to-understand manner.
When I tried to run Python, it was skipped to the Microsoft Store
Mayungo's Python Learning Episode 6: I tried to convert a character string to a number
I tried to imitate Python x Kivy de Kinoppy (Kinokuniya bookstore app)
[ML-Aents] I tried machine learning using Unity and Python TensorFlow (v0.11β compatible)
Mayungo's Python Learning Episode 2: I tried to put out characters with variables
[AWS] [GCP] I tried to make cloud services easy to use with Python
I tried to classify guitar chords in real time using machine learning
(Machine learning) I tried to understand Bayesian linear regression carefully with implementation.
A beginner of machine learning tried to predict Arima Kinen with python
I tried to visualize the model with the low-code machine learning library "PyCaret"
I started machine learning with Python (I also started posting to Qiita) Data preparation
I refactored "I tried to make Othello AI when programming beginners studied python"
I tried to summarize the umask command
I tried to implement permutation in Python
Wrangle x Python book I tried it [2]
I tried to implement PLSA in Python 2
I tried to summarize the graphical modeling.
I tried to implement ADALINE in Python
[Python] When an amateur starts machine learning
Wrangle x Python book I tried it [1]
Mayungo's Python Learning Episode 8: I tried input
An introduction to Python for machine learning
[Python] I tried to calculate TF-IDF steadily
I tried to touch Python (basic syntax)
I tried to summarize Ansible modules-Linux edition
I tried to organize the evaluation indexes used in machine learning (regression model)
Mayungo's Python Learning Episode 5: I tried to do four arithmetic operations with numbers
[Azure] I tried to create a Linux virtual machine in Azure of Microsoft Learn
I tried to implement various methods for machine learning (prediction model) using scikit-learn.
I tried to process and transform the image and expand the data for machine learning
[Python] Easy introduction to machine learning with python (SVM)
Upgrade the Azure Machine Learning SDK for Python
I tried to get CloudWatch data with Python
EV3 x Python Machine Learning Part 2 Linear Regression
I tried to output LLVM IR with Python
[Note] Python, when starting machine learning / deep learning [Links]
I tried to implement TOPIC MODEL in Python