[PYTHON] Data engineers learn DevOps with a view to MLOps. ① Getting started

tl;dr: This is a series on systematically learning DevOps / MLOps practices for the future. In my case, I decided a good starting point would be to connect the front-end language TypeScript (ts) with the back-end languages Scala / Python (in the Spark ecosystem).

Reasons for this decision:

1) Python is familiar to front-end developers (js/ts users).
 ※ Python appears to be one of the standard second languages among js/ts users ★1
2) TypeScript model definitions and Spark model definitions have a lot in common.
* TypeScript interface definitions and Spark case classes are especially similar.
3) Calling between ts and Spark is becoming easier (the magic of "jsii").
* For example, with jsii, a Node.js app written in TypeScript can now be tested from Scala / Python.
4) As a user of terraform / ansible / docker / the Hashi stack, my own challenge is the modern front end.

★1 Source: "The 'second programming language' used by JavaScript users: first place is Python"

[1] Maturity of the TypeScript ecosystem on Node.js.

TypeScript is a typed language compatible with js. For engineers who write their back ends in a typed language (C# / Scala etc.), TypeScript may be the safer option, although its barrier to entry may be higher than that of js. Know-how for running TypeScript on Node.js, an indispensable option for web development, also seems to have accumulated.

Below, I take the Nest.js framework, which seems to be one of the recent solutions for TypeScript on Node.js, as an example (to be continued). Reference: "Nest.js is wonderful"

[2] Between the back end and the front end.

TypeScript model definitions written as interfaces map well onto C# and Scala. Let me give an example. The interface user.interface.ts for the User model, created in [Creating a User Interface with Nest.js](https://qiita.com/kmatae/items/5aacc8375f71105ce0e4#user%E3%82%A4%E3%83%B3%E3%82%BF%E3%83%BC%E3%83%95%E3%82%A7%E3%83%BC%E3%82%B9%E3%81%AE%E4%BD%9C%E6%88%90) from the "Nest.js is wonderful" article mentioned above, is as follows:

user.interface.ts

```typescript
export interface IUser {
    id: string;
    name: string;
    kana: string;
    email: string;
    postcode: string;
    address: string;
    phone: string;
    password: string;
    admin: boolean;
    createdAt: Date;
    updatedAt: Date;
}
```

By contrast, the corresponding Scala class for Spark looks like this, for example:

IUser.scala

```scala
import java.sql.Timestamp

// Spark-side counterpart of the TypeScript IUser interface
case class IUser(
    id: String,
    name: String,
    kana: String,
    email: String,
    postcode: String,
    address: String,
    phone: String,
    password: String,
    admin: Boolean,
    createdAt: Timestamp,
    updatedAt: Timestamp
)
```

The conversion is simple enough to be done by mechanical replacement (I used Scala here, but the same holds for Python (PySpark) with type annotations, which resemble TypeScript's). In practice, it is easiest to restrict the properties persisted to Spark to basic types such as String / Int / Boolean / Timestamp. If the front-end side can be written just as safely, things move quickly (Nest.js looks safe and productive to write in; whether I will actually write it myself is another matter). If you can generate JSON from Node.js code written in TypeScript and feed it into Spark (with Kafka etc. in between, or as semi-stream processing), the rest can be handled one way or another. On the back-end side, Spark is one of the standard solutions for describing a data hub between datastores (connecting CSV, Cassandra, Hive, Neo4j, machine learning infrastructure, etc.).
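To make the "mechanical replacement" idea concrete, here is a minimal sketch in Python that turns a flat TypeScript interface into a Scala case class. The function name `ts_interface_to_case_class` and the type mapping are my own assumptions for illustration (for example, mapping `number` to `Double` and `Date` to `java.sql.Timestamp`); it only handles simple `name: type;` fields and rejects anything else rather than guessing.

```python
import re

# Hypothetical mapping from TypeScript primitive types to Scala types
# that Spark can encode. This table is an assumption; extend it as needed.
TS_TO_SCALA = {
    "string": "String",
    "number": "Double",
    "boolean": "Boolean",
    "Date": "java.sql.Timestamp",
}

# Matches "interface Name { ... }" and captures the name and the body.
INTERFACE_RE = re.compile(r"interface\s+(\w+)\s*\{(.*?)\}", re.DOTALL)
# Matches one flat field such as "id: string;" or "nick?: string;".
FIELD_RE = re.compile(r"(\w+)(\?)?\s*:\s*(\w+)\s*;")


def ts_interface_to_case_class(ts_source: str) -> str:
    """Translate the first flat TypeScript interface found into a Scala case class."""
    match = INTERFACE_RE.search(ts_source)
    if match is None:
        raise ValueError("no interface declaration found")
    name, body = match.groups()
    fields = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        field = FIELD_RE.fullmatch(line)
        if field is None:
            # Nested types, arrays, comments etc. are out of scope for this sketch.
            raise ValueError(f"unsupported field declaration: {line}")
        field_name, optional, ts_type = field.groups()
        if ts_type not in TS_TO_SCALA:
            raise ValueError(f"unsupported type for {field_name}: {ts_type}")
        scala_type = TS_TO_SCALA[ts_type]
        if optional:
            # Optional TypeScript fields become Option[...] on the Scala side.
            scala_type = f"Option[{scala_type}]"
        fields.append(f"    {field_name}: {scala_type}")
    return f"case class {name}(\n" + ",\n".join(fields) + "\n)"


if __name__ == "__main__":
    sample = """
    export interface IUser {
        id: string;
        admin: boolean;
        createdAt: Date;
    }
    """
    print(ts_interface_to_case_class(sample))
```

Failing loudly on unsupported declarations fits the advice above to keep persisted properties to basic types: anything fancier shows up as an error instead of silently disappearing from the Spark model.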

[3] TypeScript on Node.js can be tested from Python / Scala via jsii.

I learned about jsii, which is made by AWS, from the following article: "Introducing the magical 'jsii' that runs programs written in TypeScript with Python etc."

With jsii, TypeScript code can be called directly from Python, JVM languages (Scala), and C#. This means you can write test code for TypeScript in Python or Scala. Once a system is in production, the TypeScript model will change according to the needs of the front-end side (service expansion, etc.). It is nice to be able to keep testing the data handoff continuously as the model changes (testing at the point where untyped data (e.g. CSV) is brought into the Spark world is quite cumbersome).
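As a rough picture of what "continuously testing the data handoff" could look like on the Python side, here is a sketch of a schema check for one JSON record. It does not use jsii itself; it only inspects the JSON that a Node.js app would emit. The names `EXPECTED_FIELDS` and `check_record` are hypothetical, and I am assuming dates arrive as ISO 8601 strings (which is what `JSON.stringify` produces for a `Date`).

```python
import json
from datetime import datetime
from typing import List

# Expected shape of a user record, mirroring the IUser interface.
# datetime marks fields assumed to arrive as ISO 8601 strings.
EXPECTED_FIELDS = {
    "id": str, "name": str, "kana": str, "email": str,
    "postcode": str, "address": str, "phone": str, "password": str,
    "admin": bool, "createdAt": datetime, "updatedAt": datetime,
}


def _is_iso_timestamp(value: str) -> bool:
    """Return True if value parses as an ISO 8601 timestamp."""
    try:
        # Older Python versions do not accept a trailing "Z", so normalize it.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def check_record(raw_json: str) -> List[str]:
    """Return a list of schema problems; an empty list means the record conforms."""
    record = json.loads(raw_json)
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    # Fields the Spark side expects but the front end no longer sends.
    for name in sorted(EXPECTED_FIELDS.keys() - record.keys()):
        problems.append(f"missing field: {name}")
    # Fields the front end added that the Spark side does not know yet.
    for name in sorted(record.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected field: {name}")
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            continue
        value = record[name]
        if expected is datetime:
            ok = isinstance(value, str) and _is_iso_timestamp(value)
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(f"wrong type for {name}: {value!r}")
    return problems
```

Running a check like this in CI against sample output from the front end would flag a renamed or retyped property before the data reaches Spark. With jsii, the sample records could in principle come from the TypeScript code itself, but that wiring is outside this sketch.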

I will start the hands-on work in the next post.

Recently, I have been personally studying how to quickly build the back side (data infrastructure) of a system whose web front end pursues performance. I am not familiar with front-end technology, but I have gotten a feel for it by trying Svelte / TypeScript / Node.js / Firebase ... in turn. Going forward, I would like to try connecting to the Spark back end while getting hands-on with NestJS / SSR / ts2elm .... I hope that GitHub Actions, which can run CI/CD targeting Node.js and Docker, will prove useful.
