A record of briefly trying out GCP's Dataflow

Intro

I tried out Google Cloud Dataflow, so I'm writing down what it was like as a reminder to myself.

What is Google Cloud Dataflow?

Simply put, it is a GCP service that takes care of running and managing the processing of streaming data and the like. This time, I will use Dataflow to place data received by Pub/Sub into Cloud Storage. The assumed use case is that a large volume of things such as logs and tracking data is posted, then retained or used for later analysis.

1. Create a topic in Pub/Sub

Pub/Sub is another straightforwardly named GCP service: a so-called messaging service, or queue. It is the starting point for the data this time, and in a sense the trigger. For now, just create a topic with a name of your choice from the GUI console as usual.
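If you would rather not click through the console, the same step can be sketched from the command line. The snippet below only builds the `gcloud pubsub topics create` argument list and the fully qualified topic name that later steps need; it does not call GCP. The project and topic names are placeholders I made up, not values from this article.

```python
from typing import List


def topic_path(project_id: str, topic_id: str) -> str:
    """Return the fully qualified Pub/Sub topic name."""
    return f"projects/{project_id}/topics/{topic_id}"


def create_topic_command(project_id: str, topic_id: str) -> List[str]:
    """Build the gcloud argument list that would create the topic."""
    return ["gcloud", "pubsub", "topics", "create", topic_id,
            "--project", project_id]


if __name__ == "__main__":
    # "my-project" and "my-topic" are hypothetical placeholder names.
    print(topic_path("my-project", "my-topic"))
    print(" ".join(create_topic_command("my-project", "my-topic")))
```

To actually run the command, you could pass the list to `subprocess.run(..., check=True)` on a machine where `gcloud` is installed and authenticated.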

2. Prepare a bucket for Cloud Storage

Prepare the place that will be the end point this time. Create a new bucket, or prepare a folder in an existing one. Also prepare a folder for temporary files (described later).

3. Create a Dataflow job from a template

Once you've prepared this much, the rest is just clicking through the screens. Dataflow has templates for frequently used use cases, so you can build something that works reasonably well just by selecting an appropriate one and filling in the necessary settings.

When you reach the creation screen from the "Create job from template" link, enter a job name and choose a region. Then select a template. This time, select "Pub/Sub to Text Files on Cloud Storage" (exactly what it says!). For the other required parameters, see the screenshot:

[Screenshot 2020-09-23 13.30.41.png]
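For reference, here is a rough sketch of what the same job might look like when launched with `gcloud` instead of the console. The template path and the parameter names (`inputTopic`, `outputDirectory`, `outputFilenamePrefix`, `outputFilenameSuffix`) are what I understand Google's documentation for this template to use, so please check them against the current docs before relying on them. The `output/` and `temp/` folder names, the filename prefix and suffix, and all project, bucket, and job names are assumptions for illustration only. The code just assembles the argument list and does not call GCP.

```python
from typing import List

# Assumed location of the Google-provided classic template; verify in the docs.
TEMPLATE_PATH = "gs://dataflow-templates/latest/Cloud_PubSub_to_GCS_Text"


def build_job_command(job_name: str, region: str, project_id: str,
                      topic_id: str, bucket: str) -> List[str]:
    """Build a gcloud argument list for the Pub/Sub to text files template."""
    params = {
        # Topic created in step 1, as a fully qualified name.
        "inputTopic": f"projects/{project_id}/topics/{topic_id}",
        # Output folder prepared in step 2 (folder name is hypothetical).
        "outputDirectory": f"gs://{bucket}/output/",
        "outputFilenamePrefix": "output-",
        "outputFilenameSuffix": ".txt",
    }
    joined = ",".join(f"{key}={value}" for key, value in params.items())
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--gcs-location", TEMPLATE_PATH,
        "--region", region,
        # Temporary-file folder prepared in step 2 (folder name is hypothetical).
        "--staging-location", f"gs://{bucket}/temp/",
        "--parameters", joined,
    ]


if __name__ == "__main__":
    print(" ".join(build_job_command(
        "pubsub-to-gcs", "asia-northeast1", "my-project", "my-topic", "my-bucket")))
```

Keeping the parameters in a dictionary makes it easy to compare them one by one with the fields on the console's creation screen.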

4. Try publishing to the topic

With the above, all the settings are complete. After starting the job and publishing a message to the Pub/Sub topic, a file is created in Cloud Storage after a while, with the published message inside! By the way, it seems that by default the messages accumulated over 5 minutes are written to Cloud Storage as one file, separated by line breaks.
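To make the 5-minute behavior concrete, here is a small local simulation. It is not Dataflow or template code, just my own sketch of the grouping as I observed it: messages whose timestamps fall in the same 5-minute window end up together, one message per line. The function name and the sample messages are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 5 * 60  # the default window length observed above


def group_into_windows(messages: Iterable[Tuple[float, str]],
                       window_seconds: int = WINDOW_SECONDS) -> Dict[int, str]:
    """Group (timestamp, body) pairs into fixed windows.

    Returns a mapping from window start time (in seconds) to the text
    that one output file would contain, one message per line.
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    for timestamp, body in messages:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(body)
    return {start: "\n".join(bodies) + "\n"
            for start, bodies in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [(0, "first"), (299, "second"), (300, "third")]
    for start, contents in group_into_windows(sample).items():
        print(f"window starting at {start}s:")
        print(contents, end="")
```

In this sample, "first" and "second" share a window while "third" starts a new one, which matches the impression that each file holds about 5 minutes of messages.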

Remarks / impressions
