Intro
I tried out Google Cloud Dataflow, so I'm writing down what it was like as a note to myself.
Simply put, it is a GCP service that runs and manages processing for streaming data and the like. This time, I will use Dataflow to store data received by Pub/Sub in Cloud Storage. The assumed scenario is that a large volume of data such as logs and tracking data is posted, and that it is retained or used for later analysis.
Pub/Sub is another name that says exactly what it does, and it is also a GCP service: a so-called messaging or queue service. It will be the starting point of the data this time, a trigger in a sense. For the time being, just create a topic from the GUI console as usual and give it a name.
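Once a topic exists, messages can be published to it from code as well as from the console. The following is only a minimal sketch, assuming the google-cloud-pubsub client library is installed and credentials are configured; the function names, the project ID, the topic ID, and the record fields are all hypothetical placeholders, not something from my setup.

```python
import json


def to_payload(record):
    """Encode a log-like record as UTF-8 JSON bytes (Pub/Sub message data must be bytes)."""
    return json.dumps(record, ensure_ascii=False).encode("utf-8")


def publish_record(project_id, topic_id, record):
    """Publish one record to a topic and return the message ID assigned by Pub/Sub."""
    # Imported here so the rest of the file loads even without the library installed.
    from google.cloud import pubsub_v1  # requires: pip install google-cloud-pubsub

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, to_payload(record))
    return future.result()  # blocks until the publish call completes


if __name__ == "__main__":
    # "my-project" and "my-topic" are placeholders; replace them with real names.
    print(publish_record("my-project", "my-topic", {"event": "click", "page": "/top"}))
```

Each call sends one message, so posting many log lines would simply mean calling publish_record repeatedly.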
Prepare the place in Cloud Storage that will be the end point this time. Create a new bucket or use a folder in an existing one. Also prepare a folder for temporary files (described later).
Once you've prepared this much, all that's left is clicking through the screens. Dataflow has templates for frequently used use cases, so you can build something that works reasonably well just by selecting an appropriate one and filling in the necessary settings.
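I used the console, but a template job of this kind can apparently also be started from the command line with gcloud dataflow jobs run. Below is a rough sketch that only builds such a command and prints it. The template path, the parameter names (inputTopic, outputDirectory, outputFilenamePrefix, outputFilenameSuffix), and every project, bucket, and job name are assumptions on my part, so check them against the current Dataflow template documentation before running anything.

```python
import shlex

# Assumed location of the Google-provided template; verify it in the official docs.
TEMPLATE_PATH = "gs://dataflow-templates/latest/Cloud_PubSub_to_GCS_Text"


def build_run_command(job_name, region, project, topic, bucket):
    """Build (but do not run) a gcloud command that starts the template job."""
    # Parameter names are assumptions based on the public template documentation.
    params = {
        "inputTopic": f"projects/{project}/topics/{topic}",
        "outputDirectory": f"gs://{bucket}/output/",
        "outputFilenamePrefix": "output-",
        "outputFilenameSuffix": ".txt",
    }
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        f"--gcs-location={TEMPLATE_PATH}",
        f"--region={region}",
        # Folder for temporary files, prepared in the Cloud Storage step.
        f"--staging-location=gs://{bucket}/temp",
        "--parameters=" + ",".join(f"{key}={value}" for key, value in params.items()),
    ]


if __name__ == "__main__":
    # All names below are placeholders.
    command = build_run_command("pubsub-to-gcs", "us-central1", "my-project", "my-topic", "my-bucket")
    print(shlex.join(command))
```

Printing the command first makes it easy to review the parameters before pasting it into a shell.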
When you reach the creation screen from the "Create job from template" link, enter a job name and select a region. Then select a template. This time, select "Pub/Sub to Text Files on Cloud Storage" (exactly what it says!). A few points to note when setting the other required parameters:
With the above, all the settings are complete. After running the job and publishing a message to Pub/Sub, a file is created in Cloud Storage after a while. The published message is inside! By the way, it seems that by default the messages accumulated over 5 minutes are written to Cloud Storage as one file, with one message per line.
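To picture that default behaviour, here is a small simulation in plain Python. It is not the Dataflow or Apache Beam API, just a sketch of the idea under the assumption that messages are bucketed into fixed 5-minute windows and each window becomes one file with one message per line; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # assumed default window length: 5 minutes


def group_into_files(messages, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_seconds, text) pairs into fixed windows.

    Returns a dict mapping each window's start time to the content of the
    file that would be written for it: one message per line.
    """
    windows = defaultdict(list)
    for timestamp, text in messages:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start].append(text)
    return {
        start: "\n".join(lines) + "\n"
        for start, lines in sorted(windows.items())
    }


if __name__ == "__main__":
    # Timestamps are seconds since an arbitrary start; the texts are sample messages.
    sample = [(0, "first"), (120, "second"), (299, "third"), (300, "fourth")]
    for start, content in group_into_files(sample).items():
        print(start, repr(content))
```

In this sketch the first three sample messages land in the same file and the fourth starts a new one, which matches what I saw: several messages published within a few minutes ended up together in a single file.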