Introduction to Apache Beam (1) ~ Reading and writing text ~

Overall purpose

Create a simple Apache Beam program to understand how it works

Goal of this article

Create a program that reads a local text file and writes its contents out unchanged

Main content

Environment

IntelliJ


IntelliJ IDEA 2017.3.3 (Ultimate Edition)
Build #IU-173.4301.25, built on January 16, 2018
Licensed to kaito iwatsuki
Subscription is active until January 24, 2019
For educational use only.
JRE: 1.8.0_152-release-1024-b11 x86_64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Mac OS X 10.12.6

Maven : 3.5.2

Procedure

Preparation

https://gyazo.com/d68a0b28e4f5a49ddcf77b8ae350ddb3

https://gyazo.com/9f3c75bb7289d485d09bee88b662663b

SimpleBeam.java


import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class SimpleBeam {
    public static void main(String[] args){
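        // Create the pipeline options with default settings
        PipelineOptions options = PipelineOptionsFactory.create();

        Pipeline p = Pipeline.create(options);
        // Read each line of the local file Sample.txt into a PCollection
        PCollection<String> textData = p.apply(TextIO.read().from("Sample.txt"));
        // Write the lines unchanged to files whose names start with "wordcounts"
        textData.apply(TextIO.write().to("wordcounts"));
        // Run the pipeline and block until it finishes
        p.run().waitUntilFinish();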
    }
}

If a library's dependency cannot be resolved, press command + and add the dependency as shown in the images.

https://gyazo.com/62d3d70f83834da1d5cd610a056d2333

https://gyazo.com/323bde1aca3e1cff001c792f0f80e767

https://gyazo.com/c31e09c23800d61222d87624844c8be3

Run

https://gyazo.com/c8896bc73119d76b1d95e2b75e510819

https://gyazo.com/cb53ea8efadaf6e488f9fe5acc2cdf52

https://gyazo.com/3c554c5e5c91074019bf8efc3dbab101

https://gyazo.com/666109229175f8ffa1ca57640a0a1e88

The output is as follows. If `wordcounts-*` files are created in the `~/beamSample` directory, the run succeeded.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.827 s
[INFO] Finished at: 2018-02-27T02:46:10+09:00
[INFO] Final Memory: 21M/373M
[INFO] ------------------------------------------------------------------------

Process finished with exit code 0

https://gyazo.com/01ec441970c699e8a2d440f1b1c9e667
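Beyond checking that the files exist, it can be reassuring to confirm that they really contain the same text as the input. TextIO normally splits its output into shards (with the default naming, files such as `wordcounts-00000-of-00001`), and the order of lines across shards is not guaranteed. The following is only a minimal sketch using the plain Java standard library, not part of Beam: the class name `OutputShardCheck` and its methods are hypothetical, and it assumes that `Sample.txt` and the shards sit in the same directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Beam) for checking the pipeline output.
class OutputShardCheck {

    // Collect every line from the files in dir whose names match prefix + "-*".
    static List<String> readShardLines(Path dir, String prefix) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> shards = Files.newDirectoryStream(dir, prefix + "-*")) {
            for (Path shard : shards) {
                lines.addAll(Files.readAllLines(shard));
            }
        }
        return lines;
    }

    // True when the shards together hold exactly the lines of the input file.
    // Both lists are sorted first because line order across shards may differ.
    static boolean sameLines(Path input, Path dir, String prefix) throws IOException {
        List<String> expected = new ArrayList<>(Files.readAllLines(input));
        List<String> actual = readShardLines(dir, prefix);
        Collections.sort(expected);
        Collections.sort(actual);
        return expected.equals(actual);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: run from the directory that holds Sample.txt and the shards.
        Path dir = Paths.get(".");
        boolean ok = sameLines(dir.resolve("Sample.txt"), dir, "wordcounts");
        System.out.println(ok ? "Output matches Sample.txt" : "Output differs from Sample.txt");
    }
}
```

Running `main` from the project directory after the pipeline finishes prints whether the shards and `Sample.txt` contain the same lines. Note that shards left over from an earlier run with a different shard count would also match the `wordcounts-*` pattern, so it is safer to delete old output before checking.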

Stumbling points

Nothing in particular tripped me up this time, but because I am not very familiar with IntelliJ, I was thrown by unfamiliar errors several times. Most of them were caused by dependencies that could not be resolved, and I managed to fix them with `Add_Maven`.

Next time

This time I only got the program running, so next time I would like to build a simple Pipeline that doubles as a review of the MapReduce idea.
