Apache Beam 2.0.x with Google Cloud Dataflow starting with IntelliJ and Gradle

The official documentation only provides a quickstart for Maven, so this is a note on how to get started with Apache Beam using Gradle and IntelliJ.

This post does not cover specifying options for the Pipeline; for now, the setup is only meant to let you run things in a local environment. I may add more later, such as how to specify Pipeline options.

Method

1 Create a new project in IntelliJ IDEA

1.png

2 Select Gradle and Java

2.png

3 Specify groupId and artifactId

3.png

groupId: the project's root package name
artifactId: the project name
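As a rough sketch of where these two values end up, assuming the IntelliJ wizard records them the way Gradle projects usually do: the groupId becomes `group` in build.gradle, and the artifactId becomes the root project name in settings.gradle. `beam-sample` is a hypothetical artifactId used only for illustration; `hoge` matches the build.gradle shown in step 6.

```groovy
// --- settings.gradle ---
// artifactId entered in the wizard (hypothetical value)
rootProject.name = 'beam-sample'

// --- build.gradle ---
// groupId entered in the wizard
group 'hoge'
version '1.0-SNAPSHOT'
```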

4 Various settings

Configure the options as shown below.

4.png

5 Set project name and project location

If you are fine with the values shown, proceed as is.

6 Replace build.gradle with the following

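group 'hoge'
version '1.0-SNAPSHOT'

apply plugin: 'java'

// Compile the sources as Java 8
sourceCompatibility = 1.8

repositories {
    // Dependencies are resolved from Maven Central
    mavenCentral()
}

dependencies {
    // Cloud Dataflow SDK for Java 2.0.0, which is based on Apache Beam
    // Note: `compile`/`testCompile` were removed in Gradle 7; use `implementation`/`testImplementation` there
    compile group: 'com.google.cloud.dataflow', name: 'google-cloud-dataflow-java-sdk-all', version: '2.0.0'
    testCompile group: 'junit', name: 'junit', version: '4.11'
}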

7 Wait for build

After you replace build.gradle with the above, wait a moment. IntelliJ builds the project automatically, and once it finishes you can use Apache Beam.
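To check that the SDK was actually resolved, one option is a small helper task appended to build.gradle. This is only a sketch: the task name `printCompileClasspath` is made up for this example, and it assumes the `java` plugin is applied as in the build.gradle above.

```groovy
// Hypothetical helper task: prints the file name of every jar
// on the main source set's compile classpath.
task printCompileClasspath {
    doLast {
        sourceSets.main.compileClasspath.each { file ->
            println file.name
        }
    }
}
```

Running `gradle printCompileClasspath` from the project root should then list a jar whose name starts with `google-cloud-dataflow-java-sdk-all`, together with the libraries it pulls in, if the dependency was resolved.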

Maven repository

The SDK is pulled from the Maven repository below, as declared in the build.gradle above.
Maven Repository: com.google.cloud.dataflow
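The coordinates listed in that repository map directly onto a Gradle dependency declaration. As a small illustration, the map-style declaration used in the build.gradle above can also be written in Gradle's shorter `group:name:version` string form. Both lines below declare the same dependency, so in practice only one of them would be kept.

```groovy
dependencies {
    // Map notation, as used in the build.gradle above
    compile group: 'com.google.cloud.dataflow', name: 'google-cloud-dataflow-java-sdk-all', version: '2.0.0'

    // Equivalent string notation: 'group:name:version'
    compile 'com.google.cloud.dataflow:google-cloud-dataflow-java-sdk-all:2.0.0'
}
```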

Sites I used as references

Gradle beginners start Gradle - Qiita

Maven Repository: com.google.cloud.dataflow
