Why you should consider adopting Go CDK even if you are a single cloud vendor

What is Go CDK?

Go CDK stands for The Go Cloud Development Kit (formerly known as Go Cloud), a project that provides a unified API for the roughly equivalent services offered by the major cloud vendors.

For example, saving and retrieving an object in a cloud storage service can be written as follows with Go CDK [^1].

[^1]: The sample code in this article uses gocloud.dev v0.20.0.

AWS S3:

package main

import (
	"context"
	"fmt"
	"log"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "s3://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	//Save
	if err := bucket.WriteAll(ctx, "sample.txt", []byte("Hello, world!"), nil); err != nil {
		log.Fatal(err)
	}

	//Get
	data, err := bucket.ReadAll(ctx, "sample.txt")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(data))
}

GCS:

package main

import (
	"context"
	"fmt"
	"log"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/gcsblob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "gs://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	//Save
	if err := bucket.WriteAll(ctx, "sample.txt", []byte("Hello, world!"), nil); err != nil {
		log.Fatal(err)
	}

	//Get
	data, err := bucket.ReadAll(ctx, "sample.txt")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(data))
}

Azure Blob Storage:

package main

import (
	"context"
	"fmt"
	"log"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/azureblob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "azblob://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	//Save
	if err := bucket.WriteAll(ctx, "sample.txt", []byte("Hello, world!"), nil); err != nil {
		log.Fatal(err)
	}

	//Get
	data, err := bucket.ReadAll(ctx, "sample.txt")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(data))
}

The only code differences between cloud vendors are the driver import and the URL scheme passed to blob.OpenBucket(). That's all it takes!

In this way, Go CDK makes it easy to implement multi-cloud applications and highly cloud-portable applications.
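
Because the vendor is selected by nothing more than a URL, the choice can live in configuration instead of code. Below is a minimal sketch of that idea; the BLOB_BUCKET_URL environment variable is a hypothetical name of my own, not something defined by Go CDK.

package main

import (
	"context"
	"log"
	"os"

	"gocloud.dev/blob"
	// Blank-import every driver you want to be selectable at runtime.
	_ "gocloud.dev/blob/gcsblob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()

	// The URL scheme (s3://, gs://, ...) picks the driver at runtime.
	bucket, err := blob.OpenBucket(ctx, os.Getenv("BLOB_BUCKET_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	if err := bucket.WriteAll(ctx, "sample.txt", []byte("Hello, world!"), nil); err != nil {
		log.Fatal(err)
	}
}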

If you would like to know more about Go CDK, see the official documentation.

Go CDK is very convenient, but as of October 2020 the project status is "APIs are alpha but production-ready" [^2]. Adopt it at your own risk.

Go CDK benefits beyond multi-cloud portability

Now for the main topic.

Some of you are surely thinking, "I only use AWS! Vendor lock-in is fine by me!" In this article, I would like to show the advantages Go CDK offers even to such people, using S3 object operations (save / retrieve) as the example.

An API that is easy to understand and use

The Go CDK API is designed to be intuitive, easy to understand, and easy to handle.

Reading and writing objects in cloud storage with Go CDK is done through [blob.Reader](https://pkg.go.dev/gocloud.dev@v0.20.0/blob#Reader) (which implements [io.Reader](https://pkg.go.dev/io#Reader)) and [blob.Writer](https://pkg.go.dev/gocloud.dev@v0.20.0/blob#Writer) (which implements [io.Writer](https://pkg.go.dev/io#Writer)), obtained from blob.Bucket's [NewReader()](https://pkg.go.dev/gocloud.dev@v0.20.0/blob#Bucket.NewReader) and [NewWriter()](https://pkg.go.dev/gocloud.dev@v0.20.0/blob#Bucket.NewWriter). Getting (reading) an object through an io.Reader and saving (writing) one through an io.Writer is very intuitive, and it lets you work with objects in the cloud as if they were local files.

Let's take a look at how it will be easier to understand than using the AWS SDK, with concrete examples.

Saving an object to S3

With the AWS SDK, you use [Upload()](https://pkg.go.dev/github.com/aws/aws-sdk-go/service/s3/s3manager#Uploader.Upload) on [s3manager.Uploader](https://pkg.go.dev/github.com/aws/aws-sdk-go/service/s3/s3manager#Uploader). The content of the object to upload is passed to the method as an io.Reader. When uploading a local file it is convenient to pass an os.File as-is, but it gets troublesome when you want to encode in-memory data in some format and save the result directly.

For example, encoding a value as JSON and saving it straight to S3 looks like this with the AWS SDK.

package main

import (
	"encoding/json"
	"io"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region: aws.String("ap-northeast-1"),
	})
	if err != nil {
		log.Fatal(err)
	}

	uploader := s3manager.NewUploader(sess)

	data := struct {
		Key1 string
		Key2 string
	}{
		Key1: "value1",
		Key2: "value2",
	}

	pr, pw := io.Pipe()

	go func() {
		err := json.NewEncoder(pw).Encode(data)
		pw.CloseWithError(err)
	}()

	in := &s3manager.UploadInput{
		Bucket: aws.String("bucket"),
		Key:    aws.String("sample.json"),
		Body:   pr,
	}
	if _, err := uploader.Upload(in); err != nil {
		log.Fatal(err)
	}
}

To connect the io.Writer used for JSON encoding with the io.Reader passed to s3manager.UploadInput, you have to use [io.Pipe()](https://pkg.go.dev/io#Pipe).

With Go CDK, writing goes through blob.Writer (an io.Writer), so you can simply pass it to [json.NewEncoder()](https://pkg.go.dev/encoding/json#NewEncoder).

package main

import (
	"context"
	"encoding/json"
	"log"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "s3://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	data := struct {
		Key1 string
		Key2 string
	}{
		Key1: "value1",
		Key2: "value2",
	}

	w, err := bucket.NewWriter(ctx, "sample.json", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	if err := json.NewEncoder(w).Encode(data); err != nil {
		log.Fatal(err)
	}
}

Of course, uploading a local file stays simple too: just use io.Copy, as if copying from one file to another.

package main

import (
	"context"
	"io"
	"log"
	"os"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "s3://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	file, err := os.Open("sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	w, err := bucket.NewWriter(ctx, "sample.txt", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	if _, err := io.Copy(w, file); err != nil {
		log.Fatal(err)
	}
}

Incidentally, s3blob's Writer is implemented as a wrapper around s3manager.Uploader, so you still benefit from its parallel (multipart) upload capability.
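
If you want to tune that behavior, blob.WriterOptions exposes a BufferSize field; for s3blob my understanding is that it controls how many bytes are buffered before a part is sent to the wrapped uploader, but that mapping is my reading of the driver, so verify against the docs. A minimal sketch:

package main

import (
	"context"
	"io"
	"log"
	"os"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "s3://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	file, err := os.Open("large.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// BufferSize sets the number of bytes buffered in memory before a
	// part is sent; for s3blob this should feed the wrapped
	// s3manager.Uploader's part size (assumption, see lead-in).
	opts := &blob.WriterOptions{BufferSize: 16 * 1024 * 1024}

	w, err := bucket.NewWriter(ctx, "large.bin", opts)
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	if _, err := io.Copy(w, file); err != nil {
		log.Fatal(err)
	}
}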

Retrieving an object from S3

Consider the case of getting JSON from S3 and decoding it.

With the AWS SDK, use s3.GetObject(). Note that s3manager.Downloader, the counterpart of s3manager.Uploader, requires its output destination to implement io.WriterAt, so it cannot be used in this case.

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region: aws.String("ap-northeast-1"),
	})
	if err != nil {
		log.Fatal(err)
	}

	svc := s3.New(sess)

	data := struct {
		Key1 string
		Key2 string
	}{}

	in := &s3.GetObjectInput{
		Bucket: aws.String("bucket"),
		Key:    aws.String("sample.json"),
	}
	out, err := svc.GetObject(in)
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()

	if err := json.NewDecoder(out.Body).Decode(&data); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%+v\n", data)
}

With Go CDK, you just write the upload example in reverse.

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "s3://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	r, err := bucket.NewReader(ctx, "sample.json", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	data := struct {
		Key1 string
		Key2 string
	}{}

	if err := json.NewDecoder(r).Decode(&data); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%+v\n", data)
}

If you want to write the retrieved object to a local file, s3manager.Downloader can be used.

package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region: aws.String("ap-northeast-1"),
	})
	if err != nil {
		log.Fatal(err)
	}

	downloader := s3manager.NewDownloader(sess)

	file, err := os.Create("sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	in := &s3.GetObjectInput{
		Bucket: aws.String("bucket"),
		Key:    aws.String("sample.txt"),
	}
	if _, err := downloader.Download(file, in); err != nil {
		log.Fatal(err)
	}
}

With Go CDK, once again it is just the upload written in reverse.

package main

import (
	"context"
	"io"
	"log"
	"os"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "s3://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	file, err := os.Create("sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	r, err := bucket.NewReader(ctx, "sample.txt", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	if _, err := io.Copy(file, r); err != nil {
		log.Fatal(err)
	}
}

However, although this method is simple, it has a drawback. s3manager.Downloader demands an io.WriterAt as the output destination, but in exchange offers a parallel download feature with excellent performance; Go CDK, as-is, cannot download in parallel. If you want parallel downloads with Go CDK, you have to implement them yourself using [NewRangeReader()](https://pkg.go.dev/gocloud.dev@v0.20.0/blob#Bucket.NewRangeReader).
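
NewRangeReader itself is straightforward: it gives you an io.Reader over a byte range of the object. A minimal sketch, assuming the bucket and ctx from the earlier examples, reading the first 1 MiB:

	// Read bytes [0, 1 MiB) of the object.
	r, err := bucket.NewRangeReader(ctx, "sample.txt", 0, 1024*1024, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// r.Size() reports the total size of the object, which is what a
	// parallel downloader needs to decide how many ranged reads to issue.
	_ = r.Size()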

Example of a parallel download implementation with Go CDK (long, so feel free to skip ahead):
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"sync"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob"
)

const (
	downloadPartSize    = 1024 * 1024 * 5
	downloadConcurrency = 5
)

func main() {
	ctx := context.Background()

	bucket, err := blob.OpenBucket(ctx, "s3://bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	file, err := os.Create("sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	d := &downloader{
		ctx:         ctx,
		bucket:      bucket,
		key:         "sample.txt",
		partSize:    downloadPartSize,
		concurrency: downloadConcurrency,
		w:           file,
	}

	if err := d.download(); err != nil {
		log.Fatal(err)
	}
}

type downloader struct {
	ctx context.Context

	bucket *blob.Bucket
	key    string
	opts   *blob.ReaderOptions

	partSize    int64
	concurrency int

	w io.WriterAt

	wg     sync.WaitGroup
	sizeMu sync.RWMutex
	errMu  sync.RWMutex

	pos        int64
	totalBytes int64
	err        error

	partBodyMaxRetries int
}

func (d *downloader) download() error {
	d.getChunk()
	if err := d.getErr(); err != nil {
		return err
	}

	total := d.getTotalBytes()

	ch := make(chan chunk, d.concurrency)

	for i := 0; i < d.concurrency; i++ {
		d.wg.Add(1)
		go d.downloadPart(ch)
	}

	for d.getErr() == nil {
		if d.pos >= total {
			break
		}

		ch <- chunk{w: d.w, start: d.pos, size: d.partSize}
		d.pos += d.partSize
	}

	close(ch)
	d.wg.Wait()

	return d.getErr()
}

func (d *downloader) downloadPart(ch chan chunk) {
	defer d.wg.Done()
	for {
		c, ok := <-ch
		if !ok {
			break
		}
		if d.getErr() != nil {
			continue
		}

		if err := d.downloadChunk(c); err != nil {
			d.setErr(err)
		}
	}
}

func (d *downloader) getChunk() {
	if d.getErr() != nil {
		return
	}

	c := chunk{w: d.w, start: d.pos, size: d.partSize}
	d.pos += d.partSize

	if err := d.downloadChunk(c); err != nil {
		d.setErr(err)
	}
}

func (d *downloader) downloadChunk(c chunk) error {
	var err error
	for retry := 0; retry <= d.partBodyMaxRetries; retry++ {
		// Assign to the outer err (no :=) so the last failure is returned after retries.
		err = d.tryDownloadChunk(c)
		if err == nil {
			break
		}

		bodyErr := &errReadingBody{}
		if !errors.As(err, &bodyErr) {
			return err
		}

		c.cur = 0
	}
	return err
}

func (d *downloader) tryDownloadChunk(c chunk) error {
	r, err := d.bucket.NewRangeReader(d.ctx, d.key, c.start, c.size, d.opts)
	if err != nil {
		return err
	}
	defer r.Close()

	if _, err := io.Copy(&c, r); err != nil {
		return err
	}

	d.setTotalBytes(r.Size())

	return nil
}

func (d *downloader) getErr() error {
	d.errMu.RLock()
	defer d.errMu.RUnlock()

	return d.err
}

func (d *downloader) setErr(err error) {
	d.errMu.Lock()
	defer d.errMu.Unlock()

	d.err = err
}

func (d *downloader) getTotalBytes() int64 {
	d.sizeMu.RLock()
	defer d.sizeMu.RUnlock()

	return d.totalBytes
}

func (d *downloader) setTotalBytes(size int64) {
	d.sizeMu.Lock()
	defer d.sizeMu.Unlock()

	d.totalBytes = size
}

type chunk struct {
	w     io.WriterAt
	start int64
	size  int64
	cur   int64
}

func (c *chunk) Write(p []byte) (int, error) {
	if c.cur >= c.size {
		return 0, io.EOF
	}

	n, err := c.w.WriteAt(p, c.start+c.cur)
	c.cur += int64(n)

	return n, err
}

type errReadingBody struct {
	err error
}

func (e *errReadingBody) Error() string {
	return fmt.Sprintf("failed to read part body: %v", e.err)
}

func (e *errReadingBody) Unwrap() error {
	return e.err
}
(The implementation above is modeled on s3manager.Downloader.)

Easy local execution

Go CDK is being developed with the goal of providing a local implementation for every service, so you can easily swap a cloud service for its local counterpart. For example, on a local development server it is convenient to replace all services with local implementations so the application runs without accessing AWS or GCP.

For the gocloud.dev/blob package that handles cloud storage, [fileblob](https://pkg.go.dev/gocloud.dev@v0.20.0/blob/fileblob) provides an implementation that reads and writes local files.

The following is an example that switches the output destination of encoded JSON between S3 and a local file, depending on a command-line option.

package main

import (
	"context"
	"encoding/json"
	"flag"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"gocloud.dev/blob"
	"gocloud.dev/blob/fileblob"
	"gocloud.dev/blob/s3blob"
)

func main() {
	var local bool
	flag.BoolVar(&local, "local", false, "output to a local file")
	flag.Parse()

	ctx := context.Background()

	bucket, err := openBucket(ctx, local)
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	data := struct {
		Key1 string
		Key2 string
	}{
		Key1: "value1",
		Key2: "value2",
	}

	w, err := bucket.NewWriter(ctx, "sample.json", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	if err := json.NewEncoder(w).Encode(data); err != nil {
		log.Fatal(err)
	}
}

func openBucket(ctx context.Context, local bool) (*blob.Bucket, error) {
	if local {
		return openLocalBucket(ctx)
	}

	return openS3Bucket(ctx)
}

func openLocalBucket(ctx context.Context) (*blob.Bucket, error) {
	return fileblob.OpenBucket("output", nil)
}

func openS3Bucket(ctx context.Context) (*blob.Bucket, error) {
	sess, err := session.NewSession(&aws.Config{
		Region: aws.String("ap-northeast-1"),
	})
	if err != nil {
		return nil, err
	}

	return s3blob.OpenBucket(ctx, sess, "bucket", nil)
}

If you run it as-is, sample.json is saved to S3, but if you run it with the -local option, it is saved locally as output/sample.json. The object's attributes are saved alongside as output/sample.json.attrs, so you can still retrieve the properties of the saved object without any problem.
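
For example, you can read those properties back through the normal Bucket.Attributes() method. A minimal sketch, assuming the example above already wrote sample.json into the output directory:

package main

import (
	"context"
	"fmt"
	"log"

	"gocloud.dev/blob/fileblob"
)

func main() {
	ctx := context.Background()

	// Open the same local bucket the previous example wrote to.
	bucket, err := fileblob.OpenBucket("output", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	// Attributes() works against fileblob thanks to the .attrs sidecar file.
	attrs, err := bucket.Attributes(ctx, "sample.json")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("size=%d content-type=%s modtime=%s\n", attrs.Size, attrs.ContentType, attrs.ModTime)
}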

Vastly improved testability

Code that calls the APIs of external services like AWS is always a headache to make testable, but with Go CDK you don't have to worry about that. Normally you would abstract the external service behind an interface, implement a mock, and swap the mock in for tests... but Go CDK already abstracts each service appropriately and ships local implementations, so you can simply use those as-is.

For example, consider testing a struct that implements an interface like the following, which uploads encoded JSON to cloud storage:

type JSONUploader interface {
	Upload(ctx context.Context, key string, v interface{}) error
}

With the AWS SDK, interfaces for the various service clients are provided, and using them keeps your code testable. For s3manager, the interface lives in the s3manageriface package.

type jsonUploader struct {
	bucketName string
	uploader   s3manageriface.UploaderAPI
}

func (u *jsonUploader) Upload(ctx context.Context, key string, v interface{}) error {
	pr, pw := io.Pipe()

	go func() {
		err := json.NewEncoder(pw).Encode(v)
		pw.CloseWithError(err)
	}()

	in := &s3manager.UploadInput{
		Bucket: aws.String(u.bucketName),
		Key:    aws.String(key),
		Body:   pr,
	}
	if _, err := u.uploader.UploadWithContext(ctx, in); err != nil {
		return err
	}

	return nil
}

With an implementation like this, you can test without actually accessing S3 by injecting a suitable mock as jsonUploader.uploader. However, no official mock implementation is provided, so you have to write one yourself or find a suitable external package, as sketched below.
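
For illustration, a hand-rolled mock might look like the following sketch; mockUploader and its fields are hypothetical names of my own, and it assumes the aws, s3manager, and s3manageriface imports from the surrounding code:

// mockUploader is a hypothetical hand-rolled mock. Embedding the
// interface satisfies all of its methods; any method that is not
// overridden panics if called, which is fine for a focused test.
type mockUploader struct {
	s3manageriface.UploaderAPI

	in  *s3manager.UploadInput // records the last input seen
	err error                  // error to return, if any
}

func (m *mockUploader) UploadWithContext(ctx aws.Context, in *s3manager.UploadInput, opts ...func(*s3manager.Uploader)) (*s3manager.UploadOutput, error) {
	m.in = in
	if m.err != nil {
		return nil, m.err
	}
	return &s3manager.UploadOutput{}, nil
}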

With Go CDK, the obvious, straightforward implementation is already highly testable.

type jsonUploader struct {
	bucket *blob.Bucket
}

func (u *jsonUploader) Upload(ctx context.Context, key string, v interface{}) error {
	w, err := u.bucket.NewWriter(ctx, key, nil)
	if err != nil {
		return err
	}
	defer w.Close()

	if err := json.NewEncoder(w).Encode(v); err != nil {
		return err
	}

	return nil
}

For tests, the in-memory blob implementation [memblob](https://pkg.go.dev/gocloud.dev@v0.20.0/blob/memblob) comes in handy.

func TestUpload(t *testing.T) {
	bucket := memblob.OpenBucket(nil)
	uploader := &jsonUploader{bucket: bucket}

	ctx := context.Background()

	key := "test.json"

	type data struct {
		Key1 string
		Key2 string
	}

	in := &data{
		Key1: "value1",
		Key2: "value2",
	}

	if err := uploader.Upload(ctx, key, in); err != nil {
		t.Fatal(err)
	}

	r, err := bucket.NewReader(ctx, key, nil)
	if err != nil {
		t.Fatal(err)
	}
	defer r.Close()

	out := &data{}
	if err := json.NewDecoder(r).Decode(out); err != nil {
		t.Fatal(err)
	}

	if !reflect.DeepEqual(in, out) {
		t.Errorf("got %+v, want %+v", out, in)
	}
}

Summary

This article introduced the benefits of adopting Go CDK beyond multi-cloud support and cloud portability. Because it handles multiple cloud vendors in a unified way, it naturally has weaknesses as well, such as not exposing features specific to a particular vendor, so in practice you will still want to use each vendor's own SDK where requirements call for it.

Go CDK itself is still under development, and I hope it gains even more functionality going forward.
