Using the curl and jq libraries with Go

Introduction

This article covers setting up an environment in which Go fetches JSON with the curl library and parses it with the jq library. The standard library alone is enough for this [^lib], but by relying on the libraries behind widely used commands, I want to cut down on new things to memorize and keep implementation differences between languages as small as possible.

Both libraries are written in C and can be called directly from Go, but for convenience we will use wrapper libraries.

[^lib]: Fetch JSON with "net/http" and parse it with "encoding/json".
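
For comparison, the standard-library route mentioned in the footnote can be sketched roughly as follows. This is only a minimal sketch: the `Item` type, the `fetchItems` and `newSampleServer` helpers, and the sample data are names invented for this illustration, and a local test server stands in for the real API so that the program runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Item holds only the fields this sketch reads (hypothetical type).
type Item struct {
	Title string `json:"title"`
	User  struct {
		ID string `json:"id"`
	} `json:"user"`
}

// fetchItems GETs url and decodes the JSON array in the response body.
func fetchItems(url string) ([]Item, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var items []Item
	if err := json.NewDecoder(resp.Body).Decode(&items); err != nil {
		return nil, err
	}
	return items, nil
}

// newSampleServer starts a local server returning a fixed JSON array
// (an assumption made so the sketch does not depend on the network).
func newSampleServer() *httptest.Server {
	const body = `[{"title":"Hello","user":{"id":"alice"}}]`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newSampleServer()
	defer srv.Close()

	items, err := fetchItems(srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range items {
		fmt.Println(it.User.ID + " : " + it.Title) // prints "alice : Hello"
	}
}
```

With this approach the shape of the data has to be declared up front as a Go struct, whereas the jq route used in the rest of this article expresses the extraction as a filter string.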

Environment

The environment is as follows.

The wrapper libraries are github.com/andelf/go-curl and github.com/mgood/go-jq; both are thin wrappers and do nothing extra.

MSYS2 environment construction

Install the curl and jq libraries. All of the work in this section is done in the MSYS2 terminal.

Installation of curl library and jq library

Use Alexpux/MINGW-packages for installation, and build each library with the makepkg command. makepkg does everything automatically: it downloads the source of each library, applies the patches for MSYS2, and builds it. [^libcurl]

[^libcurl]: With the libcurl installed by pacman, curl_global_init and curl_easy_init could not be executed because of an error when calling the realloc function.

Installation


# Update packages
$ pacman -Syuu
# * If the following warning appears, restart the terminal and run the update again.
# warning: terminate MSYS2 without returning to shell and check for updates again
# warning: for example close your terminal window instead of calling exit
$ pacman -Syuu

# Install the build-related packages used by makepkg
$ pacman -S git base-devel mingw-w64-x86_64-toolchain

# git clone MINGW-packages
$ cd /tmp
$ git clone --depth=1 https://github.com/Alexpux/MINGW-packages.git

# Build and install jq
$ cd /tmp/MINGW-packages/
$ cd `ls | grep jq`
# The -s option installs any missing dependency packages
$ makepkg -s
# Install the locally built package
$ pacman -U ./mingw-w64-x86_64-jq-1.5-3-any.pkg.tar.xz
# To install directly instead, copy the built files:
#$ cp -r pkg/mingw-w64-x86_64-jq/mingw64/* /mingw64/

# Build and install curl
$ cd /tmp/MINGW-packages/
$ cd `ls | grep curl`
$ makepkg -s
# If you get a message that the PGP key cannot be verified, you do not have the public key, so import it.
# ==> Validate source file signature with gpg...
#     curl-7.52.1.tar.bz2 ... Failure (Unknown public key 5CC908FDB71E12C2) <- * Copy and paste this key ID
# ==> error: The PGP key could not be verified!
$ gpg --recv-keys 5CC908FDB71E12C2
# Run makepkg again
$ makepkg -s
# Install the locally built package
$ pacman -U ./mingw-w64-x86_64-curl-7.52.1-1-any.pkg.tar.xz
# To install directly instead, copy the built files:
#$ cp -r pkg/mingw-w64-x86_64-curl/mingw64/* /mingw64/

Operation check (C)

Compile the following source with gcc and run it. If "ok" is printed, the curl environment has been set up successfully. I skip the check for jq because it will be used from Go later.

test_curl.c


// Check that curl can make an SSL (HTTPS) connection
#include <stdio.h>
#include <curl/curl.h>

// Discard the response body. Return the number of bytes handled so that
// libcurl does not treat the write as failed and abort the transfer.
size_t noop_function(char *buffer, size_t size, size_t nitems, void *instream) {
	return size * nitems;
}

int main() {
	curl_global_init(CURL_GLOBAL_DEFAULT); // NOTE: with pacman's libcurl-devel, it fails here

	CURL *curl = curl_easy_init();
	if (!curl) return 1;

	// SSL connection to qiita.com
	curl_easy_setopt(curl, CURLOPT_URL, "https://qiita.com");
	curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
	// By default the response is written to standard output, which is noisy, so suppress it.
	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, noop_function);
	curl_easy_perform(curl);

	// Get the HTTP response code
	long http_code = 0;
	curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
	if (http_code == 200) printf("ok\n");

	curl_easy_cleanup(curl);
	curl_global_cleanup();
	return 0;
}

Compile and run


$ gcc test_curl.c -I/mingw64/include -L/mingw64/lib -lcurl
$ ./a
ok

Go environment construction

Run the following from a command prompt that has both Go and MSYS2's gcc (C:\path\to\msys64\mingw64\bin\gcc) on the PATH. **Adjust the MSYS2 path to match your environment.**

Install the wrapper library


> @rem Make the curl and jq libraries visible to Go
> set CGO_CFLAGS=-I C:\path\to\msys64\mingw64\include
> set CGO_LDFLAGS=-L C:\path\to\msys64\mingw64\lib
> @rem Install the curl and jq wrapper libraries
> go get -u github.com/andelf/go-curl
> go get -u github.com/mgood/go-jq
> @rem go get -u github.com/wordijp/go-jq

The environment setup is now complete.

Try using the curl / jq library in Go

**(2017/2/11) Added a note on the convenience of jq**

Fetch Qiita's post list with curl, then use jq to narrow it down to poster IDs and titles and display them. Incidentally, you can browse the same list of posts under All Posts on the Qiita home page.

The JSON of the post list that can be obtained with Qiita API v2 has the following format.

Qiita_API_V2_items.json


[
  {
    "rendered_body": "Text(Omitted because it is long)",
    "coediting": false,
    "created_at": "2017-02-09T20:29:56+09:00",
    "group": null,
    "id": "acb70ae51c334aa1ee52",
    "private": false,
    "tags": [
      {
        "name": "C",
        "versions": []
      },
      {
        "name": "Go",
        "versions": []
      },
      {
        "name": "curl",
        "versions": []
      },
      {
        "name": "jq",
        "versions": []
      },
      {
        "name": "msys2",
        "versions": []
      }
    ],
    "title": "Curl with Go/Use jq library",
    "updated_at": "2017-02-09T21:24:05+09:00",
    "url": "http://qiita.com/wordijp/items/acb70ae51c334aa1ee52",
    "user": {
      "description": "C++I like(C++I'm not saying I'll post an article)",
      "facebook_id": "",
      "followees_count": 7,
      "followers_count": 9,
      "github_login_name": "wordijp",
      "id": "wordijp",
      "items_count": 33,
      "linkedin_id": "",
      "location": "",
      "name": "",
      "organization": "",
      "permanent_id": 51728,
      "profile_image_url": "https://qiita-image-store.s3.amazonaws.com/0/51728/profile-images/1473692528",
      "twitter_screen_name": "wordijp",
      "website_url": ""
    }
  },
  {
2nd article
  }
]

The jq filter that extracts each poster's ID and title from this JSON is as follows.

jq filter (poster ID and title pairs)


'.[] | {user_id: .user.id, title: .title}'

After filtering, the output is as follows.

Poster ID and title pairs.json


{
  "user_id": "wordijp",
  "title": "Curl with Go/Use jq library"
}
{
2nd article
}

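Incidentally, to keep the result as a single array, wrap the whole filter in [ ] as shown below. Note that putting .[] inside each field of one object, as in {user_id: .[].user.id, title: .[].title}, makes jq emit every combination of user ID and title rather than one object per post. I don't use the array form in the implementation; it is shown only for reference.

jq filter (poster ID and title pairs, array-preserving version)


'[.[] | {user_id: .user.id, title: .title}]'

After filtering, the output is as follows.

Poster ID and title pairs, array-preserving version.json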


[
  {
    "user_id": "wordijp",
    "title": "Curl with Go/Use jq library"
  },
  {
2nd article
  }
]

The Go source code is as follows.

test_curl_json.go


//Call Qiita API v2 and list the poster ID and title of the posted article
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync/atomic"

	curl "github.com/andelf/go-curl"
	jq "github.com/mgood/go-jq" // the bug-fix PR has been merged upstream
	//jq "github.com/wordijp/go-jq" // fork no longer needed now that the upstream bug is fixed
)

type wrdata struct {
	ch     chan []byte // carries the received chunks to the collecting goroutine
	remain int32       // number of chunks sent but not yet collected
	// For synchronizing the parallel processing
	perform  chan int
	complete chan int
}

// Run cURL data acquisition and handling of the received data in parallel
// NOTE: a little faster than doing them sequentially
func easyParallelWrite(easy *curl.CURL, fWrite func([]byte)) {

	// write function: called by cURL for each received chunk
	easy.Setopt(curl.OPT_WRITEFUNCTION, func(ptr []byte, userdata interface{}) bool {
		//println("ptr size:", len(ptr))
		wd, ok := userdata.(*wrdata)
		if !ok {
			println("ERROR!")
			return false
		}

		atomic.AddInt32(&wd.remain, 1)
		wd.ch <- ptr
		return true // ok
	})

	// write data
	wd := &wrdata{
		ch:       make(chan []byte, 100),
		remain:   0,
		perform:  make(chan int),
		complete: make(chan int),
	}
	var buf bytes.Buffer
	go func(wd *wrdata) {
		performed := false
		for {
			// Wait for either a chunk or the signal that Perform has returned.
			// Waiting on both avoids blocking forever on ch once the last
			// chunk has already been received.
			select {
			case data := <-wd.ch:
				atomic.AddInt32(&wd.remain, -1)
				//println("Got data size=", len(data))
				buf.Write(data)
			case <-wd.perform:
				performed = true
			}
			// complete once Perform has returned and every chunk is collected
			if performed && atomic.LoadInt32(&wd.remain) <= 0 {
				break
			}
		}

		//println("recv finished!")
		wd.complete <- 1
	}(wd)
	easy.Setopt(curl.OPT_WRITEDATA, wd)

	if err := easy.Perform(); err != nil {
		fmt.Println("ERROR:", err)
	}

	// Tell the goroutine that Perform has returned, then wait for it to finish
	wd.perform <- 1
	<-wd.complete

	// Pass the result to the caller
	fWrite(buf.Bytes())
}

// Helper function
// All of the curl handling is gathered here
func curlExec(url string, fWrite func([]byte)) {
	curl.GlobalInit(curl.GLOBAL_DEFAULT)
	defer curl.GlobalCleanup()

	easy := curl.EasyInit()
	defer easy.Cleanup()

	easy.Setopt(curl.OPT_URL, url)
	//easy.Setopt(curl.OPT_SSL_VERIFYPEER, 0) // enable this line when using SSL
	easyParallelWrite(easy, fWrite)
}

type Item struct {
	UserID string `json:"user_id"`
	Title  string `json:"title"`
}

func main() {
	curlExec("http://qiita.com/api/v2/items", func(buf []byte) {
		j, err := jq.NewJQ(".[] | {user_id: .user.id, title: .title}")
		if err != nil {
			fmt.Println("ERROR:", err)
			return
		}

		j.HandleJson(string(buf))
		for j.Next() {
			item := &Item{}
			valuejson := j.ValueJson()
			// item is already a pointer, so pass it as is
			if err := json.Unmarshal([]byte(valuejson), item); err != nil {
				fmt.Println("ERROR:", err)
				continue
			}

			fmt.Println(valuejson)                        // JSON
			fmt.Println(item.UserID + " : " + item.Title) // struct
			fmt.Println()
		}
	})
}

Execution result


> go run test_curl_json.go
{"user_id":"kazuma1989","title":"Note the minimum configuration to run Jetty with Docker for the time being"}
kazuma1989 : Note the minimum configuration to run Jetty with Docker for the time being

{"user_id":"ironsand","title":"foo_Note that if you camelize BarHoge, it will become FooBarhoge."}
ironsand : foo_Note that if you camelize BarHoge, it will become FooBarhoge.

{"user_id":"Teach","title":"Flash jump with ThirdPersonController"}
Teach : Flash jump with ThirdPersonController
(rest omitted)

I use jq to extract just the data I want from the fetched JSON. Its filtering is this powerful, and I think implementing the same extraction without jq would take considerably more work, so jq is convenient.

Also, the easyParallelWrite function is parallelized with a goroutine, which makes handling the shared variable (remain) in a thread-safe way a little complicated, but I use it for performance.
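
If the shared counter feels heavy, one possible simplification is to close the channel when the producer finishes and let the collecting goroutine range over it. The sketch below shows only that pattern with the standard library. The `collect` function and its `produce` and `emit` parameters are hypothetical stand-ins for easy.Perform and the write callback, and the sketch assumes the producer does not reuse a chunk after emitting it.

```go
package main

import (
	"bytes"
	"fmt"
)

// collect runs produce in the calling goroutine and gathers every chunk it
// emits in a separate goroutine (hypothetical helper for this sketch).
func collect(produce func(emit func([]byte))) []byte {
	ch := make(chan []byte, 100)
	done := make(chan struct{})

	var buf bytes.Buffer
	go func() {
		// The range loop ends once ch is closed and drained.
		for chunk := range ch {
			buf.Write(chunk)
		}
		close(done)
	}()

	produce(func(chunk []byte) { ch <- chunk })
	close(ch) // no more chunks: lets the collecting loop finish
	<-done    // wait until the collector has written everything

	return buf.Bytes()
}

func main() {
	result := collect(func(emit func([]byte)) {
		emit([]byte("[1,"))
		emit([]byte("2,"))
		emit([]byte("3]"))
	})
	fmt.Println(string(result)) // prints [1,2,3]
}
```

Closing the channel carries the "no more data" signal by itself, so neither an atomic counter nor a separate notification channel is needed in this variant.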
