When I wrote the previous article, the responses from the serverless environment I had built myself felt slow. Since containers are started via Docker internally, I expected it to be slower than an ordinary server, but how much does response speed actually degrade? There aren't many comparison articles out there, so I decided to measure it myself.
According to the article "A brief survey of the technical field of serverless architecture", AWS Lambda's processing time ranges from roughly 250 ms to 8000 ms. From this, we can predict that a similar serverless environment will have response times in the same ballpark, or a little slower.
I use a serverless environment called Iron Functions that runs on premises. I wrote an introductory article about it in the past, so please have a look there. Roughly speaking, it is a convenient product that lets you easily set up a serverless environment like AWS Lambda on your own infrastructure.
This time I benchmark three languages: Go, Node.js, and Python. The code does almost the same thing in each language. Let's see how big the difference is between running it on serverless and running it on an HTTP server built with each language's standard library (Native).
Go Serverless
package main

import (
    "encoding/json"
    "fmt"
    "os"
)

type Person struct {
    Name string
}

func main() {
    // Iron Functions passes the request body to the function on stdin.
    p := &Person{Name: "World"}
    json.NewDecoder(os.Stdin).Decode(p)
    // Whatever is written to stdout becomes the response body.
    fmt.Printf("Hello %v!", p.Name)
}
Go Native
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

type Person struct {
    Name string
}

func handler(w http.ResponseWriter, r *http.Request) {
    // Decode the JSON request body and greet the given name.
    p := &Person{Name: "World"}
    json.NewDecoder(r.Body).Decode(p)
    fmt.Fprintf(w, "Hello %v!", p.Name)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":2000", nil)
}
Node.js Serverless
name = "World";
fs = require('fs');
try {
obj = JSON.parse(fs.readFileSync('/dev/stdin').toString())
if (obj.name != "") {
name = obj.name
}
} catch(e) {}
console.log("Hello", name, "from Node!");
Node.js Native
const http = require('http');

http.createServer((req, res) => {
    let body = "";
    req.on("data", (chunk) => { body += chunk; });
    req.on("end", () => {
        // Parse the accumulated request body and greet the given name.
        const obj = JSON.parse(body);
        res.writeHead(200, {'Content-Type': 'text/plain'});
        res.end('Hello ' + obj.name + " from Node Native!");
    });
}).listen(6000);
Python Serverless
import sys
sys.path.append("packages")  # allow dependencies bundled alongside the function
import os
import json

# Iron Functions passes the request body on stdin; stdout becomes the response.
name = "World"
if not os.isatty(sys.stdin.fileno()):
    obj = json.loads(sys.stdin.read())
    if obj["name"] != "":
        name = obj["name"]
print("Hello", name, "!!!")
Python Native
from http.server import BaseHTTPRequestHandler, HTTPServer
from json import loads
from io import TextIOWrapper

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly content-length bytes of the JSON body.
        content_length = int(self.headers.get('content-length'))
        text = TextIOWrapper(self.rfile).read(content_length)
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        obj = loads(text)
        self.wfile.write("Hello {name} !! Welcome to Native Python World!!".format(name=obj["name"]).encode("utf-8"))

PORT = 1000  # note: ports below 1024 normally require root privileges on Linux
server = HTTPServer(("127.0.0.1", PORT), Handler)
print("serving at port", PORT)
server.serve_forever()
Each benchmark server runs on the same machine: Ubuntu 16.04 on a virtual machine with 1 core and 2 GB of memory. The machine generating the load and the machine receiving it are the same, and Apache Bench is used. First, prepare the following JSON file:
johnny.json
{
    "name": "Johnny"
}
Load is applied by sending POST requests with Apache Bench: 100 requests with a concurrency of 5 (the reason the request count is so small is explained later). The time until the server returns a response (Response Time) is measured.
# fill in the port/path (XXXXX/XXXXX) as appropriate for each target
ab -n 100 -c 5 -p johnny.json -T "application/json" http://localhost:XXXXX/XXXXX
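Before relying on the ab numbers, it is handy to sanity-check each endpoint with a single timed request. The sketch below is not part of the original benchmark; it simply POSTs the johnny.json payload and prints the round-trip time. The URL targets the Go Native server from this article (port 2000); the serverless URL depends on how the route was created in Iron Functions, so treat it as a placeholder.

```go
// sanity_check.go: send one POST with the johnny.json payload and time the response.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	payload := []byte(`{"name":"Johnny"}`)
	url := "http://localhost:2000/" // placeholder: swap in the Iron Functions route for the serverless side

	start := time.Now()
	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("response: %s (took %v)\n", body, time.Since(start))
}
```

The results of the ab runs themselves are summarized in the table below.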
Response Time | min[ms] | mean[ms] | std[ms] | median[ms] | max[ms] | Native ratio(mean) |
---|---|---|---|---|---|---|
Go Serverless | 3951 | 6579 | 1010 | 6512 | 8692 | 1644.75 |
Go Native | 0 | 4 | 5 | 2 | 37 | - |
Node Serverless | 5335 | 14917 | 3147 | 15594 | 20542 | 621.54 |
Node Native | 5 | 24 | 45 | 12 | 235 | - |
Python Serverless | 5036 | 13455 | 4875 | 14214 | 29971 | 840.94 |
Python Native | 6 | 16 | 4 | 16 | 26 | - |
**Note: the vertical axis of the figure below is on a logarithmic scale (the magnitude relationships are shown logarithmically to make them easier to see).**
As the table shows, **with Go, the serverless environment is more than 1600 times slower than the Native environment; Node.js is about 600 times slower and Python about 800 times slower.** Comparing the Python and Node.js results, it may seem strange that Python comes out faster. In follow-up tests of the Native environments, Python was sometimes faster when the number of requests and the concurrency were small, but at 10,000 requests or more Node.js handled the load more stably and finished faster than Python. In addition, Python's Native implementation sometimes returned errors and failed to process requests properly. Go, which shows a clear lead even at this small request count, is presumably just extraordinarily well tuned.

Now let's compare this with the figure quoted above, that AWS Lambda's processing time is 250 ms to 8000 ms. Personally, this benchmark's results did not feel surprising. When I sent requests to Iron Functions myself with curl, it already felt slow, and if a Docker container is spun up for every request, that is probably unavoidable. On the other hand, my impression was that AWS Lambda's 250 ms is very fast.
Looking into it, AWS Lambda appears to have two startup modes: cold start and warm start. The former matches the intuitive picture of serverless, where a container is created for each request; in the latter, a container that has already been created is reused. Because Lambda may not have to create a container at all, it can respond faster, and I think that is why it can answer in as little as 250 ms. Iron Functions, on the other hand, probably implements only cold starts, so it is not that fast. Still, for Go running on a self-built serverless environment, a maximum of about 8600 ms strikes me as a reasonable processing speed. Of course, the number of clients being served differs, but the cost of creating and disposing of a container itself probably does not change all that much.
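To make the cold start / warm start distinction concrete, here is a small conceptual sketch in Go. It is only an illustration of the idea, not how Lambda or Iron Functions is actually implemented: a dispatcher reuses an idle ("warm") container when one is available and only pays the startup cost when it has to create a new one.

```go
// Conceptual sketch of cold start vs. warm start dispatch (illustration only).
package main

import (
	"fmt"
	"time"
)

type container struct{ id int }

var warmPool = make(chan *container, 10) // idle containers kept around for reuse
var nextID int

func coldStart() *container {
	nextID++
	time.Sleep(500 * time.Millisecond) // stand-in for image pull / container create / runtime boot
	return &container{id: nextID}
}

func handle(req string) {
	var c *container
	select {
	case c = <-warmPool: // warm start: reuse an existing container
	default:
		c = coldStart() // cold start: pay the full startup cost
	}
	fmt.Printf("%s handled by container %d\n", req, c.id)
	warmPool <- c // keep the container warm for the next request
}

func main() {
	handle("request 1") // cold: no warm container exists yet
	handle("request 2") // warm: reuses container 1
}
```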
Below is a link to the AWS Lambda price list.
AWS Lambda pricing https://aws.amazon.com/jp/lambda/pricing/
The pricing seems to be based on execution time × allocated memory, and you cannot choose the CPU. So how is CPU allocated? The help explains that CPU power is allocated in proportion to the amount of memory you configure.
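As a rough, illustrative calculation of this duration × memory billing model (the memory size, duration, and rates below are example figures chosen for illustration, not values taken from the pricing page):

```go
// Rough illustration of "execution time × memory" billing for a Lambda-style service.
// All figures are example values only; check the pricing page for current rates.
package main

import "fmt"

func main() {
	const (
		memoryGB          = 0.125      // a 128 MB function
		durationSec       = 0.2        // 200 ms billed per invocation (ignoring billing granularity)
		requests          = 1000000.0  // one million invocations
		ratePerGBSecond   = 0.00001667 // example rate only
		ratePerMillionReq = 0.20       // example rate only
	)
	gbSeconds := memoryGB * durationSec * requests
	cost := gbSeconds*ratePerGBSecond + (requests/1000000.0)*ratePerMillionReq
	fmt.Printf("%.0f GB-seconds, approx cost: $%.2f\n", gbSeconds, cost)
}
```

The point is simply that the bill scales with memory × execution time (GB-seconds), plus a small per-request charge.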
Another article, "The story of the actual introduction of Serverless Framework", made a related point. In my own serverless experiments, the CPU usage of the host machine rises considerably, so I had been wondering how the hosting side (Amazon) makes this pay; it seems the hourly unit price is simply set fairly high. Also, since serverless by design can only return a response after starting up and then discarding a container, I suspect fairly powerful CPUs are provisioned so that responses stay fast.
Benchmarks are usually run with 10,000 requests, or across several patterns. In this experiment, however, the number of requests was limited to about 100, for two reasons. The first is that it is simply slow: on serverless, even the fastest responses take around 4000 ms, so large-scale request runs were not realistic. The second is instability: Iron Functions behaves somewhat erratically, and even with about 100 requests, roughly 10 of them may fail; increasing the concurrency or the request count makes it very likely that processing breaks down entirely. This also seems to depend on the life cycle of Iron Functions' Docker containers: the same request would sometimes time out and sometimes not, which made it hard to obtain accurate values. For that reason, rather than taking the processing-time figures in this article at face value, it is more accurate to read them as showing the order of magnitude relative to Native. Also, the fact that the machine generating the load and the machine receiving it are the same may make the benchmark slightly inaccurate. That is simply because I did not prepare two environments, but since we are only looking at the speed difference between the Native and serverless implementations under the same server conditions, it is probably not a big problem.
This benchmark took a very long time: posting was delayed considerably because roughly 600 requests in total took 3 to 4 hours. And the slowness of serverless turned out to be a really striking result. I had been thinking "let's use this", but I will hold off for a while... AWS Lambda, on the other hand, really is excellent. The speed of Go's HTTP server is also amazing; I did not expect even such a small benchmark to be that fast. It is also painful that there is almost no Japanese know-how about Iron Functions. There is actually a reverse proxy called fnlb, and an official way to cluster with it, which would make scaling easier; but single-node operation is already so slow that more tuning, or fixing the bottleneck, is probably essential first. Iron Functions itself is written in Go, so it should not be this slow... maybe the cost is somewhere around the Docker containers... Hmm. The road to serverless is still a long one.