Introduction

This article is about creating an HTTP server that supports the GET and POST methods.

Building an HTTP server is also a way of studying HTTP. When I actually tried making one myself, I kept discovering surprising things and had a lot of fun, so I'm writing this article to recommend the exercise to others.

Assumed reader

- People involved in web development
- People who have seen HTTP request and response messages in the browser's developer tools
- People who think, "Wait, I can build an HTTP server myself?!"

Source code

The code in this article is heavily abridged, so to see the whole thing, please look here: https://github.com/ksugimori/SimpleHttpServer

I also made a Ruby version, which has slightly fewer features than the Java version: https://github.com/ksugimori/simple_http_server

Specification

Functions to implement

Since this is purely for studying HTTP, the server only needs to serve static pages.

- Support the GET method
  - Serve static pages
  - For a non-existent resource, return status code 404 and show an error page
- Support the POST method
  - For now, only go as far as receiving the request
  - Doing nothing would be boring, so print the POSTed content to standard output

What needs to be done?

Put very simply, serving a web page involves the following processing:

1. Receive an HTTP request message from the client
2. Parse the request
3. Read the requested resource
4. Create an HTTP response message
5. Send the response to the client

Below, we implement the server following these five steps.

Implementation

Step 0: HTTP message class

Before following those five steps, though, let's create classes that represent HTTP request and response messages, the foundation of HTTP.

First, check the specification: RFC 7230.

`Format the entire message`


HTTP-message   = start-line
                 *( header-field CRLF )
                 CRLF
                 [ message-body ]

start-line     = request-line / status-line

- The first line is the start line. It is called the request line in a request and the status line in a response, and the two have different formats.
- Header fields follow from the second line, each terminated by CRLF.
- After a blank line comes the body.
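To make this three-part structure concrete, here is a minimal sketch that splits a raw message into start line, header lines, and body by looking for the first empty line (CRLF CRLF). The class `RawMessageParts` is hypothetical, not part of the author's repository, and it assumes the whole message is already in memory as bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a complete HTTP message held in memory
// into start-line, header lines, and body (HTTP-message grammar above).
class RawMessageParts {
  final String startLine;
  final List<String> headerLines;
  final byte[] body;

  RawMessageParts(String startLine, List<String> headerLines, byte[] body) {
    this.startLine = startLine;
    this.headerLines = headerLines;
    this.body = body;
  }

  static RawMessageParts split(byte[] raw) {
    // The header section ends at the first empty line, i.e. the first CRLF CRLF
    int end = indexOf(raw, new byte[] {'\r', '\n', '\r', '\n'});
    if (end < 0) {
      throw new IllegalArgumentException("no blank line after the header section");
    }
    // Start line and header fields are ASCII-compatible text
    String head = new String(raw, 0, end, StandardCharsets.ISO_8859_1);
    String[] lines = head.split("\r\n");
    List<String> headers = new ArrayList<>(Arrays.asList(lines).subList(1, lines.length));
    // Everything after the blank line is the body, kept as raw bytes
    byte[] body = Arrays.copyOfRange(raw, end + 4, raw.length);
    return new RawMessageParts(lines[0], headers, body);
  }

  // Naive byte-array search; returns the index of the first match or -1
  private static int indexOf(byte[] data, byte[] pattern) {
    outer:
    for (int i = 0; i + pattern.length <= data.length; i++) {
      for (int j = 0; j < pattern.length; j++) {
        if (data[i + j] != pattern[j]) continue outer;
      }
      return i;
    }
    return -1;
  }
}
```

A real server reads from a socket and cannot wait for "the whole message", which is why the steps below read the header line by line and then use the header fields to decide how much body to read.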

`request-line/status-line format`


request-line = method SP request-target SP HTTP-version CRLF

status-line  = HTTP-version SP status-code SP reason-phrase CRLF

`request-line/status-line example`


GET /path/to/something HTTP/1.1

HTTP/1.1 200 OK

Requests and responses share the same structure apart from the start line, so let's create an abstract base class and have both classes inherit from it.

`AbstractHttpMessage.java`


import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public abstract class AbstractHttpMessage {
  protected Map<String, String> headers;
  protected byte[] body;

  public AbstractHttpMessage() {
    this.headers = new HashMap<>();
    this.body = new byte[0];
  }

  public void addHeaderField(String name, String value) {
    this.headers.put(name, value);
  }

  public Map<String, String> getHeaders() {
    return headers;
  }

  public void setBody(byte[] body) {
    this.body = body;
  }

  public byte[] getBody() {
    return body;
  }

  protected abstract String getStartLine();

  @Override
  public String toString() {
    return getStartLine() + " headers: " + headers + " body: " + new String(body, StandardCharsets.UTF_8);
  }
}

`Request.java`


public class Request extends AbstractHttpMessage {
  Method method;
  String target;
  String version;

  public Request(Method method, String target, String version) {
    super();
    this.method = method;
    this.target = target;
    this.version = version;
  }

  public Method getMethod() {
    return method;
  }

  public String getTarget() {
    return target;
  }

  public String getVersion() {
    return version;
  }

  @Override
  public String getStartLine() {
    return method.toString() + " " + target + " " + version;
  }
}

`Response.java`


public class Response extends AbstractHttpMessage {
  String version;
  Status status;

  public Response(String version, Status status) {
    super();
    this.version = version;
    this.status = status;
  }

  public String getVersion() {
    return version;
  }

  public int getStatusCode() {
    return status.getCode();
  }

  public String getReasonPhrase() {
    return status.getReasonPhrase();
  }

  @Override
  public String getStartLine() {
    return version + " " + getStatusCode() + " " + getReasonPhrase();
  }
}
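`Request` and `Response` refer to `Method` and `Status` types that the article does not show. The following is a hypothetical sketch of what they could look like, limited to the constants used in this article; the real definitions are in the author's repository and may differ.

```java
// Hypothetical sketch: the HTTP methods this server handles.
// Method.valueOf("PUT") would throw IllegalArgumentException.
enum Method {
  GET,
  POST
}

// Hypothetical sketch: status codes used in this article, each with its reason phrase.
enum Status {
  OK(200, "OK"),
  NO_CONTENT(204, "No Content"),
  BAD_REQUEST(400, "Bad Request"),
  NOT_FOUND(404, "Not Found");

  private final int code;
  private final String reasonPhrase;

  Status(int code, String reasonPhrase) {
    this.code = code;
    this.reasonPhrase = reasonPhrase;
  }

  int getCode() {
    return code;
  }

  String getReasonPhrase() {
    return reasonPhrase;
  }

  // status-line = HTTP-version SP status-code SP reason-phrase CRLF
  String toStatusLine(String version) {
    return version + " " + code + " " + reasonPhrase + "\r\n";
  }
}
```

Keeping the code and reason phrase together in an enum means a status line can never pair, say, 404 with "OK".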

Step 1: Receive the request

Although this is an HTTP server, at its core it just does ordinary socket communication. The main method only waits for connections, and each established connection is handled in a separate thread.

`main method`


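import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// new ServerSocket() and accept() throw IOException, so main must declare it
public static void main(String[] args) throws IOException {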
  ServerSocket server = new ServerSocket(8080);
  ExecutorService executor = Executors.newCachedThreadPool();

  while (true) {
    Socket socket = server.accept();

    //Pass the socket object and process each request in a separate thread
    executor.submit( new WorkerThread(socket) );
  }
}
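The article does not show `WorkerThread` itself. Here is a hypothetical sketch of one, assuming it is a `Runnable` that owns the accepted socket. The per-connection work is pulled into a static `handle(InputStream, OutputStream)` method (an assumed name) so it can be exercised without a network; it only echoes the request line, standing in for Steps 2 to 5.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a WorkerThread: one Runnable per accepted connection.
class WorkerThread implements Runnable {
  private final Socket socket;

  WorkerThread(Socket socket) {
    this.socket = socket;
  }

  @Override
  public void run() {
    // try-with-resources closes the socket (and its streams) when the exchange ends
    try (Socket s = socket) {
      handle(s.getInputStream(), s.getOutputStream());
    } catch (IOException e) {
      e.printStackTrace(); // a real server would log this
    }
  }

  // Placeholder for Steps 2-5: read the request line and reply with a fixed text response.
  static void handle(InputStream in, OutputStream out) throws IOException {
    BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.ISO_8859_1));
    String requestLine = br.readLine();
    if (requestLine == null) {
      return; // the client closed the connection without sending anything
    }
    byte[] body = ("You sent: " + requestLine).getBytes(StandardCharsets.UTF_8);
    String head = "HTTP/1.1 200 OK\r\n"
        + "Content-Type: text/plain; charset=UTF-8\r\n"
        + "Content-Length: " + body.length + "\r\n"
        + "Connection: close\r\n"
        + "\r\n";
    out.write(head.getBytes(StandardCharsets.ISO_8859_1));
    out.write(body);
    out.flush();
  }
}
```

Separating socket handling from message handling is a design choice, not something the article prescribes; it simply makes the parsing and response logic testable with in-memory streams.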

Step 2: Parse the request

Request-Line

`format`


request-line = method SP request-target SP HTTP-version CRLF

Since the request line is a space-separated string, we match it against a regular expression to extract the HTTP method, request target, and HTTP version.

`Request-Line parsing`


InputStream in = socket.getInputStream();

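BufferedReader br = new BufferedReader(new InputStreamReader(in));
String requestLine = br.readLine();
if (requestLine == null) {
  throw new ParseException("connection closed before the request line was received");
}

Pattern requestLinePattern
     = Pattern.compile("^(?<method>\\S+) (?<target>\\S+) (?<version>\\S+)$");
Matcher matcher = requestLinePattern.matcher(requestLine);
// group() throws IllegalStateException unless matches() has succeeded
if (!matcher.matches()) {
  throw new ParseException(requestLine);
}
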
Method method = Method.valueOf(matcher.group("method"));
String target = matcher.group("target");
String version = matcher.group("version");

Request request = new Request(method, target, version);
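For experimenting outside the server, the same parsing can be packaged as a small, self-contained class. This is a sketch, not the author's code: `RequestLineParser` is a hypothetical name, it returns plain strings instead of `Method`, and its pattern is deliberately stricter than the one above (upper-case method letters only and an `HTTP/x.y` version), which is a simplification of the RFC's `token` rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, self-contained version of the request-line parsing above.
class RequestLineParser {
  // request-line = method SP request-target SP HTTP-version CRLF
  // (readLine() has already removed the CRLF)
  private static final Pattern REQUEST_LINE =
      Pattern.compile("^(?<method>[A-Z]+) (?<target>\\S+) (?<version>HTTP/\\d\\.\\d)$");

  // Returns {method, target, version}, or throws if the line is malformed
  static String[] parse(String requestLine) {
    Matcher m = REQUEST_LINE.matcher(requestLine);
    // group() may only be called after a successful match
    if (!m.matches()) {
      throw new IllegalArgumentException("malformed request line: " + requestLine);
    }
    return new String[] {m.group("method"), m.group("target"), m.group("version")};
  }
}
```

A malformed line such as `GET /only-two-parts` is rejected up front, which a server would typically answer with 400 Bad Request.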

Header

`format`


header-field = field-name ":" OWS field-value OWS

A header field is a field name and a value separated by `:`, so we extract it with a regular expression as well. Even when the body is 0 bytes, a blank line always separates the header from the body, so we read lines as header fields until we hit a blank line.[^1]

`OWS` stands for "optional whitespace": zero or more spaces or horizontal tabs (RFC 7230, 3.2.3. Whitespace).

// OWS = *( SP / HTAB ), so allow any number of spaces/tabs around the value
Pattern headerPattern = Pattern.compile("^(?<name>[^:\\s]+):[ \\t]*(?<value>.*?)[ \\t]*$");
while (true) {
  String headerField = br.readLine();
  if (headerField == null) {
    throw new ParseException("connection closed before the end of the header section");
  }
  if (headerField.trim().isEmpty()) break; // the blank line separating header and body

  Matcher headerMatcher = headerPattern.matcher(headerField);
  if (headerMatcher.matches()) {
    request.addHeaderField(headerMatcher.group("name"), headerMatcher.group("value"));
  } else {
    throw new ParseException(headerField);
  }
}
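One detail the loop above does not handle: field names are case-insensitive (RFC 7230, 3.2), so `content-length` and `Content-Length` are the same header. A minimal sketch of a header parser that takes this into account is shown below. `HeaderParser` is a hypothetical name, and storing headers in a `TreeMap` with `String.CASE_INSENSITIVE_ORDER` is an assumed design choice that differs from the `HashMap` in `AbstractHttpMessage`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical header parser with case-insensitive field names.
class HeaderParser {
  // header-field = field-name ":" OWS field-value OWS, with OWS = *( SP / HTAB )
  private static final Pattern HEADER_FIELD =
      Pattern.compile("^(?<name>[^:\\s]+):[ \\t]*(?<value>.*?)[ \\t]*$");

  static Map<String, String> parse(Iterable<String> lines) {
    // Lookups such as get("host") also find a field sent as "Host"
    Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    for (String line : lines) {
      Matcher m = HEADER_FIELD.matcher(line);
      if (!m.matches()) {
        throw new IllegalArgumentException("malformed header field: " + line);
      }
      // A repeated field name simply overwrites the earlier value in this sketch
      headers.put(m.group("name"), m.group("value"));
    }
    return headers;
  }
}
```

The lazy `(?<value>.*?)` leaves trailing spaces and tabs to the final `[ \t]*`, so OWS is trimmed on both sides while a colon inside the value (as in `localhost:8080`) is kept.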

Body

Whenever a request has a body, it carries either a Content-Length or a Transfer-Encoding header. So we can branch on the request headers and handle the following three cases:

- With Content-Length
- With Transfer-Encoding
- Neither exists

1. With Content-Length

Since the body format is simply `message-body = *OCTET`, we can store the received content in a byte array as-is.

The number of bytes to read is taken from Content-Length.

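int contentLength = Integer.parseInt(request.getHeaders().get("Content-Length"));
// A Reader counts chars, not bytes: exact only for single-byte (e.g. ASCII) bodies
char[] body = new char[contentLength];
int read = 0;
while (read < contentLength) {
  int n = br.read(body, read, contentLength - read); // read() may return fewer chars than requested
  if (n == -1) break; // the client closed the connection early
  read += n;
}

request.setBody(new String(body, 0, read).getBytes());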
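Because Content-Length counts bytes, a byte-oriented read avoids the character/byte mismatch entirely. The sketch below assumes the body is read from the raw `InputStream` (note that a `BufferedReader` wrapped around the same stream may already have buffered part of the body, so in practice the header would also have to be read byte by byte). `FixedLengthBody` is a hypothetical name; `DataInputStream.readFully` is the standard JDK method that keeps reading until the array is full or throws `EOFException`.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads exactly Content-Length bytes of body.
class FixedLengthBody {
  static byte[] read(InputStream in, String contentLengthHeader) throws IOException {
    int contentLength = Integer.parseInt(contentLengthHeader.trim());
    if (contentLength < 0) {
      throw new IOException("negative Content-Length: " + contentLength);
    }
    byte[] body = new byte[contentLength];
    // readFully loops internally and throws EOFException if the stream ends too soon
    new DataInputStream(in).readFully(body);
    return body;
  }
}
```

With this approach a UTF-8 body such as `name=ä` (6 characters, 7 bytes) is read correctly, because nothing is ever decoded into characters.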

2. With Transfer-Encoding

If Content-Length were mandatory, the client could not start sending the request until the whole body had been generated, which is inefficient. HTTP/1.1 therefore allows the body to be sent in chunks, as shown below.[^3]

`Chunked request example`


POST /hoge HTTP/1.1
Host: example.jp
Transfer-Encoding: chunked
Connection: Keep-Alive

a
This is a 
a
test messa
3
ge.
0

`Chunked body format`


Chunk size (bytes, hexadecimal) CRLF
Chunked data CRLF

The idea is that even without knowing the size of the whole body, the sender can transmit each part as soon as it is ready, prefixed with its byte count. Because there is no Content-Length, the server cannot otherwise tell where the body ends, so the client must finish with a zero-length chunk.

String transferEncoding = request.getHeaders().get("Transfer-Encoding");
// Several codings may be listed (e.g. "Transfer-Encoding: gzip, chunked"), but only "chunked" is supported here
if ("chunked".equals(transferEncoding)) {
  ByteArrayOutputStream body = new ByteArrayOutputStream();
  // chunk-size may be followed by chunk extensions (";name=value"), which are ignored
  String chunkSizeHex = br.readLine().replaceFirst(";.*$", "").trim();
  int chunkSize = Integer.parseInt(chunkSizeHex, 16);
  while (chunkSize > 0) {
    char[] chunk = new char[chunkSize];
    int read = 0;
    while (read < chunkSize) { // read() may return fewer chars than requested
      int n = br.read(chunk, read, chunkSize - read);
      if (n == -1) throw new EOFException("connection closed in the middle of a chunk");
      read += n;
    }
    br.skip(2); // skip the CRLF that ends the chunk data
    body.write(new String(chunk).getBytes());

    chunkSizeHex = br.readLine().replaceFirst(";.*$", "").trim();
    chunkSize = Integer.parseInt(chunkSizeHex, 16);
  }

  // Once decoded, the message is equivalent to one with a Content-Length header
  request.addHeaderField("Content-Length", Integer.toString(body.size()));
  request.getHeaders().remove("Transfer-Encoding");
  request.setBody(body.toByteArray());
}
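The decoding above works on characters, so chunk sizes (which are byte counts) only line up for single-byte text. A byte-level sketch is shown below, under the assumption that the body is decoded from the raw `InputStream`. `ChunkedDecoder` and its private `readLine` helper are hypothetical names; unlike the code above, it also consumes the trailer section that follows the last chunk (RFC 7230, 4.1).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-level decoder for Transfer-Encoding: chunked.
class ChunkedDecoder {
  static byte[] decode(InputStream rawIn) throws IOException {
    DataInputStream in = new DataInputStream(rawIn);
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    while (true) {
      // chunk = chunk-size [ chunk-ext ] CRLF chunk-data CRLF; extensions start with ';'
      String sizeLine = readLine(in);
      int semicolon = sizeLine.indexOf(';');
      String hex = (semicolon >= 0 ? sizeLine.substring(0, semicolon) : sizeLine).trim();
      int size = Integer.parseInt(hex, 16);
      if (size == 0) {
        break; // last-chunk
      }
      byte[] chunk = new byte[size];
      in.readFully(chunk); // exactly `size` bytes, whatever the character encoding
      body.write(chunk);
      readLine(in);        // consume the CRLF after chunk-data
    }
    // Skip the trailer section: header lines up to an empty line
    while (!readLine(in).isEmpty()) {
      // trailer fields are ignored in this sketch
    }
    return body.toByteArray();
  }

  // Reads bytes up to LF and returns them without the CRLF (or bare LF)
  private static String readLine(InputStream in) throws IOException {
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != '\n') {
      if (b == -1) {
        throw new EOFException("connection closed before end of line");
      }
      line.write(b);
    }
    String s = line.toString(StandardCharsets.ISO_8859_1.name());
    return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
  }
}
```

Fed the body of the `POST /hoge` example above, this returns the bytes of `This is a test message.`.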

3. Neither Transfer-Encoding nor Content-Length exists

In this case there is no request body, so there is nothing to do.

Step 3: Read the requested resource

For the GET method

Simply read the requested file into a byte array with `Files#readAllBytes`.

// Resolve the request target against the document root to get the actual file path
Path target = Paths.get(SimpleHttpServer.getDocumentRoot(), request.getTarget()).normalize();

// Only allow access to files under the document root (blocks "../" traversal)
if (!target.startsWith(SimpleHttpServer.getDocumentRoot())) {
  return new Response(protocolVersion, Status.BAD_REQUEST);
}

// For a directory, serve its index.html
if (Files.isDirectory(target)) {
  target = target.resolve("index.html");
}

Response response;
try {
  response = new Response(protocolVersion, Status.OK);
  response.setBody(Files.readAllBytes(target)); // the file content is stored as a byte array
} catch (IOException e) {
  // If the file cannot be read (e.g. it does not exist), return 404 with the HTML error page as the body
  response = new Response(protocolVersion, Status.NOT_FOUND);
  response.setBody(SimpleHttpServer.readErrorPage(Status.NOT_FOUND));
}
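The document-root check is the security-relevant part of this step, so here it is isolated as a pure function that can be tested without any files. This is a sketch under assumptions: `DocumentRoot.resolve` is a hypothetical name, it strips a query string and the leading `/` before resolving (the code above does neither), and it does not percent-decode the target.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper: maps a request target to a file path under the document root.
class DocumentRoot {
  // Returns empty if the normalized path would escape the document root.
  static Optional<Path> resolve(Path root, String target) {
    Path base = root.toAbsolutePath().normalize();
    // Drop any query string, then strip leading "/" so resolve() treats the target as relative
    String pathPart = target.split("\\?", 2)[0].replaceFirst("^/+", "");
    // normalize() collapses "." and ".." segments before the containment check
    Path resolved = base.resolve(pathPart).normalize();
    return resolved.startsWith(base) ? Optional.of(resolved) : Optional.empty();
  }
}
```

`Path.startsWith` compares whole path components, so a sibling directory such as `/srv/www-private` does not count as being under `/srv/www`, which a plain string prefix check would get wrong.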

For the POST method

For POST, for now we just print the body to standard output to check that it was read correctly. The status code is 204 No Content, signaling that the request succeeded but there is nothing to send back.

System.out.println("POST body: " + new String(request.getBody(), StandardCharsets.UTF_8));
Response response = new Response(protocolVersion, Status.NO_CONTENT);
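Printing the raw body is enough to confirm that it arrived, but an HTML form posts its fields as `application/x-www-form-urlencoded`, where spaces become `+` and other special characters become `%XX`. The following sketch decodes such a body into name/value pairs. `FormBody` is a hypothetical name, it assumes that content type and UTF-8 percent-encoding, and `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decodes an application/x-www-form-urlencoded body.
class FormBody {
  static Map<String, String> parse(byte[] body) {
    Map<String, String> fields = new LinkedHashMap<>(); // keeps the submitted order
    String text = new String(body, StandardCharsets.US_ASCII); // the encoded form is ASCII-only
    if (text.isEmpty()) {
      return fields;
    }
    for (String pair : text.split("&")) {
      int eq = pair.indexOf('=');
      String name = eq >= 0 ? pair.substring(0, eq) : pair;
      String value = eq >= 0 ? pair.substring(eq + 1) : "";
      // URLDecoder turns "+" into a space and "%XX" sequences into UTF-8 characters
      fields.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                 URLDecoder.decode(value, StandardCharsets.UTF_8));
    }
    return fields;
  }
}
```

For example, `System.out.println("POST body: " + FormBody.parse(request.getBody()));` would print the decoded fields instead of the raw encoded string.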

Step 4: Create an HTTP response message

If you set the Content-Type header according to the file format, the client takes care of the rest.[^2]

response = new Response(protocolVersion, Status.OK);
response.setBody(Files.readAllBytes(target));

Map<String, String> mimeTypes = new HashMap<>();
mimeTypes.put("html", "text/html");
mimeTypes.put("css", "text/css");
mimeTypes.put("js", "text/javascript"); // "application/js" is not a registered media type
mimeTypes.put("png", "image/png");
mimeTypes.put("txt", "text/plain");

String ext = StringUtils.getFileExtension(target.getFileName().toString());
// An empty Content-Type value is invalid, so fall back to the generic binary type
String contentType = mimeTypes.getOrDefault(ext, "application/octet-stream");

response.addHeaderField("Content-Type", contentType);
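The same extension lookup can be written as a small self-contained class. In this sketch `MimeTypes.forFileName` is a hypothetical name standing in for the `StringUtils.getFileExtension` plus map lookup above; it lower-cases the extension so `INDEX.HTML` is recognized, and falls back to `application/octet-stream` for unknown types. `Map.of` requires Java 9 or later.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: chooses a Content-Type from a file name's extension.
class MimeTypes {
  private static final Map<String, String> TYPES = Map.of(
      "html", "text/html",
      "css", "text/css",
      "js", "text/javascript",
      "png", "image/png",
      "txt", "text/plain");

  static String forFileName(String fileName) {
    int dot = fileName.lastIndexOf('.');
    // Extensions are compared case-insensitively; no dot means no extension
    String ext = dot >= 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    return TYPES.getOrDefault(ext, "application/octet-stream");
  }
}
```

Using `lastIndexOf` means a name like `app.min.js` is classified by its final extension, `js`.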

Step 5: Send the response to the client

The response message has the following format:

HTTP-version SP status-code SP reason-phrase CRLF
*( header-field CRLF )
CRLF
[ message-body ]

So we simply write it to the socket in exactly this format.

BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.ISO_8859_1));

// SP = " " and CRLF = "\r\n" are String constants
String statusLine
    = response.getVersion() + SP + response.getStatusCode() + SP + response.getReasonPhrase() + CRLF;
writer.write(statusLine);

for (Map.Entry<String, String> field : response.getHeaders().entrySet()) {
  writer.write(field.getKey() + ":" + SP + field.getValue() + CRLF);
}
writer.write(CRLF); // blank line required to separate the header from the body
writer.flush();     // flush the text before writing raw bytes to the same stream

out.write(response.getBody()); // the body is written as the raw bytes read from the file
out.flush();
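Mixing a `Writer` and the underlying stream works only as long as the writer is flushed before the body is written. An alternative sketch builds the whole head as one string, encodes it once, and then writes the body bytes, so there is only one stream to manage. `ResponseWriter.write` and its parameter list are hypothetical; it takes the status and headers directly instead of a `Response` object so the example is self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper: serializes status-line, header fields, blank line, and body.
class ResponseWriter {
  static void write(OutputStream out, String version, int code, String reason,
                    Map<String, String> headers, byte[] body) throws IOException {
    StringBuilder head = new StringBuilder();
    // status-line = HTTP-version SP status-code SP reason-phrase CRLF
    head.append(version).append(' ').append(code).append(' ').append(reason).append("\r\n");
    for (Map.Entry<String, String> field : headers.entrySet()) {
      head.append(field.getKey()).append(": ").append(field.getValue()).append("\r\n");
    }
    head.append("\r\n"); // empty line ends the header section
    // Encoding the head separately leaves the body bytes untouched
    out.write(head.toString().getBytes(StandardCharsets.ISO_8859_1));
    out.write(body);
    out.flush();
  }
}
```

If the connection is to stay open for another request, the headers should also include `Content-Length` (or the body should be chunked, as below) so the client knows where the body ends.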

Try to chunk the response

In this program the response is sent to the client only after it has been fully built, so there is little point in chunking it, but since we've come this far, let's implement it anyway.

`Chunked body format`


Chunk size (bytes, hexadecimal) CRLF
Chunked data CRLF

Split the byte array holding the entire body into pieces of CHUNK_SIZE bytes and write each one in the format above.

byte[] CRLF = new byte[] {0x0D, 0x0A};

byte[] body = response.getBody();
ByteArrayOutputStream out = new ByteArrayOutputStream();

for (int offset = 0; offset < body.length; offset += CHUNK_SIZE) {
  // Clamp the end index: copyOfRange would otherwise pad the last chunk with zero bytes
  byte[] chunk = Arrays.copyOfRange(body, offset, Math.min(offset + CHUNK_SIZE, body.length));
  String lengthHex = Integer.toHexString(chunk.length);
  out.write(lengthHex.getBytes(StandardCharsets.US_ASCII));
  out.write(CRLF);
  out.write(chunk);
  out.write(CRLF);
}
out.write("0".getBytes(StandardCharsets.US_ASCII)); // last-chunk: size 0
out.write(CRLF); // end of the last-chunk line
out.write(CRLF); // end of the (empty) trailer section, i.e. end of the message
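Here is the same encoding as a self-contained function that returns the encoded bytes, with the chunk size passed in as a parameter instead of the `CHUNK_SIZE` constant. `ChunkedEncoder.encode` is a hypothetical name; it writes slices of the body directly with `ByteArrayOutputStream.write(byte[], int, int)`, which avoids both the copy and the zero-padding pitfall of `Arrays.copyOfRange`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes a complete body with Transfer-Encoding: chunked.
class ChunkedEncoder {
  private static final byte[] CRLF = {'\r', '\n'};

  static byte[] encode(byte[] body, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int offset = 0; offset < body.length; offset += chunkSize) {
      int len = Math.min(chunkSize, body.length - offset); // the last chunk may be shorter
      put(out, Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII)); // chunk-size in hex
      put(out, CRLF);
      out.write(body, offset, len); // chunk-data, taken straight from the body
      put(out, CRLF);
    }
    put(out, "0".getBytes(StandardCharsets.US_ASCII)); // last-chunk
    put(out, CRLF);
    put(out, CRLF); // empty trailer section ends the message
    return out.toByteArray();
  }

  // ByteArrayOutputStream.write(byte[], int, int) does not throw IOException
  private static void put(ByteArrayOutputStream out, byte[] bytes) {
    out.write(bytes, 0, bytes.length);
  }
}
```

With a chunk size of 20, `This is a sample text.` (22 bytes) is encoded as a `14` chunk followed by a `2` chunk and the terminating `0` chunk.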

Trying it out

Just before returning each response to the client, the server uses the Request and Response objects to print an Apache-style access log.

Top page

Other pages

Submit form

Because the body is printed to standard output as-is, spaces show up as `+` (that is how `application/x-www-form-urlencoded` encodes them), but you can see that the form contents were received.

Non-existent path

The status code is 404, but because the HTML for the error page is in the body, the browser shows the error page.

Chunked response

You can see that the body is split into 20-byte (0x14 in hexadecimal) chunks.

$ curl localhost:8080/chunked/sample.txt --trace-ascii /dev/stdout
== Info:   Trying ::1...
== Info: TCP_NODELAY set
== Info: Connected to localhost (::1) port 8080 (#0)
=> Send header, 96 bytes (0x60)
0000: GET /chunked/sample.txt HTTP/1.1
0022: Host: localhost:8080
0038: User-Agent: curl/7.54.0
0051: Accept: */*
005e:
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 28 bytes (0x1c)
0000: Transfer-Encoding: chunked
<= Recv header, 26 bytes (0x1a)
0000: Content-Type: text/plain
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 109 bytes (0x6d)
0000: 14
0004: This is a sample tex
001a: 14
001e: t..Response is chunk
0034: 14
0038: ed in every 20 bytes
004e: 14
0052: ....................
0068: 0
006b:
This is a sample text.
Response is chunked in every 20 bytes.

I only noticed this while trying it out, but with `Transfer-Encoding: chunked` and `Content-Type: text/plain`, opening the URL in a browser automatically downloaded the file. I had thought that only happened with `application/octet-stream`.

References

- RFC 7230
- [How to learn a new programming language: learning Java, Scala, and Clojure by building an HTTP server](https://speakerdeck.com/todokr/xin-siihurokuraminkuyan-yu-falsexue-hifang-httpsahawozuo-tutexue-hu-java-scala-clojure): this slide deck inspired me to build an HTTP server.

[^1]: With this approach, a strange request from the client can leave the server stuck in this loop, so I don't think it's very elegant.
[^2]: It should be added unless there is a particular reason not to, but it is not strictly required, since the specification says "SHOULD" rather than "MUST" (RFC 7231, 3.1.1.5. Content-Type).
[^3]: Slightly simplified. For details, see RFC 7230, 4.1. Chunked Transfer Coding.

Make your own simple server in Java and understand HTTP
