This article is about creating an HTTP server that supports the GET and POST methods.
It doubles as a way to study HTTP, and building a server yourself turns up more surprises than you might expect and is fun, so I wrote this article to recommend it to the following people.
- People involved in web development
- People who have at least glanced at request and response messages in the browser's developer tools
- People who thought "You can build an HTTP server yourself!?"
The code in this article is heavily abridged, so if you want to see the whole thing, look here: https://github.com/ksugimori/SimpleHttpServer
I also wrote a Ruby version, although it has slightly fewer features than the Java version: https://github.com/ksugimori/simple_http_server
Since the server exists only for studying HTTP, it goes no further than serving static pages.
- Supports the GET method
  - Can display static pages
  - Access to a non-existent resource returns response code 404 and shows an error page
- Supports the POST method
  - For now, only up to receiving the request
  - Doing nothing would be boring, so the POSTed content is written to standard output
When a web page is displayed, the following processing takes place, put very simply.
In the following, we will implement the server according to these 5 steps.
I said "according to these 5 steps", but before that, let's create classes that represent the request and response messages that are the basis of HTTP.
First check the specification → RFC 7230
Format of the entire message
HTTP-message = start-line
               *( header-field CRLF )
               CRLF
               [ message-body ]
start-line = request-line / status-line
- The first line is the start line. It is called the request line in a request and the status line in a response, and each has a different format.
- Header fields follow from the second line, each terminated by CRLF
- The body follows after a blank line
request-line/status-line format
request-line = method SP request-target SP HTTP-version CRLF
status-line = HTTP-version SP status-code SP reason-phrase CRLF
request-line/status-line example
GET /path/to/something HTTP/1.1
HTTP/1.1 200 OK
The structure is the same apart from the start line, so let's create an abstract base class and have the request and response classes inherit from it.
AbstractHttpMessage.java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Parts shared by request and response messages: header fields and body. */
public abstract class AbstractHttpMessage {
    protected Map<String, String> headers;
    protected byte[] body;

    public AbstractHttpMessage() {
        this.headers = new HashMap<>();
        this.body = new byte[0];
    }

    public void addHeaderField(String name, String value) {
        this.headers.put(name, value);
    }

    public Map<String, String> getHeaders() {
        return headers;
    }

    public void setBody(byte[] body) {
        this.body = body;
    }

    public byte[] getBody() {
        return body;
    }

    /** The start line differs between requests and responses, so subclasses supply it. */
    protected abstract String getStartLine();

    @Override
    public String toString() {
        return getStartLine() + " headers: " + headers + " body: " + new String(body, StandardCharsets.UTF_8);
    }
}
Request.java
public class Request extends AbstractHttpMessage {
    private final Method method;
    private final String target;
    private final String version;

    public Request(Method method, String target, String version) {
        super();
        this.method = method;
        this.target = target;
        this.version = version;
    }

    public Method getMethod() {
        return method;
    }

    public String getTarget() {
        return target;
    }

    public String getVersion() {
        return version;
    }

    @Override
    public String getStartLine() {
        return method.toString() + " " + target + " " + version;
    }
}
Response.java
public class Response extends AbstractHttpMessage {
    private final String version;
    private final Status status;

    public Response(String version, Status status) {
        super();
        this.version = version;
        this.status = status;
    }

    public String getVersion() {
        return version;
    }

    public int getStatusCode() {
        return status.getCode();
    }

    public String getReasonPhrase() {
        return status.getReasonPhrase();
    }

    @Override
    public String getStartLine() {
        return version + " " + getStatusCode() + " " + getReasonPhrase();
    }
}
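The classes above refer to a Method type and a Status type that the article does not show. The following is only a guess at their shape, assuming both are enums and that Status carries the code and reason phrase returned by getCode() and getReasonPhrase(). The constants listed are the ones the snippets in this article use.

```java
// Hypothetical stand-ins for the Method and Status types used by Request and Response.
// The real definitions live in the repository; these are assumptions for illustration.
enum Method {
    GET, POST
}

enum Status {
    OK(200, "OK"),
    NO_CONTENT(204, "No Content"),
    BAD_REQUEST(400, "Bad Request"),
    NOT_FOUND(404, "Not Found");

    private final int code;
    private final String reasonPhrase;

    Status(int code, String reasonPhrase) {
        this.code = code;
        this.reasonPhrase = reasonPhrase;
    }

    public int getCode() {
        return code;
    }

    public String getReasonPhrase() {
        return reasonPhrase;
    }
}
```

Keeping the code and the reason phrase together in one enum constant means a status line can never pair a code with the wrong phrase.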
Although it is an HTTP server, underneath it is just ordinary socket communication. The main method only waits for connections, and once a connection is established it is handled in a separate thread.
// ServerSocket and accept() throw IOException, so main must declare it
public static void main(String[] args) throws IOException {
    ServerSocket server = new ServerSocket(8080);
    ExecutorService executor = Executors.newCachedThreadPool();
    while (true) {
        Socket socket = server.accept();
        // Pass the socket object and process each request in a separate thread
        executor.submit(new WorkerThread(socket));
    }
}
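The WorkerThread class passed to the executor above is not shown in the article. As a rough idea of what such a worker might look like, here is a minimal sketch under assumptions: the class name SketchWorker and its handle method are hypothetical, and it only echoes the request line back instead of doing the real processing described later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the article's WorkerThread (whose source is not shown here).
final class SketchWorker implements Runnable {
    private final Socket socket;

    SketchWorker(Socket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        // try-with-resources closes the socket and its streams even if handling fails
        try (Socket s = socket) {
            handle(s.getInputStream(), s.getOutputStream());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Reads only the request line and answers with a fixed plain-text response.
    static void handle(InputStream in, OutputStream out) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String requestLine = br.readLine();
        byte[] body = ("You sent: " + requestLine + "\n").getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }
}
```

Taking plain streams in handle, rather than the socket itself, makes it possible to exercise the logic with in-memory streams without opening a port.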
Request-Line
format
request-line = method SP request-target SP HTTP-version CRLF
Since it is a space-separated string, match it against a regular expression and extract the HTTP method, request target, and HTTP version.
Parsing the Request-Line
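InputStream in = socket.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
String requestLine = br.readLine();
Pattern requestLinePattern
    = Pattern.compile("^(?<method>\\S+) (?<target>\\S+) (?<version>\\S+)$");
// readLine() returns null if the client closed the connection without sending anything
Matcher matcher = requestLinePattern.matcher(requestLine == null ? "" : requestLine);
// group() may only be called after a successful matches(), so reject malformed lines first
if (!matcher.matches()) {
    throw new ParseException(requestLine);
}
Method method = Method.valueOf(matcher.group("method"));
String target = matcher.group("target");
String version = matcher.group("version");
Request request = new Request(method, target, version);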
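To try the regular expression in isolation, the parsing step above can be pulled into a small standalone class. This is a sketch, not the article's code: the class name RequestLineSketch is made up, and it returns a plain String array and throws IllegalArgumentException instead of using the project's Request and ParseException types.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RequestLineSketch {
    // Same pattern as in the article: three non-space tokens separated by single spaces
    private static final Pattern REQUEST_LINE =
            Pattern.compile("^(?<method>\\S+) (?<target>\\S+) (?<version>\\S+)$");

    // Returns {method, target, version}; throws if the line is not a valid request line.
    static String[] parse(String requestLine) {
        Matcher m = REQUEST_LINE.matcher(requestLine == null ? "" : requestLine);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        return new String[] { m.group("method"), m.group("target"), m.group("version") };
    }

    public static void main(String[] args) {
        String[] parts = parse("GET /path/to/something HTTP/1.1");
        System.out.println(String.join(" | ", parts));
    }
}
```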
format
header-field = field-name ":" OWS field-value OWS
Since a header field is a field name and a value separated by ":", this is also extracted with a regular expression.
Even if the body is 0 bytes, there is always a blank line that separates the header from the body, so lines are read as header fields until a blank line is encountered. [^ 1]
// The name may not contain ':' or whitespace; OWS on either side of the value is any run of spaces or tabs
Pattern headerPattern = Pattern.compile("^(?<name>[^:\\s]+):[ \\t]*(?<value>.*?)[ \\t]*$");
while (true) {
    String headerField = br.readLine();
    // null means the client closed the connection; a blank line separates the header from the body
    if (headerField == null || headerField.trim().isEmpty()) break;
    Matcher matcher = headerPattern.matcher(headerField);
    if (matcher.matches()) {
        request.addHeaderField(matcher.group("name"), matcher.group("value"));
    } else {
        throw new ParseException(headerField);
    }
}
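Header field names are case-insensitive in HTTP, so a client may legitimately send content-length instead of Content-Length. The sketch below shows one way to make lookups tolerant of that by storing the fields in a case-insensitive TreeMap. The class name HeaderParseSketch is hypothetical, and it works on a list of already-split lines rather than on the socket.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeaderParseSketch {
    // field-name ":" OWS field-value OWS, where OWS is any run of spaces or tabs
    private static final Pattern HEADER =
            Pattern.compile("^(?<name>[^:\\s]+):[ \\t]*(?<value>.*?)[ \\t]*$");

    // Parses header lines up to (not including) the first blank line.
    static Map<String, String> parse(List<String> lines) {
        // CASE_INSENSITIVE_ORDER makes get("Content-Length") find "content-length" too
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            if (line.isEmpty()) {
                break; // blank line: end of the header section
            }
            Matcher m = HEADER.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Malformed header field: " + line);
            }
            headers.put(m.group("name"), m.group("value"));
        }
        return headers;
    }
}
```

With a plain HashMap, as in AbstractHttpMessage, the lookup key has to match the spelling the client happened to use.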
If a body is present, the Content-Length or Transfer-Encoding header is always specified. In other words, it should be enough to branch on the request headers and support 3 patterns: Content-Length is specified, Transfer-Encoding is specified, or neither is specified.
Since the format is simply message-body = *OCTET, it is enough to store the received content as is in a byte array.
When Content-Length is specified, the number of bytes to read is determined from it.
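int contentLength = Integer.parseInt(request.getHeaders().get("Content-Length").trim());
char[] body = new char[contentLength];
int read = 0;
// read() may return fewer chars than requested, so keep reading until the buffer is full or the stream ends.
// Caveat: Content-Length counts bytes while the reader counts chars, so this is exact only for single-byte (ASCII) bodies.
for (int n; read < contentLength && (n = br.read(body, read, contentLength - read)) > 0; read += n) { }
request.setBody(new String(body, 0, read).getBytes(StandardCharsets.UTF_8));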
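Because Content-Length is a byte count, reading the body through a character reader only lines up when every character is one byte. A byte-accurate alternative is to read straight from the InputStream. The sketch below assumes the header section was consumed from the same stream without a buffering reader swallowing body bytes, which is not how the snippet above is structured, and the class name FixedLengthBodySketch is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class FixedLengthBodySketch {
    // Reads exactly contentLength bytes from the stream; fails if the stream ends early.
    static byte[] readBody(InputStream in, int contentLength) {
        byte[] body = new byte[contentLength];
        try {
            // readFully loops internally until the whole array is filled
            new DataInputStream(in).readFully(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body;
    }
}
```

For example, a body consisting of the single character あ has Content-Length 3 in UTF-8, and this method returns exactly those 3 bytes.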
If Content-Length is required, the client cannot send the request until the whole request body has been created, which is inefficient. HTTP/1.1 therefore makes it possible to send the request body in chunks, as follows. [^ 3]
POST /hoge HTTP/1.1
Host: example.jp
Transfer-Encoding: chunked
Connection: Keep-Alive

a
This is a 
a
test messa
3
ge.
0
Chunked body format
Chunk size (bytes, hexadecimal) CRLF
Chunk data CRLF
The strategy is to send the body piece by piece as each part becomes ready, prefixing each piece with its size in bytes, even when the size of the whole body is not yet known. Because there is no Content-Length, the server has no other way to tell where the body ends, so the client must send a 0-byte chunk at the end.
String transferEncoding = request.getHeaders().get("Transfer-Encoding");
// Several codings can be listed, as in "Transfer-Encoding: gzip, chunked", but only chunked is supported here
if ("chunked".equals(transferEncoding)) { // null-safe when the header is absent
    int length = 0;
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    // A chunk extension, if any, follows the size after ';', so drop everything from ';' on
    String chunkSizeHex = br.readLine().replaceFirst(";.*$", "").trim();
    int chunkSize = Integer.parseInt(chunkSizeHex, 16);
    while (chunkSize > 0) {
        // Caveat: the chunk size is in bytes but the reader counts chars, so this is exact only for ASCII data
        char[] chunk = new char[chunkSize];
        br.read(chunk, 0, chunkSize);
        br.skip(2); // skip the CRLF that follows the chunk data
        body.write((new String(chunk)).getBytes(StandardCharsets.UTF_8));
        length += chunkSize;
        chunkSizeHex = br.readLine().replaceFirst(";.*$", "").trim();
        chunkSize = Integer.parseInt(chunkSizeHex, 16);
    }
    // Present the decoded body to the rest of the server as if it had arrived with Content-Length
    request.addHeaderField("Content-Length", Integer.toString(length));
    request.getHeaders().remove("Transfer-Encoding");
    request.setBody(body.toByteArray());
}
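Chunk sizes are byte counts, so decoding on raw bytes avoids the mismatch between bytes and characters. The following is a simplified sketch of that approach, assuming the body is read directly from an InputStream. The class name ChunkedDecodeSketch is hypothetical, chunk extensions are ignored, and trailer fields are skipped rather than parsed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ChunkedDecodeSketch {
    // Decodes a chunked body and returns the concatenated chunk data.
    static byte[] decode(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            while (true) {
                String sizeLine = readLine(data);
                int semi = sizeLine.indexOf(';'); // a chunk extension starts at ';'
                String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
                int size = Integer.parseInt(hex, 16);
                if (size == 0) {
                    break; // the 0-size chunk marks the end of the data
                }
                byte[] chunk = new byte[size];
                data.readFully(chunk);
                body.write(chunk, 0, chunk.length);
                readLine(data); // consume the CRLF after the chunk data
            }
            // Skip optional trailer fields up to the terminating blank line
            while (!readLine(data).isEmpty()) { }
            return body.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads bytes up to LF (or end of stream) and returns them as text without CR/LF.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }
}
```

Fed the body of the example request above, decode returns the bytes of "This is a test message.".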
If neither header is specified, the request should have no body, so nothing is done.
For GET, simply read the requested file as a byte array with Files#readAllBytes and that is all.
// Join the request target to the document root to get the actual file path
Path target = Paths.get(SimpleHttpServer.getDocumentRoot(), request.getTarget()).normalize();
// Allow access only to files below the document root
if (!target.startsWith(SimpleHttpServer.getDocumentRoot())) {
    return new Response(protocolVersion, Status.BAD_REQUEST);
}
if (Files.isDirectory(target)) {
    target = target.resolve("index.html");
}
try {
    response = new Response("HTTP/1.1", Status.OK);
    response.setBody(Files.readAllBytes(target)); // The file is stored as a byte array
} catch (IOException e) {
    // If the file cannot be read, set the error code and load the HTML file for the error page
    response = new Response("HTTP/1.1", Status.NOT_FOUND);
    response.setBody(SimpleHttpServer.readErrorPage(Status.NOT_FOUND));
}
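The document-root check above is what stops a request such as /../etc/passwd from escaping the served directory. Here is a small standalone sketch of the same idea that can be run without the server. The class name DocumentRootSketch and the Optional return type are choices made for this example, and query strings in the request target are not handled.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class DocumentRootSketch {
    // Returns the resolved file path, or empty when the target escapes the document root.
    static Optional<Path> resolve(String documentRoot, String requestTarget) {
        Path root = Paths.get(documentRoot).toAbsolutePath().normalize();
        // Strip leading slashes so the target is resolved relative to the root
        String relative = requestTarget.replaceFirst("^/+", "");
        // normalize() collapses "." and ".." segments before the containment check
        Path resolved = root.resolve(relative).normalize();
        // Path.startsWith compares whole path elements, so /var/www2 does not match /var/www
        return resolved.startsWith(root) ? Optional.of(resolved) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolve("/var/www", "/css/../index.html"));
        System.out.println(resolve("/var/www", "/../etc/passwd"));
    }
}
```

Normalizing both the root and the joined path before comparing matters here: if the root itself contains redundant segments, comparing an unnormalized root against a normalized target can reject valid requests.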
In the case of POST, the body is written to standard output for now, to check that it is being read correctly. The response code is 204 No Content, which signals successful completion.
System.out.println("POST body: " + new String(request.getBody(), StandardCharsets.UTF_8));
Response response = new Response(protocolVersion, Status.NO_CONTENT);
If you specify Content-Type in the header according to the file format, the client side will take care of the rest. [^ 2]
response = new Response(protocolVersion, Status.OK);
response.setBody(Files.readAllBytes(target));
// Map file extensions to media types
Map<String, String> mimeTypes = new HashMap<>();
mimeTypes.put("html", "text/html");
mimeTypes.put("css", "text/css");
mimeTypes.put("js", "text/javascript"); // "application/js" is not a registered media type
mimeTypes.put("png", "image/png");
String ext = StringUtils.getFileExtension(target.getFileName().toString());
String contentType = mimeTypes.getOrDefault(ext, "");
response.addHeaderField("Content-Type", contentType);
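The snippet above relies on a StringUtils.getFileExtension helper from the project, which is not shown. The sketch below is one guess at how the extension lookup could work end to end. The class name ContentTypeSketch and both method names are hypothetical, and falling back to application/octet-stream for unknown extensions is a choice made for this example rather than what the article's code does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class ContentTypeSketch {
    private static final Map<String, String> MIME_TYPES = new HashMap<>();
    static {
        MIME_TYPES.put("html", "text/html");
        MIME_TYPES.put("css", "text/css");
        MIME_TYPES.put("js", "text/javascript");
        MIME_TYPES.put("png", "image/png");
        MIME_TYPES.put("txt", "text/plain");
    }

    // Returns the lower-cased text after the last '.', or "" when there is no extension.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Falls back to a generic binary type when the extension is unknown (an assumption of this sketch).
    static String contentTypeFor(String fileName) {
        return MIME_TYPES.getOrDefault(extensionOf(fileName), "application/octet-stream");
    }
}
```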
The format of the response message is
HTTP-version SP status-code SP reason-phrase CRLF
*( header-field CRLF )
CRLF
[ message-body ]
so the message is simply written to the socket exactly as the format dictates.
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));
String statusLine
    = response.getVersion() + SP + response.getStatusCode() + SP + response.getReasonPhrase() + CRLF;
writer.write(statusLine);
for (Map.Entry<String, String> field : response.getHeaders().entrySet()) {
    writer.write(field.getKey() + ":" + SP + field.getValue() + CRLF);
}
writer.write(CRLF); // Blank line required to separate header and body
writer.flush(); // Flush the header before writing to the underlying stream directly
out.write(response.getBody()); // The body is the byte array read from the file, written as is
out.flush();
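To see the exact bytes that end up on the wire, the same serialization can be done into a byte array. This is a minimal sketch with a made-up class name, ResponseWriterSketch, that takes the message parts as plain arguments instead of the article's Response object.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class ResponseWriterSketch {
    // Builds status line, header fields, blank line, and body into one byte array.
    static byte[] serialize(String version, int code, String reason,
                            Map<String, String> headers, byte[] body) {
        StringBuilder head = new StringBuilder();
        head.append(version).append(' ').append(code).append(' ').append(reason).append("\r\n");
        for (Map.Entry<String, String> field : headers.entrySet()) {
            head.append(field.getKey()).append(": ").append(field.getValue()).append("\r\n");
        }
        head.append("\r\n"); // blank line separating header and body

        byte[] headBytes = head.toString().getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(body, 0, body.length); // the body is copied as raw bytes, never re-encoded
        return out.toByteArray();
    }
}
```

Keeping the body as bytes all the way through is what lets the same code serve images as well as text.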
In this program the response message is sent to the client only after it has been completely built, so there is little point in using the chunked format, but I implemented it anyway since I had come this far.
Chunked body format
Chunk size (bytes, hexadecimal) CRLF
Chunk data CRLF
Divide the byte array that represents the entire body into units of CHUNK_SIZE and put each into the above format.
byte[] CRLF = new byte[] {0x0D, 0x0A};
byte[] body = response.getBody();
ByteArrayOutputStream out = new ByteArrayOutputStream();
for (int offset = 0; offset < body.length; offset += CHUNK_SIZE) {
    // Clamp the end index: copyOfRange pads with zero bytes when the range runs past the end of the array
    int end = Math.min(offset + CHUNK_SIZE, body.length);
    byte[] chunk = Arrays.copyOfRange(body, offset, end);
    String lengthHex = Integer.toHexString(chunk.length);
    out.write(lengthHex.getBytes(StandardCharsets.US_ASCII));
    out.write(CRLF);
    out.write(chunk);
    out.write(CRLF);
}
out.write("0".getBytes(StandardCharsets.US_ASCII));
out.write(CRLF); // End of the size line of the last (size 0) chunk
out.write(CRLF); // Blank line that terminates the chunked body
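The chunking loop is easy to check in isolation, in particular that the final chunk carries only the bytes that remain. Below is a standalone sketch with a hypothetical class name, ChunkedEncodeSketch, that takes the chunk size as a parameter instead of the CHUNK_SIZE constant.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedEncodeSketch {
    private static final byte[] CRLF = {0x0D, 0x0A};

    // Encodes body as a chunked message body with chunks of at most chunkSize bytes.
    static byte[] encode(byte[] body, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < body.length; offset += chunkSize) {
            // The last chunk may be shorter than chunkSize
            int len = Math.min(chunkSize, body.length - offset);
            byte[] hex = Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII);
            out.write(hex, 0, hex.length);
            out.write(CRLF, 0, CRLF.length);
            out.write(body, offset, len);
            out.write(CRLF, 0, CRLF.length);
        }
        out.write('0');                  // size line of the last chunk
        out.write(CRLF, 0, CRLF.length);
        out.write(CRLF, 0, CRLF.length); // blank line ending the chunked body
        return out.toByteArray();
    }
}
```

Encoding "This is a test message." with a chunk size of 10 yields chunks of a, a, and 3 bytes, which is the same wire format as the chunked request example earlier in the article.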
Using the Request object and the Response object, the server outputs an Apache-like access log just before returning a response to the client.
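The article does not show the log format itself. As an illustration only, the sketch below formats one line in the style of Apache's Common Log Format (host, identity, user, timestamp, request line, status, size). The class name AccessLogSketch and the parameter list are assumptions, and the identity and user fields are always written as "-".

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class AccessLogSketch {
    // Timestamp in the style used by Apache access logs, e.g. 02/Jan/2019:03:04:05 +0900
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    static String format(String remoteHost, ZonedDateTime time, String requestLine,
                         int statusCode, int bodyLength) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale
        return String.format(Locale.ROOT, "%s - - [%s] \"%s\" %d %d",
                remoteHost, TIMESTAMP.format(time), requestLine, statusCode, bodyLength);
    }
}
```

In the server, the request line would come from Request#getStartLine and the status and size from the Response object.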
Since the POSTed body is written to standard output as received, without any decoding, half-width spaces still appear encoded as +, but you can see that the contents of the form were received.
The status code is 404, but because the HTML file for the error page is placed in the body, the browser shows the error screen.
The trace below confirms that the body was divided into chunks of 20 (14 in hexadecimal) bytes.
$ curl localhost:8080/chunked/sample.txt --trace-ascii /dev/stdout
== Info: Trying ::1...
== Info: TCP_NODELAY set
== Info: Connected to localhost (::1) port 8080 (#0)
=> Send header, 96 bytes (0x60)
0000: GET /chunked/sample.txt HTTP/1.1
0022: Host: localhost:8080
0038: User-Agent: curl/7.54.0
0051: Accept: */*
005e:
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 28 bytes (0x1c)
0000: Transfer-Encoding: chunked
<= Recv header, 26 bytes (0x1a)
0000: Content-Type: text/plain
<= Recv header, 2 bytes (0x2)
0000:
<= Recv data, 109 bytes (0x6d)
0000: 14
0004: This is a sample tex
001a: 14
001e: t..Response is chunk
0034: 14
0038: ed in every 20 bytes
004e: 14
0052: ....................
0068: 0
006b:
This is a sample text.
Response is chunked in every 20 bytes.
With Transfer-Encoding: chunked and Content-Type: text/plain, the browser automatically downloads the file when it is opened. I had thought that only happened for application/octet-stream.
RFC 7230
[How to learn a new programming language: learning Java, Scala, and Clojure by creating an HTTP server](https://speakerdeck.com/todokr/xin-siihurokuraminkuyan-yu-falsexue-hifang-httpsahawozuo-tutexue-hu-java-scala-clojure)
The slide deck above is what inspired me to create an HTTP server.
[^ 1]: With this approach, a client that sends a malformed request can leave the server stuck in an infinite loop, which is not great.
[^ 2]: It should be added unless there is a particular reason not to, but it does not seem to be strictly required because the wording is "SHOULD" rather than "MUST" → RFC 7231, 3.1.1.5. Content-Type
[^ 3]: Slightly simplified. For details, see RFC 7230, 4.1. Chunked Transfer Coding