[PYTHON] Book updated: Evolving into a "decent web server"

Updated the book

The chapter "Evolving into a 'decent web server'" has been updated.

If you want to read more, please like the book or follow the author ;-)


The following is an excerpt of the contents of the book.


How to become a decent web server?

In this chapter, we will evolve the "Henachoko Web Server" that everyone has already built into a "decent web server".

So first, let's sort out exactly what made the web server you created so feeble.

In the first place, a web server is a server that communicates according to the **rules of HTTP**. Put the other way around, to be a decent web server, it must be able to return responses that follow the **rules of HTTP**.

However, your Henachoko web server does not adhere to the rules of HTTP.

Specifically, let's look again at the response (= server_send.txt) returned by your server.


server_send.txt

HTTP/1.1 200 OK
Date: Wed, 28 Oct 2020 07:57:45 GMT
Server: Apache/2.4.41 (Unix)
Content-Location: index.html.en
Vary: negotiate
TCN: choice
Last-Modified: Thu, 29 Aug 2019 05:05:59 GMT
ETag: "2d-5913a76187bc0"
Accept-Ranges: bytes
Content-Length: 45
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

<html><body><h1>It works!</h1></body></html>


Your web server **returns this content as a fixed response no matter what path is requested.**

Looking at it again, the HTTP format itself is being followed. There is a status line on the first line, some headers from the second line onward, and a body after the blank line. We borrowed the Apache response wholesale, so this is only natural.

It's not cool that the body is "It works!" no matter what path you request, but it's not a rule violation. Let's just insist that "this is the kind of service we are."

**The problem is in the headers.**

For example, according to RFC 7231 Section 7.1.1.2, the Date header is supposed to contain the date and time at which the web server generated the response. However, it is currently fixed regardless of when the response is actually generated. (In the example above, Date is fixed to 2020/10/28 7:57:45.)

Also, the Server header (RFC 7231 Section 7.4.2) is supposed to contain information about the program that generated the response, and it generally carries something like the name of the web server. However, even though we did not create a web server named Apache, the value is fixed at Server: Apache/2.4.41 (Unix).

(Strictly speaking this is not a rule violation, since the server developer is free to choose the server name, but passing yourself off under someone else's program name is hardly what a "decent" web server does.)


So, in order to evolve into a "decent web server", let's improve the program so that it sets these headers correctly and generates a response that properly follows the rules of HTTP.

**As a first step, we will look at the headers that Apache returns one by one to see whether each needs to be reworked.**

By the way, the only HTTP header that is always required is the Host header in the request, and there are no headers that are unconditionally required in a response. Therefore, in this step we will drop the headers that are difficult to implement or to study.

**The next step is to actually modify the program so that the appropriate headers are included in the response.**

Check the Apache response header

Let's take a look at the headers included in the Apache response, pick out the ones we need, and check the details of what has to change. This part is a little long, so if you are not interested in the details, you can skip it.

Date

Date: Wed, 28 Oct 2020 07:57:45 GMT

RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.date

As we've already seen, let's first look at the Date header. Date represents the date and time when the response was generated.

Right now, a fixed date and time is returned, but in Python it is easy to get the date and time using the datetime module, so let's properly return the **date and time when the response was generated**.
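As a rough sketch of what this could look like, the snippet below formats the current time in the same shape as the Date header Apache returned. The function name `format_http_date` is made up for this illustration and is not the code the book ends up using. It also assumes the default C locale, since the `%a` and `%b` directives of `strftime` depend on the locale.

```python
from datetime import datetime, timezone


def format_http_date(moment: datetime) -> str:
    """Format a timezone-aware datetime like 'Wed, 28 Oct 2020 07:57:45 GMT'.

    Hypothetical helper for illustration only.
    """
    # HTTP dates are always expressed in GMT, so convert to UTC first.
    # (A naive datetime would be interpreted as local time by astimezone.)
    utc_moment = moment.astimezone(timezone.utc)
    return utc_moment.strftime("%a, %d %b %Y %H:%M:%S GMT")


# Build the header line at the moment the response is generated.
date_header = f"Date: {format_http_date(datetime.now(timezone.utc))}"
print(date_header)
```

Because the value is computed each time a response is built, the header no longer stays frozen at the time the Apache response was captured.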

Server

Server: Apache/2.4.41 (Unix)

RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.server

As we've already seen, the Server header returns information about the program that generated the response. Its content is not strictly specified, but it is recommended not to include overly detailed information, and it is common to limit it to the server name or OS name.

Right now it is fixed at Apache/2.4.41 (Unix), but let's give it an original name of our own.

In this book, we will return **HenaServer/0.1**, short for version 0.1 of the Henachoko Web Server.

Feel free to choose a name you like.

Content-Location

Content-Location: index.html.en

RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.content-location

Content-Location indicates an alternative URL to get the returned response.

This may be a little difficult to understand.

In this case, when Chrome requests the resource /, a process called content negotiation takes place.

Specifically, Chrome sends headers such as the following in its HTTP request:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: ja-JP,ja;q=0.9,en-US;q=0.8,en;q=0.7

Through these headers, Chrome is telling the server: "These are the content formats I can understand, these are the compression formats I support, and I would most like Japanese, but English is fine too."

The server side generates an appropriate response according to these headers.

This collaborative exchange is called content negotiation, and it means that the content returned can change depending on how it is requested.

For that reason, Apache is telling us: "If you want the same content as the response returned this time, request /index.html.en instead of /."

The server we'll create in this book won't be that complicated, so we will **not return this header**.
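Although this book's server does not perform content negotiation, the way a server might read the language preferences above can be sketched as follows. The function name `parse_accept_language` is hypothetical, and the parsing is deliberately simplified (for example, it does not handle malformed q values).

```python
def parse_accept_language(header_value: str) -> list:
    """Return (language, q) pairs sorted from most to least preferred.

    Hypothetical, simplified helper for illustration only.
    """
    preferences = []
    for item in header_value.split(","):
        parts = item.strip().split(";")
        language = parts[0].strip()
        quality = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                quality = float(value)
        preferences.append((language, quality))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


prefs = parse_accept_language("ja-JP,ja;q=0.9,en-US;q=0.8,en;q=0.7")
print(prefs)
```

A server that negotiates content would walk this list in order and return the first language for which it has a matching file, such as index.html.en for English.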

Vary

Vary: negotiate

RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.vary

The Vary header controls whether a browser or intermediate server may use its cache. It means that the cache may be used as long as the request headers listed in this header do not change.

In this case, the server is telling us that the cached content can be reused as long as the content negotiation headers described earlier do not change.

Since this book does not deal with cache control, this header will **not be returned**.

TCN

TCN: choice

RFC: https://tools.ietf.org/html/rfc2295#section-8.5

This is a somewhat obscure header, defined in a separate RFC, RFC 2295, which covers Transparent Content Negotiation.

It tells the client how content negotiation was carried out, but the details are complicated, so the explanation is omitted here.

Since this book does not perform content negotiation, we will **not return this header**.

Last-Modified

Last-Modified: Thu, 29 Aug 2019 05:05:59 GMT

RFC: https://triple-underscore.github.io/RFC7232-ja.html#header.last-modified

The Last-Modified header returns the date and time when the content was last modified. The specification says this header should be returned whenever a consistent last-modified date and time can be provided.

A situation in which "a consistent last-modified date and time cannot be provided" is one where no meaningful Last-Modified value exists: for example, a URL whose content changes on every request, where the content may differ even though Last-Modified is the same, or may be the same even though Last-Modified differs.

In this book, whether the last-modified date and time is meaningful will vary from URL to URL, so for the sake of simplicity this header will **not be returned**.

ETag

ETag: "2d-5913a76187bc0"

RFC: https://triple-underscore.github.io/RFC7232-ja.html#header.etag

The ETag header is an identifier for a particular version of the resource from which the response is generated. That is, whenever the resource is updated in any way, the ETag is expected to take a different value. Hash values of the file or content are often used.

This header is also used for cache control by browsers and intermediate servers, but since this book does not deal with cache control, it will **not be returned**.
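Even though this book's server will not return an ETag, the idea of deriving one from a hash of the content can be sketched as below. The helper name `make_etag`, the choice of SHA-256, and the truncation to 16 hex digits are all assumptions made for this illustration, not something prescribed by the RFC or used by Apache.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag value from the response body.

    Hypothetical helper: any scheme works as long as the value changes
    whenever the content changes.
    """
    digest = hashlib.sha256(body).hexdigest()
    # An ETag value is written as a quoted string.
    return f'"{digest[:16]}"'


original = make_etag(b"<html><body><h1>It works!</h1></body></html>\n")
modified = make_etag(b"<html><body><h1>It changed!</h1></body></html>\n")
print(original, modified)
```

The same body always produces the same value, while an edited body produces a different one, which is what lets a cache decide whether its copy is still current.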

Accept-Ranges

Accept-Ranges: bytes

RFC: https://triple-underscore.github.io/RFC7233-ja.html#header.accept-ranges

The Accept-Ranges header indicates that the server supports "partial requests for a resource", known as Range Requests.

Range Requests is a feature that allows a large file to be downloaded in separate pieces.

This book does not support Range Requests, so this header will **not be returned**.

Content-Length

Content-Length: 45

RFC: https://triple-underscore.github.io/RFC7230-ja.html#header.content-length

The Content-Length header returns a decimal value that indicates the number of bytes in the response body.

With some exceptions, the server should return this header.

It's not difficult to get the number of bytes in Python, so we will make sure to **return the number of bytes in the body**.
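A minimal sketch of the byte count is shown below, assuming the body is held as a Python string and encoded as UTF-8 before sending. The helper name `content_length` is made up for this example. The point to note is that the header counts bytes, not characters, so the length must be taken after encoding.

```python
def content_length(body_text: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes the body occupies once encoded.

    Hypothetical helper for illustration only.
    """
    return len(body_text.encode(encoding))


# The body borrowed from Apache, including its trailing newline.
apache_body = "<html><body><h1>It works!</h1></body></html>\n"
content_length_header = f"Content-Length: {content_length(apache_body)}"
print(content_length_header)

# Multi-byte text: 5 characters, but more than 5 bytes in UTF-8.
japanese_body = "こんにちは"
print(len(japanese_body), content_length(japanese_body))
```

For the Apache body this gives the same 45 that appears in server_send.txt, while the Japanese example shows why `len()` on the string alone would be wrong for non-ASCII content.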

Keep-Alive

Keep-Alive: timeout=5, max=100

RFC: https://tools.ietf.org/id/draft-thomson-hybi-http-timeout-01.html#keep-alive

The Keep-Alive header returns information about how long a connection may be kept open for the connection reuse described below.

This header has been submitted as an Internet-Draft (an experimental specification) and is supported by almost all modern browsers, but it has not yet been incorporated into a formal RFC standard.

Since this book does not implement connection reuse, this header will **not be returned**.

Connection

Connection: Keep-Alive

The Connection header indicates whether a TCP connection, once established, can be reused for the next request.

Establishing a TCP connection takes some time, and reusing TCP connections is known to be effective when optimizing display speed.

Connection reuse is outside the scope of this book and will not be implemented.

However, in HTTP/1.1, connections are supposed to be reused by default, and **servers that do not support connection reuse must return Connection: Close, so we will return it in this book.**

Content-Type

Content-Type: text/html

RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.content-type

The Content-Type header returns the format of the response body. The values that can be used are known as MIME types, such as the text/html seen above, and so on.

If you want to check the list, this site will be helpful.

If you omit this header, the body may be treated as a file of unknown type and may not be displayed in the browser, so return a value that properly matches the content.

In this book, only HTML will be returned as the body in the next step, so **for now we will always return text/html.** Later, when we start returning bodies other than HTML, we will improve this so that various values can be returned.
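Putting the decisions above together, the response could be assembled roughly as follows. This is only a sketch under the choices made in this chapter (Date, Server, Content-Length, Connection and Content-Type are returned, and the rest are dropped). The function name `build_response` and its `now` parameter are hypothetical, and the actual program in the book may be structured differently.

```python
from datetime import datetime, timezone


def build_response(body: bytes, now=None) -> bytes:
    """Assemble a 200 OK response with the headers chosen in this chapter.

    Hypothetical helper; `now` must be a UTC datetime and exists only so
    the current time can be replaced when testing.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    header_lines = [
        "HTTP/1.1 200 OK",
        # Generated per response instead of being a fixed string.
        f"Date: {now.strftime('%a, %d %b %Y %H:%M:%S GMT')}",
        "Server: HenaServer/0.1",
        # Number of bytes in the body, not the number of characters.
        f"Content-Length: {len(body)}",
        # This server does not reuse connections.
        "Connection: Close",
        # Fixed for now, because only HTML is returned.
        "Content-Type: text/html",
    ]
    # Lines end with CRLF, and a blank line separates headers from body.
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body


body = b"<html><body><h1>It works!</h1></body></html>\n"
print(build_response(body).decode("ascii"))
```

Compared with server_send.txt, the Content-Location, Vary, TCN, Last-Modified, ETag, Accept-Ranges and Keep-Alive headers are gone, and the Date and Content-Length values are computed rather than copied.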


Continue with Book!

Chapter: Evolving into a "decent web server"
