The chapter "Evolving into a 'decent web server'" has been updated.
If you want to read more, please like the book or follow the author ;-)
The following is an excerpt from the book.
In this chapter, we will evolve the "Henachoko Web Server" (the sloppy web server) that everyone has already built into a "decent web server".
First, let's pin down exactly what is sloppy about the web server you created.
In the first place, a web server is a server that communicates according to the **HTTP rules**. Conversely, to be a decent web server, it must be able to return responses that follow the **HTTP rules**.
However, your Henachoko Web Server does not follow the HTTP rules.
Specifically, let's look again at the response your server returns (= server_send.txt).
server_send.txt
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2020 07:57:45 GMT
Server: Apache/2.4.41 (Unix)
Content-Location: index.html.en
Vary: negotiate
TCN: choice
Last-Modified: Thu, 29 Aug 2019 05:05:59 GMT
ETag: "2d-5913a76187bc0"
Accept-Ranges: bytes
Content-Length: 45
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

<html><body><h1>It works!</h1></body></html>
Your web server **returns this same content as a fixed response, no matter which path is requested.**
Looking at it again, the response does at least follow the HTTP format: the first line is the status line, the following lines are headers, and the body comes after a blank line. This is natural, since we borrowed Apache's response to begin with.
Returning "It works!" as the body no matter which path is requested is not cool, but it is not a rule violation. We can simply insist that "that's the kind of service we are."
**The problem is in the headers.**
For example, according to RFC 7231 section 7.1.1.2, the Date header is supposed to hold the date and time at which the response was originated.
However, ours is currently fixed, regardless of when the response is actually generated.
(In the example above, Date is fixed to 2020/10/28 07:57:45 GMT.)
Also, the Server header (RFC 7231 section 7.4.2) is supposed to contain information about the software that generated the response, and it usually holds the name of the web server or similar.
However, even though we did not build a web server named Apache, it is fixed at Server: Apache/2.4.41 (Unix).
(Strictly speaking this is not a rule violation, since the server developer is free to choose the name, but a server that borrows someone else's program name is hardly "decent".)
So, to evolve into a "decent web server", let's improve the program so that it sets these headers properly and generates responses that follow the HTTP rules.
**As a first step, let's go through the headers Apache returns one by one and decide whether each needs to be reworked.**
By the way, in HTTP/1.1 the only header a request strictly must carry is Host, and very little is unconditionally required in a response (a notable exception is Date, which an origin server with a reliable clock must send in most responses).
Therefore, in this step we will simply drop any header that would be difficult to implement or to learn at this stage.
**The next step is to actually modify the program so that the appropriate headers are included in the response.**
Let's go through the headers in the Apache response, pick the ones we need, and check the details of each change. This part is a bit long, so feel free to skip it if you are not interested in the details.
Date
Date: Wed, 28 Oct 2020 07:57:45 GMT
RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.date
First, let's look at the Date header, which we have already seen.
Date represents the date and time at which the response was generated.
Right now a fixed date and time is returned, but in Python the datetime module makes it easy to get the current date and time, so let's properly return the **date and time at which the response was generated**.
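A minimal sketch of generating this header. It pairs datetime with the standard-library helper email.utils.format_datetime, because that helper always writes the English day and month names HTTP dates use, regardless of locale. The function name build_date_header is hypothetical, not the book's code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_date_header(now=None):
    """Return a Date header line for the given (or current) UTC time.

    build_date_header is a hypothetical helper name used for illustration.
    """
    if now is None:
        # The moment the response is generated, in UTC (HTTP dates are always GMT).
        now = datetime.now(timezone.utc)
    # usegmt=True yields the HTTP date form, e.g. "Wed, 28 Oct 2020 07:57:45 GMT".
    return "Date: " + format_datetime(now, usegmt=True)


print(build_date_header())
```

Accepting an optional now argument keeps the function easy to test with a fixed time, while the server itself calls it with no arguments.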
Server
Server: Apache/2.4.41 (Unix)
RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.server
As we've already seen, the Server header returns information about the program that generated the response.
Its content is not strictly specified, but the RFC advises against overly fine-grained detail, and it is common to limit it to the server name (and perhaps the OS name).
Right now it is fixed at Apache/2.4.41 (Unix), but let's give our server its own original name.
In this book, we will return **HenaServer/0.1**, short for version 0.1 of the Henachoko Web Server.
Please give yours a name of your own.
Content-Location
Content-Location: index.html.en
RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.content-location
Content-Location indicates an alternative URL from which the returned content can be obtained.
This may be a little difficult to understand.
In this case, when Chrome requests the resource /, it goes through a process called content negotiation.
Specifically, Chrome includes headers like these in its HTTP request:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: ja-JP,ja;q=0.9,en-US;q=0.8,en;q=0.7
Through these headers, it tells the server: "These are the content formats I can understand, these are the compression formats I support, and I'd most like Japanese, though English is fine too."
The server then generates an appropriate response based on these headers.
This cooperative exchange is called content negotiation, and it means the content you get back can change depending on how you ask for it.
That is why Apache is telling us: "If you want the same content as this response, request /index.html.en instead of /."
The server we'll build in this book won't be that sophisticated, so we will **not return this header**.
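Our server won't negotiate, but a rough sketch of the server side may make the q-values in the Accept-Language header above more concrete. It is a simplification under stated assumptions: only Accept-Language is handled, tags must match exactly (no fallback from en-US to en, no * wildcard), and parse_accept_language and choose_language are hypothetical names.

```python
def parse_accept_language(value):
    """Return (language, q) pairs sorted from most to least preferred."""
    prefs = []
    for index, item in enumerate(value.split(",")):
        parts = [part.strip() for part in item.split(";")]
        lang = parts[0]
        if not lang:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((q, index, lang))
    # Highest q first; ties keep the order in which the client listed them.
    prefs.sort(key=lambda entry: (-entry[0], entry[1]))
    return [(lang, q) for q, _, lang in prefs]


def choose_language(accept_language, available, default):
    """Pick the client's most preferred language that the server actually has."""
    for lang, q in parse_accept_language(accept_language):
        if q > 0 and lang in available:
            return lang
    return default


header = "ja-JP,ja;q=0.9,en-US;q=0.8,en;q=0.7"
print(parse_accept_language(header))
print(choose_language(header, ["en", "ja"], "en"))  # -> ja
```

Here ja-JP is the top preference but the server only has ja and en, so the next entry, ja, wins.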
Vary
Vary: negotiate
RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.vary
The Vary header controls whether browsers and intermediate servers (caches) may reuse a cached response.
It means that the cached response can be used as long as the request headers listed in it do not change.
In this case, the server is saying that the cached content can be reused as long as the content negotiation headers described earlier do not change.
Since this book does not deal with cache control, we will **not return this header**.
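To show what "as long as the listed headers do not change" means in practice, here is a sketch of how a cache might build its lookup key. It assumes a Vary value that lists ordinary request header names such as Accept-Language (the special value * and Apache's negotiate are not handled), and cache_key is a hypothetical name; real caches may also normalize header values.

```python
def cache_key(url, request_headers, vary_value):
    """Build a cache lookup key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so compare them in lower case.
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = sorted(name.strip().lower() for name in vary_value.split(",") if name.strip())
    # Requests that agree on every varied header get the same key and can share a cached response.
    return (url,) + tuple((name, headers.get(name, "")) for name in varied)


first = cache_key("/", {"Accept-Language": "ja"}, "Accept-Language")
second = cache_key("/", {"accept-language": "ja", "User-Agent": "curl"}, "Accept-Language")
third = cache_key("/", {"Accept-Language": "en"}, "Accept-Language")
print(first == second, first == third)  # -> True False
```

User-Agent differs between the first two requests, but since it is not listed in Vary it does not affect the key.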
TCN
TCN: choice
RFC: https://tools.ietf.org/html/rfc2295#section-8.5
This is a somewhat obscure header, defined in a separate RFC, RFC 2295, on Transparent Content Negotiation.
It tells the client how content negotiation was carried out, but the details are complicated, so I will skip the explanation here.
This book does not do content negotiation, so we will **not return this header**.
Last-Modified
Last-Modified: Thu, 29 Aug 2019 05:05:59 GMT
RFC: https://triple-underscore.github.io/RFC7232-ja.html#header.last-modified
The Last-Modified header returns the date and time at which the content was last modified.
The RFC says the server should send this header whenever it can determine a consistent last-modification date and time.
A consistent value cannot be given when, for example, a URL returns different content on every request: the content can differ even though Last-Modified is the same, or stay the same even though Last-Modified differs, so there is no meaningful Last-Modified value.
In this book, whether the last-modification time is meaningful will vary from URL to URL, so for simplicity we will **not return this header**.
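If you later serve static files, a file's modification time is a natural source of a consistent value. A minimal sketch, assuming the body comes straight from a file on disk; last_modified_header is a hypothetical name, while os.path.getmtime and email.utils.formatdate are standard-library functions.

```python
import os
import tempfile
from email.utils import formatdate


def last_modified_header(path):
    """Return a Last-Modified header line based on the file's modification time."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    # usegmt=True gives the HTTP date format, e.g. "Thu, 29 Aug 2019 05:05:59 GMT".
    return "Last-Modified: " + formatdate(mtime, usegmt=True)


# Demo: a temporary file whose modification time is pinned to a known value.
with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as sample:
    sample.write(b"<html><body><h1>It works!</h1></body></html>\n")
os.utime(sample.name, (1567055159, 1567055159))
demo_header = last_modified_header(sample.name)
os.remove(sample.name)
print(demo_header)  # -> Last-Modified: Thu, 29 Aug 2019 05:05:59 GMT
```

This only works because a static file's mtime changes exactly when its content does; dynamically generated pages have no such value, which is the case the RFC warns about.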
ETag
ETag: "2d-5913a76187bc0"
RFC: https://triple-underscore.github.io/RFC7232-ja.html#header.etag
The ETag header is an identifier for a particular version of the resource from which the response is generated.
That is, if the resource is updated in any way, the ETag is expected to change.
Hash values of the file or content are often used.
This header is also used for cache control by browsers and intermediate servers, but since this book does not deal with cache control, we will **not return this header**.
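A sketch of the hash-based approach mentioned above. Note that Apache's example value appears to be built from file metadata instead (its leading 2d is 45, the body size, in hex); this sketch deliberately uses a SHA-256 digest of the body, and make_etag plus the 16-character truncation are assumptions for illustration.

```python
import hashlib


def make_etag(body):
    """Return a quoted ETag value derived from the response body bytes."""
    # Any change to the body changes the digest, so the tag identifies this exact version.
    digest = hashlib.sha256(body).hexdigest()
    # ETag values are quoted strings; 16 hex characters keep the header short.
    return '"' + digest[:16] + '"'


old = make_etag(b"<h1>It works!</h1>")
new = make_etag(b"<h1>It still works!</h1>")
print(old, new, old != new)
```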
Accept-Ranges
Accept-Ranges: bytes
RFC: https://triple-underscore.github.io/RFC7233-ja.html#header.accept-ranges
The Accept-Ranges header indicates that the server supports "partial requests for a resource", known as Range Requests.
Range Requests make it possible to download a large file in pieces.
This book does not support Range Requests, so we will **not return this header**.
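For reference, a server that did support Range Requests would read a request header such as Range: bytes=2-4 and answer 206 Partial Content with a Content-Range header. A simplified sketch that handles a single byte range only (multiple ranges return None); apply_byte_range is a hypothetical name.

```python
def apply_byte_range(body, range_header):
    """Return (partial_body, content_range) for a single byte range, or None if unsupported."""
    unit, _, spec = range_header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # only a single range in bytes is handled here
    first, _, last = spec.strip().partition("-")
    size = len(body)
    try:
        if first == "":
            # Suffix form "bytes=-N": the last N bytes of the body.
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            # "bytes=N-" means from N to the end; an end past the body is clipped.
            end = min(int(last), size - 1) if last else size - 1
    except ValueError:
        return None
    if start >= size or start > end:
        return None  # unsatisfiable range
    return body[start:end + 1], f"bytes {start}-{end}/{size}"


print(apply_byte_range(b"0123456789", "bytes=2-4"))  # -> (b'234', 'bytes 2-4/10')
```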
Content-Length
Content-Length: 45
RFC: https://triple-underscore.github.io/RFC7230-ja.html#header.content-length
The Content-Length header returns the number of bytes in the response body, as a decimal number.
With a few exceptions, the server should send it whenever it knows the body size in advance.
Getting the number of bytes is easy in Python, so we will make sure to **return the number of bytes in the body**.
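One pitfall worth a sketch: Content-Length counts bytes, not characters, so a str body must be encoded before measuring it. This assumes UTF-8 for text bodies; content_length_header is a hypothetical name.

```python
def content_length_header(body):
    """Return a Content-Length header line; the value is a count of bytes, not characters."""
    if isinstance(body, str):
        # Encode text first: non-ASCII characters take more than one byte in UTF-8.
        body = body.encode("utf-8")
    return "Content-Length: " + str(len(body))


print(content_length_header("<h1>It works!</h1>"))  # -> Content-Length: 18
print(content_length_header("こんにちは"))  # -> Content-Length: 15
```

The Japanese greeting is 5 characters but 15 bytes, so using len() on the str would make the client stop reading too early.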
Keep-Alive
Keep-Alive: timeout=5, max=100
RFC: https://tools.ietf.org/id/draft-thomson-hybi-http-timeout-01.html#keep-alive
The Keep-Alive header gives hints about how long a connection may be kept open for reuse (connection reuse is described below).
This header was proposed in an Internet-Draft (an experimental specification) and is supported by almost all modern browsers, but it has not been adopted as a formal RFC standard.
Since this book does not implement connection reuse, we will **not return this header**.
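To read Apache's value concretely: timeout is the number of seconds an idle connection may stay open, and max is the maximum number of requests allowed on the connection. A small parsing sketch; parse_keep_alive is a hypothetical name and it ignores any non-numeric parameters.

```python
def parse_keep_alive(value):
    """Parse a Keep-Alive header value such as "timeout=5, max=100" into a dict of ints."""
    params = {}
    for item in value.split(","):
        name, sep, raw = item.strip().partition("=")
        raw = raw.strip()
        if sep and raw.isdigit():
            params[name.strip().lower()] = int(raw)
    return params


print(parse_keep_alive("timeout=5, max=100"))  # -> {'timeout': 5, 'max': 100}
```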
Connection
Connection: Keep-Alive
The Connection header indicates whether the TCP connection, once established, may be reused for the next request.
Establishing a TCP connection takes time, and reusing connections is a well-known way to speed up page display.
Connection reuse is outside the scope of this book and will not be implemented.
However, in HTTP/1.1 connections are reused by default, and **a server that does not support connection reuse must return Connection: close, so this book will return it.**
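Sending Connection: close is a promise that the server will close the TCP connection after this response. A minimal sketch of keeping that promise, using a connected socket.socketpair() in place of a real client connection; send_and_close is a hypothetical name.

```python
import socket


def send_and_close(conn, response):
    """Send the complete response, then close the socket as Connection: close promises."""
    try:
        conn.sendall(response)
    finally:
        # Closing our end means the client reads end-of-stream and will not reuse this connection.
        conn.close()


# Demo with a connected socket pair standing in for a real client connection.
server_side, client_side = socket.socketpair()
send_and_close(server_side, b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
received = b""
while chunk := client_side.recv(1024):
    received += chunk
client_side.close()
print(received.split(b"\r\n")[0])  # -> b'HTTP/1.1 200 OK'
```

The client's read loop ends because recv returns empty bytes once the server side is closed.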
Content-Type
Content-Type: text/html
RFC: https://triple-underscore.github.io/RFC7231-ja.html#header.content-type
The Content-Type header returns the format of the response body.
The values used here are called MIME types, such as the text/html seen above.
If you want to check the list, this site will be helpful.
If you omit this header, the browser may treat the body as an "unidentified file" and not display it on screen, so always return a value that matches the content.
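One common way to return "a value that matches the content" is to map the file extension to a MIME type. A sketch with a small hypothetical table and application/octet-stream as the fallback; the standard-library mimetypes.guess_type does a similar lookup, but its results can depend on the system's configuration, so an explicit table keeps behavior predictable.

```python
import os

# Hypothetical mapping table; extend it as the server starts serving more file types.
MIME_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".png": "image/png",
    ".jpg": "image/jpeg",
}


def content_type_for(path):
    """Guess a Content-Type value from the file extension of the requested path."""
    _, ext = os.path.splitext(path)
    # Unknown extensions fall back to a generic binary type.
    return MIME_TYPES.get(ext.lower(), "application/octet-stream")


print(content_type_for("/index.html"))  # -> text/html
```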
In the next step this book will return only HTML as the body, so **to start with, we will always return text/html.**
Later, when we decide to return bodies other than HTML, let's improve the server so that it can return other values as well.
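Putting the decisions in this section together, a response would carry only Date, Server, Content-Length, Connection and Content-Type. A sketch under stated assumptions: every response is 200 OK, the body is already bytes, and lines end in CRLF as HTTP/1.1 specifies. build_response and SERVER_NAME are hypothetical names, not the book's actual code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

SERVER_NAME = "HenaServer/0.1"  # replace with your own server name


def build_response(body, content_type="text/html", now=None):
    """Build a 200 OK response carrying only the headers chosen in this chapter."""
    if now is None:
        now = datetime.now(timezone.utc)
    headers = [
        "Date: " + format_datetime(now, usegmt=True),
        "Server: " + SERVER_NAME,
        "Content-Length: " + str(len(body)),  # body is bytes, so len() counts bytes
        "Connection: close",  # we do not support connection reuse
        "Content-Type: " + content_type,
    ]
    # Status line, headers, then an empty line, each terminated by CRLF.
    head = "HTTP/1.1 200 OK\r\n" + "".join(line + "\r\n" for line in headers) + "\r\n"
    return head.encode("ascii") + body


response = build_response(b"<html><body><h1>It works!</h1></body></html>\n")
print(response.decode("ascii"))
```

The content_type parameter defaults to text/html for now but leaves room for the other values mentioned above.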