[LINUX] An introduction to the modern socket API to learn in C

When I started the series with "Introduction to Socket API Learned in C Language, Part 1 Server Edition", I began with the legacy IPv4-only API and planned to build up from there gradually. From here on, however, I would like to cover socket programming using the modern API.

This time, we will go into the details by rewriting the echo server (which sends back the string it receives from the client) with the modern API. The advantage of the modern API over the legacy examples is not processing speed but the flexibility to handle a variety of cases, both now and in the future.

In the legacy example we could only create a socket that supports IPv4. With the modern API it is easy to prepare both IPv4 and IPv6 sockets and branch the processing according to the received packets, so you can implement more flexible features.

Furthermore, you can write programs that **do not depend on a specific address family** at all, rather than being tied to IPv4 or IPv6. The modern API was designed with this in mind.

This matters especially for anyone who wants to become an iOS engineer: Apple already **no longer accepts code that depends on IPv4**. Even though IPv4 is still widespread around the world, this is a way of thinking and a skill worth mastering.

That was a big claim, but the explanation would get complicated, so this time I will limit the example to IPv4. The program can be made to handle IPv6 packets with only small changes, and I will write a separate article focusing on that later.

```c
#include <sys/types.h>  // ssize_t
#include <sys/socket.h> // socket(), bind(), accept(), listen(), recv(), send()
#include <stdlib.h>     // exit(), EXIT_FAILURE, EXIT_SUCCESS
#include <netdb.h>      // getaddrinfo, getnameinfo, gai_strerror, NI_MAXHOST, NI_MAXSERV
#include <string.h>     // memset()
#include <unistd.h>     // close()
#include <stdio.h>      // I/O library

#define MAX_BUF_SIZE 1024

int get_socket(const char*);
void do_service(int);
static inline void do_concrete_service(int);
void echo_back(int);
static inline void usage(int);

int main (int argc, char *argv[]) {

    int sock;

    /* Exactly one argument: a port number or service name. */
    if (argc != 2) {
        usage(EXIT_FAILURE);
    }

    if ((sock = get_socket(argv[1])) == -1) {
        fprintf(stderr, "get_socket() failure.\n");
        exit(EXIT_FAILURE);
    }

    do_service(sock);  /* runs forever in this version */

    close(sock);

    return EXIT_SUCCESS;
}
```

The main function is shown above. The program takes a port number or service name as its only argument; if the argument is missing or extra arguments are given, it calls the following usage function and terminates.

```c
static inline void usage (int status){
    fputs ("\
argument count mismatch error.\n\
please input a service name or port number.\n\
", stderr);
    exit (status);
}
```

If the argument is valid, the program creates a socket with get_socket, a function defined for this article, and obtains a socket descriptor that refers to the socket's control information.

A socket descriptor is exactly the same kind of thing as a file descriptor. I covered this in detail in "Hacking Linux file descriptors", so please refer to it if you are interested.
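To make that concrete, here is a minimal sketch (not from the original article) showing that generic file I/O calls such as write(), read() and close() work on socket descriptors. It assumes a connected AF_UNIX pair created with socketpair() as a stand-in for a TCP connection; the helper name fd_roundtrip is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>  /* socketpair() */
#include <unistd.h>      /* read(), write(), close() */
#include <string.h>      /* strlen() */

/* Hypothetical helper: write msg into one end of a socket pair with write()
 * and read it back from the other end with read(), just as with a pipe or file.
 * Returns the number of bytes read, or -1 on error. */
ssize_t fd_roundtrip(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* sv[0] and sv[1] are plain integers indexing the process's fd table. */
    if (write(sv[0], msg, strlen(msg)) == (ssize_t) strlen(msg))
        n = read(sv[1], out, outlen - 1);

    if (n >= 0)
        out[n] = '\0';

    close(sv[0]);   /* the same close() used for regular files */
    close(sv[1]);
    return n;
}
```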

Once we have the socket descriptor, the service the server actually provides is implemented in the do_service function, also defined for this article.

Finally, main closes the socket, but since do_service runs an infinite loop, the program never actually reaches that point.

When turning such a program into a daemon, arrange for close to be called at an appropriate time, for example by defining what happens when a particular termination signal is received. I will cover how to implement a daemon process in a separate article.

Now that we know the flow, let's look at the processing of the get_socket function. This function is responsible for creating the socket and returning the socket descriptor.

```c
int get_socket (const char *port) {

    struct addrinfo hints, *res;
    int ecode, sock;
    char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];

    memset(&hints, 0, sizeof(hints));

    hints.ai_family   = AF_INET;      /* IPv4 only in this article */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    hints.ai_flags    = AI_PASSIVE;   /* address suitable for bind() */

    /* Assign first, then compare, so ecode holds the EAI_* error code. */
    if ((ecode = getaddrinfo(NULL, port, &hints, &res)) != 0) {
        fprintf(stderr, "failed getaddrinfo() %s\n", gai_strerror(ecode));
        goto failure_1;
    }

    if ((ecode = getnameinfo(res->ai_addr, res->ai_addrlen, hbuf, sizeof(hbuf), sbuf, sizeof(sbuf), NI_NUMERICHOST | NI_NUMERICSERV)) != 0) {
        fprintf(stderr, "failed getnameinfo() %s\n", gai_strerror(ecode));
        goto failure_2;
    }

    fprintf(stdout, "port is %s\n", sbuf);
    fprintf(stdout, "host is %s\n", hbuf);

    if ((sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol)) < 0) {
        perror("socket() failed.");
        goto failure_2;
    }

    if (bind(sock, res->ai_addr, res->ai_addrlen) < 0) {
        perror("bind() failed.");
        goto failure_3;
    }

    if (listen(sock, SOMAXCONN) < 0) {
        perror("listen() failed.");
        goto failure_3;
    }

    freeaddrinfo(res);  /* the address list is no longer needed */
    return sock;

failure_3:
    close(sock);
failure_2:
    freeaddrinfo(res);
failure_1:
    return -1;
}
```

This differs somewhat from the legacy IPv4-only example, where we had to define a sockaddr_in address structure and fill in the appropriate values ourselves.

With the modern approach, you pass getaddrinfo an addrinfo structure (the hints) containing only the key information, and it fills in all the missing information based on those hints.

You may already have sensed this from the legacy example: the various values passed to the socket API are interrelated, so once some of them are specified, the other required values are determined automatically. getaddrinfo takes advantage of this property.
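The following sketch shows which fields we supply and which getaddrinfo fills in. Assumptions: IPv4, and the port is given numerically, so AI_NUMERICSERV is added to skip the services lookup. The helper name describe_passive is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>   /* IPPROTO_TCP, struct sockaddr_in */
#include <string.h>

/* Hypothetical helper: pass only the "key" information (family, socket type,
 * passive flag) and let getaddrinfo() complete the first result.
 * Returns 0 on success, otherwise the EAI_* code from getaddrinfo(). */
int describe_passive(const char *port, int *family, int *socktype,
                     int *protocol, socklen_t *addrlen)
{
    struct addrinfo hints, *res;
    int ecode;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE | AI_NUMERICSERV;  /* numeric port only */

    if ((ecode = getaddrinfo(NULL, port, &hints, &res)) != 0)
        return ecode;

    *family   = res->ai_family;    /* AF_INET, as requested */
    *socktype = res->ai_socktype;  /* SOCK_STREAM, as requested */
    *protocol = res->ai_protocol;  /* filled in by getaddrinfo: IPPROTO_TCP */
    *addrlen  = res->ai_addrlen;   /* filled in: sizeof(struct sockaddr_in) */

    freeaddrinfo(res);
    return 0;
}
```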

The addrinfo structure holds the information the socket API needs to refer to an address, and it also contains the sockaddr structure that the legacy example used directly.

```c
/* Structure to contain information about address of a service provider.  */
struct addrinfo
{
  int ai_flags;             /* Input flags.  */
  int ai_family;            /* Protocol family for socket.  */
  int ai_socktype;          /* Socket type.  */
  int ai_protocol;          /* Protocol for socket.  */
  socklen_t ai_addrlen;     /* Length of socket address.  */
  struct sockaddr *ai_addr; /* Socket address for socket.  */
  char *ai_canonname;       /* Canonical name for service location.  */
  struct addrinfo *ai_next; /* Pointer to next in list.  */
};
```

The ai_next member, a pointer to another addrinfo, will play an important role later. When a structure contains a pointer to its own type, it is usually a linked list, and that is the case here. Savvy readers may already have guessed that this is related to address-family-independent programming.
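As a hedged illustration of where this leads (the article itself stays with IPv4): with ai_family set to AF_UNSPEC, getaddrinfo may return several candidates chained through ai_next, and a common pattern is to try them in order until one succeeds. listen_any is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: walk the ai_next list and keep the first candidate
 * that can be bound and put into the listening state.
 * Returns a listening descriptor or -1; *count receives the list length. */
int listen_any(const char *port, int *count)
{
    struct addrinfo hints, *res, *ai;
    int sock = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;      /* let the resolver choose the families */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE | AI_NUMERICSERV;

    if (getaddrinfo(NULL, port, &hints, &res) != 0)
        return -1;

    *count = 0;
    for (ai = res; ai != NULL; ai = ai->ai_next)
        (*count)++;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sock == -1)
            continue;                   /* e.g. IPv6 disabled: try the next entry */
        if (bind(sock, ai->ai_addr, ai->ai_addrlen) == 0 &&
            listen(sock, SOMAXCONN) == 0)
            break;                      /* success: keep this socket */
        close(sock);
        sock = -1;
    }

    freeaddrinfo(res);                  /* frees the whole list at once */
    return sock;
}
```

Passing "0" as the port asks the kernel for any free port, which is convenient for experiments.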

Because the members of this addrinfo structure can be passed directly as arguments to socket APIs such as socket and bind, a simple and flexible implementation becomes possible.

The first argument of getaddrinfo is a pointer to a host name or IP address string. Here we pass NULL, for a reason explained below. In a client program, you might instead pass a string such as the domain name "tajima-taso.jp".

The second argument is a pointer to a port number or service name string. On Linux, service names are resolved through an NIS server or the /etc/services file on the local host. In my environment port 8080 is open, so you can run ./a.out 8080; /etc/services maps 8080 to the service name webcache, so running ./a.out webcache gives the same result.
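A small sketch of how the second argument is resolved, using a hypothetical helper named service_to_port. Whether a name like webcache or http resolves depends on the contents of /etc/services on the machine (on some distributions 8080 is listed under a different name); numeric ports need no lookup.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>   /* struct sockaddr_in */
#include <arpa/inet.h>    /* ntohs() */
#include <string.h>

/* Hypothetical helper: resolve a port number or service name (anything
 * accepted as getaddrinfo's second argument) to a TCP port in host byte order.
 * Returns -1 if the service cannot be resolved. */
int service_to_port(const char *service)
{
    struct addrinfo hints, *res;
    int port;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE;   /* names are looked up in /etc/services (or NIS) */

    if (getaddrinfo(NULL, service, &hints, &res) != 0)
        return -1;

    /* The port is stored in network byte order inside sockaddr_in. */
    port = ntohs(((struct sockaddr_in *) res->ai_addr)->sin_port);
    freeaddrinfo(res);
    return port;
}
```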

If AI_PASSIVE is set in the hints (the third argument), a NULL first argument corresponds to INADDR_ANY (IN6ADDR_ANY_INIT for IPv6), which accepts connections on all NICs, as in the legacy example. AI_PASSIVE is described in more detail below.

Also, possibly only in glibc, the string "*" is treated the same as NULL, so passing "*" behaves identically. This behavior is implemented in the source code, but I could not find it documented, so it may not be portable.

AF_INET and SOCK_STREAM mean the same as in the legacy example: the IPv4 address family and a stream protocol (TCP), respectively.

AI_PASSIVE is often used together with a NULL first argument. What does it do? It is specified when the completed addrinfo structure will be used with bind.

bind is the system call that associates a socket with explicit address information, so AI_PASSIVE acts as a flag asking getaddrinfo to complete the addrinfo structure for use by a server. A socket that listens for connections like this is called a passive socket.
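To see the effect of AI_PASSIVE with a NULL host, the sketch below resolves the same request with and without the flag. Per the getaddrinfo documentation, omitting AI_PASSIVE yields the loopback address instead of the wildcard address. null_node_address is a hypothetical helper.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: resolve node == NULL, with or without AI_PASSIVE,
 * and write the resulting IPv4 address as numeric text into host. */
int null_node_address(int passive, char *host, size_t hostlen)
{
    struct addrinfo hints, *res;
    int ecode;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_NUMERICSERV | (passive ? AI_PASSIVE : 0);

    if ((ecode = getaddrinfo(NULL, "8080", &hints, &res)) != 0)
        return ecode;

    /* No service buffer needed here, so pass NULL and 0. */
    ecode = getnameinfo(res->ai_addr, res->ai_addrlen, host, hostlen,
                        NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return ecode;
}
```

With AI_PASSIVE the address is 0.0.0.0 (suitable for bind on a server); without it, 127.0.0.1 (suitable for connect to a local server).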

The fourth argument is a pointer to a pointer to addrinfo; dereferencing it gives the completed addrinfo structure. getaddrinfo allocates the structure's memory dynamically, so you must release it with freeaddrinfo once it is no longer needed, or the memory will leak.

On error, getaddrinfo returns a nonzero error code. Passing that value to gai_strerror returns a pointer to a string describing the error, which we print to standard error.
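A minimal sketch of that error path, forcing a failure by asking for a non-numeric service while setting AI_NUMERICSERV (demo_gai_error is a hypothetical name). Note where the parentheses go when capturing the return value.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: make getaddrinfo() fail on purpose and report it.
 * Returns the EAI_* code (nonzero), or 0 if resolution unexpectedly succeeds. */
int demo_gai_error(void)
{
    struct addrinfo hints, *res;
    int ecode;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE | AI_NUMERICSERV;  /* "http" is not numeric */

    /* Assign first, then compare. Writing
     *     ecode = getaddrinfo(...) != 0
     * would store the result of the comparison (1), not the error code. */
    if ((ecode = getaddrinfo(NULL, "http", &hints, &res)) != 0) {
        fprintf(stderr, "getaddrinfo() failed: %s\n", gai_strerror(ecode));
        return ecode;
    }

    freeaddrinfo(res);
    return 0;
}
```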

Next we call getnameinfo, which is not required for the socket communication itself. It is used to obtain the host name (IP address) and service name (port number) from an address structure.

Its seven arguments are: a pointer to the socket address structure, the size of that structure, a pointer to the buffer for the host name and that buffer's size, a pointer to the buffer for the service name and that buffer's size, and finally various flags.

hbuf is allocated with size NI_MAXHOST and sbuf with size NI_MAXSERV; both constants are defined in netdb.h.

In my environment, netdb.h defines them as follows:

```c
#  define NI_MAXHOST      1025
#  define NI_MAXSERV      32
```

The seventh argument sets NI_NUMERICHOST and NI_NUMERICSERV, which store the host name and the service name (port number), respectively, in numeric form in the buffers.

If you run it as ./a.out 8080, the following is displayed:

```text
port is 8080
host is 0.0.0.0
```

Then the socket system call creates the socket. In the legacy example, the three arguments were constant literals; in the modern example, the required values are stored in the members of the acquired addrinfo structure, so we use those. This is what lets the program be address-family independent.

As for what socket does internally: it creates a socket structure holding the protocol information, stores a pointer to the newly created file object in the member reserved for it, and finally installs the file object into the process's descriptor table with fd_install.

As explained in the legacy example, bind is the system call that associates a socket with address information. A socket cannot receive messages from remote hosts unless address information, in addition to the protocol, is bound to it, so this step is required. The IP address should cause no problems, but if the port number is already in use the socket cannot be bound, so be sure to specify a free port number or service name.
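One way to observe the "port already in use" failure, sketched with the legacy sockaddr_in for brevity and with hypothetical helpers bind_loopback and bound_port: binding to port 0 lets the kernel choose a free port, getsockname reports which one, and a second bind to that same port fails with EADDRINUSE.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1:port and
 * listening. Returns the descriptor, or -1 with errno from the failing call. */
int bind_loopback(unsigned short port)
{
    struct sockaddr_in sin;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock == -1)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family      = AF_INET;
    sin.sin_port        = htons(port);             /* 0 = let the kernel pick */
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(sock, (struct sockaddr *) &sin, sizeof(sin)) == -1 ||
        listen(sock, SOMAXCONN) == -1) {
        int saved = errno;                         /* close() may overwrite errno */
        close(sock);
        errno = saved;
        return -1;
    }
    return sock;
}

/* Hypothetical helper: the port the kernel actually assigned to a bound socket. */
int bound_port(int sock)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(sock, (struct sockaddr *) &sin, &len) == -1)
        return -1;
    return ntohs(sin.sin_port);
}
```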

The call to listen is the same as in the legacy example. As a review: a TCP server creates a new socket for each client connection to carry out connection-oriented communication, and listen's second argument sets the maximum length of the queue of pending connections waiting to be accepted. This time it is specified with the macro constant SOMAXCONN, defined in bits/socket.h.

SOMAXCONN is the kernel's default upper limit on the listen queue (backlog) length. It is 128 in my environment.

Some infrastructure engineers may recognize this value.

On servers that receive many connection requests, such as database servers, a queue of this size can overflow and connections get dropped, so the limit is often raised (on Linux, through the net.core.somaxconn kernel parameter).
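On Linux the compile-time SOMAXCONN and the runtime limit can differ: listen silently caps the backlog at the value of net.core.somaxconn. A small sketch that reads the runtime value from procfs (a Linux-specific path; read_somaxconn is a hypothetical name):

```c
#include <stdio.h>
#include <sys/socket.h>   /* SOMAXCONN */

/* Hypothetical helper: read the runtime backlog limit from procfs.
 * Returns -1 if the file is missing or unreadable. */
int read_somaxconn(void)
{
    FILE *fp = fopen("/proc/sys/net/core/somaxconn", "r");
    int value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

For example, `printf("SOMAXCONN = %d, net.core.somaxconn = %d\n", SOMAXCONN, read_somaxconn());` prints both values side by side.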

After the above processing, the listening socket descriptor is returned to main, which passes it to the do_service function to run the actual service.

```c
void do_service (int sock) {

    char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
    struct sockaddr_storage from_sock_addr;  /* large enough for any address family */
    int acc_sock;
    socklen_t addr_len;

    for (;;) {
        addr_len = sizeof(from_sock_addr);   /* reset before every accept() */
        if ((acc_sock = accept(sock, (struct sockaddr *) &from_sock_addr, &addr_len)) == -1) {
            perror("accept() failed.");
            continue;
        }

        /* Display the client's numeric address and port. */
        if (getnameinfo((struct sockaddr *) &from_sock_addr, addr_len, hbuf, sizeof(hbuf), sbuf, sizeof(sbuf), NI_NUMERICHOST | NI_NUMERICSERV) == 0) {
            fprintf(stderr, "port is %s\n", sbuf);
            fprintf(stderr, "host is %s\n", hbuf);
        }

        do_concrete_service(acc_sock);

        close(acc_sock);
    }
}
```

do_service calls the accept system call to create a new socket for the actual communication with the client, and passes its descriptor to do_concrete_service, which is responsible for the specific service.

accept also works as explained in the legacy example: it installs the file object of a socket whose 3-way handshake has completed into a new descriptor (via the fd_install function) and returns that socket descriptor.

accept stores the client's address information in from_sock_addr, so we pass it to getnameinfo to display the client's address.

In the legacy example, the sockaddr_in structure was used to store the client's information, but it can only hold IPv4 addresses. So we switch to declaring a sockaddr_storage, which is large enough to hold the address information of any address family, and casting it as needed.
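A sketch of that pattern: the static assertions document the size guarantee, and port_of (a hypothetical helper) checks ss_family before casting to the concrete type.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* sockaddr_storage is large enough (and suitably aligned) for any
 * supported address family, so it can hold either address kind. */
_Static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in),
               "storage too small for IPv4");
_Static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in6),
               "storage too small for IPv6");

/* Hypothetical helper: extract the port from whatever accept() stored,
 * casting to the concrete type only after checking ss_family. */
int port_of(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) ss)->sin6_port);
    default:
        return -1;                 /* e.g. AF_UNIX has no port */
    }
}
```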

```c
static inline void do_concrete_service (int sock) {
    echo_back(sock);
}
```

do_concrete_service runs the function that implements the service itself; here that is echo_back, which sends the client's message back unchanged. Keeping this in a separate function makes it easy to swap in a different service.

Incidentally, with gcc in my environment, the function was inlined when optimizing at -O2.

```c
void echo_back (int sock) {

    char buf[MAX_BUF_SIZE];
    ssize_t len;

    for (;;) {
        if ((len = recv(sock, buf, sizeof(buf), 0)) == -1) {
            perror("recv() failed.");
            break;
        } else if (len == 0) {
            /* recv() returns 0 when the peer has closed the connection. */
            fprintf(stderr, "connection closed by remote host.\n");
            break;
        }

        if (send(sock, buf, (size_t) len, 0) != len) {
            perror("send() failed.");
            break;
        }
    }
}
```

The send/receive processing is already highly abstract, so at this stage it is almost identical to the legacy example.
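One detail worth noting: on a stream socket, send may transfer fewer bytes than requested, and echo_back treats such a short write as an error. A common alternative is to loop until everything is sent; send_all below is a hypothetical helper sketching that.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: keep calling send() until all len bytes are written.
 * Returns 0 on success, -1 on error (errno set by send()). */
int send_all(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(sock, buf, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted before sending anything: retry */
            return -1;
        }
        buf += n;                  /* skip past what the kernel accepted */
        len -= (size_t) n;
    }
    return 0;
}
```

In echo_back, the check `send(sock, buf, (size_t) len, 0) != len` could then become `send_all(sock, buf, (size_t) len) == -1`.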

This time the processing itself is almost the same as in the legacy example, but we now have a basic template for programs that do not depend on the protocol or address family.

From now on, when I present socket API examples, I will build them with this API.

**"I understand how to use getaddrinfo! I just want to know how to choose between IPv4 and IPv6!"**

For that, RFC 6555 describes the approach adopted by Google Chrome and Mozilla Firefox; please refer to it if you are interested.

Happy Eyeballs: Success with Dual-Stack Hosts

A while ago, a stack-based buffer overflow vulnerability was discovered in getaddrinfo (it has since been fixed). This is a reminder to keep libraries such as glibc, and the software you use, up to date, and not only with regard to getaddrinfo.

Referenced source code: Linux kernel 2.6.11, glibc 2.12.1 (CPU: x86_64)

OS: CentOS release 6.8

Compiler: gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
