[LINUX] Introduction to Socket API Learned in C Language Part 1 Server Edition

Web engineers use TCP/IP whether or not they are aware of it. It is one of the communication protocols the Internet cannot do without.

In recent years, with the spread of IoT and similar technologies, it has also become essential knowledge in fields beyond conventional Web technology.

The BSD socket interface has become the de facto standard network API, so I would like to study networking again based on it.

There are many books and documents about how networks work, TCP/IP in particular, but until I read C source code that uses the socket API, I could not picture what the explanations meant and never really understood them. So in this series I will basically follow the flow of C source code and learn how each part works from the data and processing it uses.

This is surely not perfect, so I would appreciate it if you could point out any mistakes.

First, let's look at how two different hosts communicate, using a connection-oriented TCP program written with the familiar socket API.

The first program is a TCP server.

It is a legacy, IPv4-only example (the IPv4 address space is already exhausted), but please bear with it, since a simple example makes the mechanism easier to understand. Later we will look at examples that support IPv6 with the sockaddr_in6 structure, and examples that use address-family-independent APIs such as getaddrinfo.

Postscript

An example using getaddrinfo is covered in Introduction to Modern Socket API Learned in C.

Execution environment

Roughly speaking, the environment is CentOS 6.8 on x86_64, compiled with gcc and no options. There may be differences depending on the CPU, OS, and compiler, but the basics should work the same way.

Source code

tcpd.c


```c
#include <stdio.h> //printf(), fprintf(), perror()
#include <sys/socket.h> //socket(), bind(), accept(), listen(), socklen_t
#include <arpa/inet.h> // struct sockaddr_in, struct sockaddr, inet_ntoa()
#include <stdlib.h> //atoi(), exit(), EXIT_FAILURE, EXIT_SUCCESS
#include <string.h> //memset()
#include <unistd.h> //close()

#define QUEUELIMIT 5

int main(int argc, char* argv[]) {

	int servSock; //server socket descriptor
	int clitSock; //client socket descriptor
	struct sockaddr_in servSockAddr; //server internet socket address
	struct sockaddr_in clitSockAddr; //client internet socket address
	unsigned short servPort; //server port number
	socklen_t clitLen; // client internet socket address length (accept() expects socklen_t *)

	if ( argc != 2) {
		fprintf(stderr, "argument count mismatch error.\n");
		exit(EXIT_FAILURE);
	}

	if ((servPort = (unsigned short) atoi(argv[1])) == 0) {
		fprintf(stderr, "invalid port number.\n");
		exit(EXIT_FAILURE);
	}

	if ((servSock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0 ){
		perror("socket() failed.");
		exit(EXIT_FAILURE);
	}

	memset(&servSockAddr, 0, sizeof(servSockAddr));
	servSockAddr.sin_family      = AF_INET;
	servSockAddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servSockAddr.sin_port        = htons(servPort);

	if (bind(servSock, (struct sockaddr *) &servSockAddr, sizeof(servSockAddr) ) < 0 ) {
		perror("bind() failed.");
		exit(EXIT_FAILURE);
	}

	if (listen(servSock, QUEUELIMIT) < 0) {
		perror("listen() failed.");
		exit(EXIT_FAILURE);
	}

	while(1) {
		clitLen = sizeof(clitSockAddr);
		if ((clitSock = accept(servSock, (struct sockaddr *) &clitSockAddr, &clitLen)) < 0) {
			perror("accept() failed.");
			exit(EXIT_FAILURE);
		}

		printf("connected from %s.\n", inet_ntoa(clitSockAddr.sin_addr));
		close(clitSock);
	}

	return EXIT_SUCCESS;
}
```

Lines 1-6

Include the required headers. The comment on each line lists what the program uses from that header. Some of these headers pull in further headers, but I will omit those details.

Lines 12-17

Declare the variables the program needs. In C, the standard fixes only the size of char (1 byte); the sizes of the other types are implementation-defined and may differ between environments. In this environment int is 4 bytes and short is 2 bytes. A byte is generally 8 bits, but since that is not strictly guaranteed, I will use the term octet, the unit used in communications, where precision matters.
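The sizes mentioned above can be checked on your own machine with a small sketch like the one below. The function name print_type_sizes is hypothetical; in_port_t and in_addr_t are the fixed-width types that POSIX defines for port numbers and IPv4 addresses.

```c
#include <limits.h>      /* CHAR_BIT */
#include <netinet/in.h>  /* in_port_t, in_addr_t */
#include <stdio.h>       /* printf() */

/* Print the sizes (in bytes) of some types used by tcpd.c.
 * Only sizeof(char) == 1 is fixed by the C standard; the rest are
 * implementation-defined, except the POSIX network types. */
void print_type_sizes(void)
{
    printf("CHAR_BIT       : %d\n", CHAR_BIT);
    printf("char           : %zu\n", sizeof(char));
    printf("unsigned short : %zu\n", sizeof(unsigned short));
    printf("int            : %zu\n", sizeof(int));
    printf("in_port_t      : %zu\n", sizeof(in_port_t));  /* POSIX: equivalent to uint16_t */
    printf("in_addr_t      : %zu\n", sizeof(in_addr_t));  /* POSIX: equivalent to uint32_t */
}
```

On x86_64 Linux the unsigned short and int lines should show 2 and 4.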

Lines 19-22

Check the arguments. The port number to associate with the socket is given as a command-line argument at run time.

Lines 24-27

The port number, given as a string, is converted to an integer with atoi. Since atoi returns 0 when the conversion fails, it might be more convenient to treat such an argument as a service name and branch accordingly, but this time only numbers are accepted.
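The atoi check has two weak spots: it cannot tell the string "0" from a failed conversion, and it does not detect out-of-range values, so an argument such as 70000 passes the check and silently wraps to 4464 when cast to unsigned short. A stricter check could look like the sketch below, which uses strtol from the C standard library; parse_port is a hypothetical name, not part of tcpd.c.

```c
#include <errno.h>   /* errno, ERANGE */
#include <stdlib.h>  /* strtol() */

/* Hypothetical stricter replacement for the atoi() check in tcpd.c.
 * Returns 0 and stores the port in *out on success; returns -1 if the
 * string has no digits, has trailing characters, or is outside 1..65535. */
int parse_port(const char *s, unsigned short *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s || *end != '\0')      /* no digits, or junk such as "80x" */
        return -1;
    if (errno == ERANGE || value < 1 || value > 65535)
        return -1;
    *out = (unsigned short) value;
    return 0;
}
```

In tcpd.c, the atoi line could then become `if (parse_port(argv[1], &servPort) < 0)`.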

Lines 29-31

The socket API is finally here.

socket() is a system call that asks the OS to create a socket. The first argument specifies the protocol family; here we pass PF_INET, which means the TCP/IP protocol family.

At present AF_INET, described later, has the same value, but PF_INET is used here out of respect for the design concept of the socket API. (Honestly, I cannot picture a world in which a separate PF_INET was actually needed, so I would be grateful if anyone who knows what kind of protocol implementation was envisioned could tell me.)

The second argument specifies the socket type. This time we use TCP, a stream-type protocol that provides reliable communication, so we specify SOCK_STREAM.

The third argument specifies the protocol to use; IPPROTO_TCP means TCP. If 0 is specified, the protocol is chosen automatically from the protocol family and socket type, but I specify it explicitly in case more protocols become available in the future.

In my environment, the values that can be passed for each argument are defined in /usr/include/bits/socket.h and /usr/include/netinet/in.h.

Lines 34-37

Initializing the sockaddr_in structure on the server side.

The sockaddr_in structure is the data type used to associate an address, that is, an IP address and a port number, with a TCP/IP socket.

The sockaddr_in structure in my environment looks like this:

```c
struct sockaddr_in
  {
    __SOCKADDR_COMMON (sin_);
    in_port_t sin_port;         /* Port number.  */
    struct in_addr sin_addr;        /* Internet address.  */

    /* Pad to size of `struct sockaddr'.  */
    unsigned char sin_zero[sizeof (struct sockaddr) -
               __SOCKADDR_COMMON_SIZE -
               sizeof (in_port_t) -
               sizeof (struct in_addr)];
  };
```

The line

```c
__SOCKADDR_COMMON (sin_);
```

is expanded by the preprocessor in my environment using this macro:

```c
#define __SOCKADDR_COMMON(sa_prefix) \
  sa_family_t sa_prefix##family
```

so it becomes sa_family_t sin_family.
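The ## operator used by __SOCKADDR_COMMON is ordinary token pasting, and the trick can be reproduced in a few lines. DEMO_COMMON, struct demo_in, and demo_family below are hypothetical names chosen for illustration; they only mimic the pattern of the glibc header.

```c
/* Hypothetical macro modeled on glibc's __SOCKADDR_COMMON: ## pastes the
 * prefix argument and "family" into a single identifier. */
#define DEMO_COMMON(prefix) unsigned short prefix##family

/* DEMO_COMMON(din_) expands to "unsigned short din_family". */
struct demo_in {
    DEMO_COMMON(din_);
    unsigned short din_port;
};

/* Access the pasted field by its generated name. */
unsigned short demo_family(const struct demo_in *d)
{
    return d->din_family;
}
```

Running gcc -E on a file containing this code prints the preprocessed source, where the field appears as unsigned short din_family.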

Since we do not know what values the allocated memory contains, we clear it to zero with memset and then store the required value in each field. This also zero-fills the sin_zero padding.

For sin_family, specify AF_INET, which indicates the Internet (IPv4) address family.

sin_addr holds an in_addr structure.

The in_addr structure looks like this in my environment:

```c
typedef uint32_t in_addr_t;
struct in_addr
{
    in_addr_t s_addr;
};
```

For the s_addr field we use INADDR_ANY. This lets the socket accept connections on the given port to any of the server's IP addresses, even when the server's NICs have several IP addresses assigned.
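As a comparison with INADDR_ANY, here is a sketch of a helper that fills in a sockaddr_in either with the wildcard address or with one specific IPv4 address, in which case a socket bound to it would only accept connections arriving at that address. init_ipv4_addr is a hypothetical name; inet_pton is the standard POSIX conversion function for dotted-decimal strings, and it already produces network byte order.

```c
#include <arpa/inet.h>   /* inet_pton(), htonl(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <string.h>      /* memset() */

/* Hypothetical helper: fill *addr for the given port. If ip is NULL the
 * wildcard INADDR_ANY is used (as in tcpd.c); otherwise only the given
 * dotted-decimal address is used. Returns 0 on success, -1 if ip is not
 * a valid IPv4 address. */
int init_ipv4_addr(struct sockaddr_in *addr, const char *ip, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));     /* also clears sin_zero */
    addr->sin_family = AF_INET;
    addr->sin_port   = htons(port);
    if (ip == NULL) {
        addr->sin_addr.s_addr = htonl(INADDR_ANY);
        return 0;
    }
    /* inet_pton() writes the address in network byte order */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}
```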

Since x86 is a little-endian architecture, multibyte data sent to another host must be converted to network byte order, which is big-endian. Functions such as htonl and htons perform this conversion.

Endianness is especially important for network programming, but for now I will leave it to [Wikipedia](https://ja.wikipedia.org/wiki/%E3%82%A8%E3%83%B3%E3%83%87%E3%82%A3%E3%82%A2%E3%83%B3). I will write more about it when I start using struct bit fields.

By the way, INADDR_ANY is defined as 0x00000000 in my environment, so the conversion has no effect on it, but I still pass it through htonl to keep the code consistent. Converting it anyway seems the safer choice.

htonl stands for "host to network long" and converts a 4-byte integer from the host's byte order to network byte order. Similarly, the port number is converted with htons, short for "host to network short", which converts a 2-byte integer to network byte order.
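The effect of htonl can be made visible by printing the bytes of a value in memory order before and after the conversion. The helper names below are hypothetical, and the sketch assumes only the standard htons/htonl functions from <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* htons(), htonl() */
#include <stdint.h>     /* uint16_t, uint32_t */
#include <stdio.h>      /* printf() */
#include <string.h>     /* memcpy() */

/* Return 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x02;
}

/* Print the bytes of a 32-bit value in memory order, before and after
 * htonl(). On a big-endian host both lines are identical. */
void show_byte_order(uint32_t value)
{
    uint32_t net = htonl(value);
    unsigned char h[4], n[4];

    memcpy(h, &value, 4);
    memcpy(n, &net, 4);
    printf("host:    %02x %02x %02x %02x\n", h[0], h[1], h[2], h[3]);
    printf("network: %02x %02x %02x %02x\n", n[0], n[1], n[2], n[3]);
}
```

On x86_64, show_byte_order(0x0A000001), which is 10.0.0.1, prints 01 00 00 0a for the host order and 0a 00 00 01 for the network order.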

Lines 39-42

The system call bind() associates the IP address and port number with the created socket. A socket cannot receive messages from remote hosts until the protocol, IP address, and port number are all associated with it.

Since the protocol was already associated when the socket was created, we pass the sockaddr_in structure created earlier, together with its length, to bind to associate the IP address and port number with the socket.

Note that the sockaddr_in structure is passed cast to a pointer to a different structure, sockaddr.

The sockaddr structure looks like this in my environment:

```c
struct sockaddr
  {
    __SOCKADDR_COMMON (sa_);    /* Common data: address family and length.  */
    char sa_data[14];       /* Address data.  */
  };
```

The __SOCKADDR_COMMON part is expanded by the preprocessor, so it becomes sa_family_t sa_family.

The remaining bytes are declared as a char array, which may look puzzling. In short, the sockaddr structure consists of an address family field followed by a 14-byte storage area that can hold any address data.
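The cast from sockaddr_in to sockaddr works because the two structures agree where it matters: the family field is at the same offset, and sin_zero pads sockaddr_in to the size of sockaddr. The following sketch prints those facts with offsetof; show_sockaddr_layout is a hypothetical name.

```c
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stddef.h>      /* offsetof() */
#include <stdio.h>       /* printf() */
#include <sys/socket.h>  /* struct sockaddr */

/* Print the layout facts that make the (struct sockaddr *) cast in
 * tcpd.c work. */
void show_sockaddr_layout(void)
{
    printf("sizeof(struct sockaddr)    = %zu\n", sizeof(struct sockaddr));
    printf("sizeof(struct sockaddr_in) = %zu\n", sizeof(struct sockaddr_in));
    printf("sa_data bytes              = %zu\n",
           sizeof(((struct sockaddr *) 0)->sa_data));   /* not evaluated */
    printf("offset of sa_family        = %zu\n", offsetof(struct sockaddr, sa_family));
    printf("offset of sin_family       = %zu\n", offsetof(struct sockaddr_in, sin_family));
    printf("offset of sin_port         = %zu\n", offsetof(struct sockaddr_in, sin_port));
    printf("offset of sin_addr         = %zu\n", offsetof(struct sockaddr_in, sin_addr));
}
```

On x86_64 Linux both sizes should be 16 bytes, sa_data is 14 bytes, and sin_family, sin_port, and sin_addr sit at offsets 0, 2, and 4.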

Whereas sockaddr is the generic data type of the socket API, sockaddr_in is the data type specialized for TCP/IP (IPv4).

As a result, each socket API function only needs to accept a pointer to a sockaddr structure; by examining sa_family it can tell how the rest of the structure is laid out and branch accordingly. You could say the implementation is designed with versatility in mind.
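The same dispatch-on-sa_family idea can be sketched in user code. The hypothetical function sockaddr_port below receives only a struct sockaddr pointer, checks sa_family, and only then casts to the concrete structure; struct sockaddr_in6 is the IPv6 counterpart mentioned at the beginning of this article.

```c
#include <arpa/inet.h>   /* ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <sys/socket.h>  /* struct sockaddr, AF_* constants */

/* Hypothetical helper: look at sa_family first, then interpret the rest
 * of the structure accordingly. Returns the port in host byte order, or
 * -1 for address families that have no port. */
int sockaddr_port(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) sa)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) sa)->sin6_port);
    default:
        return -1;
    }
}
```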

Personally, I think this is a concrete example of what a protocol is: a procedure for exchanging data together with an agreement on how to interpret its contents.

If bind fails, perror prints a message saying so; in that case, it is a good idea to check whether the port is already bound to another socket.

If you start and stop the program repeatedly, you may need to wait a while, because the socket stays in the TIME_WAIT state for some time after the TCP connection is closed. netstat is useful for checking this.
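When bind fails, errno tells you why, and EADDRINUSE is the value for "address already in use". The sketch below wraps bind and prints a hint in that case; bind_with_hint is a hypothetical name, and the address is passed in host byte order for convenience.

```c
#include <arpa/inet.h>   /* htonl(), htons() */
#include <errno.h>       /* errno, EADDRINUSE */
#include <netinet/in.h>  /* struct sockaddr_in, in_addr_t */
#include <stdio.h>       /* fprintf() */
#include <string.h>      /* memset(), strerror() */
#include <sys/socket.h>  /* bind() */

/* Hypothetical wrapper: bind fd to addr_host_order:port and, on failure,
 * print a hint when the port is already taken. Returns 0 on success,
 * -1 on failure with errno preserved for the caller. */
int bind_with_hint(int fd, in_addr_t addr_host_order, unsigned short port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family      = AF_INET;
    sa.sin_addr.s_addr = htonl(addr_host_order);
    sa.sin_port        = htons(port);

    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
        int saved = errno;
        if (saved == EADDRINUSE)
            fprintf(stderr, "port %hu is already in use (another socket or TIME_WAIT?)\n", port);
        else
            fprintf(stderr, "bind() failed: %s\n", strerror(saved));
        errno = saved;   /* fprintf() may have changed errno */
        return -1;
    }
    return 0;
}
```

For example, binding a second socket to a port that a listening socket already uses should fail with errno set to EADDRINUSE.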

Lines 44-47

Calling the system call listen() puts the socket into a state where it accepts connections from clients; connection requests that arrive before this are rejected. For example, if the executable is a.out, run the following on the command line:

./a.out 8080

While it is running, the socket accepts connections from clients because of listen. If you run a command such as netstat -tln in another terminal, you should see a line like this:

tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN

This means that this server program is ready to accept connection requests from clients.

When a connection request (SYN) arrives from a client in this state, the server creates a new socket structure holding the client's information taken from the packet.

Using that structure, the server-side TCP module performs the 3-way handshake. If the handshake succeeds, the new structure enters the ESTABLISHED state and waits in a queue until accept retrieves it. The second argument of listen (QUEUELIMIT here) is the backlog; on Linux it limits the length of this queue of established connections.
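One consequence of this design is that the handshake does not depend on accept at all. The following sketch, which assumes the loopback interface (127.0.0.1) is available, connects a client to a listening socket before the server calls accept; connect still succeeds because the kernel completes the handshake and keeps the connection in the queue. handshake_before_accept is a hypothetical name, and port 0 asks the kernel to choose a free port.

```c
#include <arpa/inet.h>   /* htonl(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* socket(), bind(), listen(), connect(), accept() */
#include <unistd.h>      /* close() */

/* Returns 0 if connect() succeeds before accept() is called and accept()
 * then dequeues the connection; -1 on any failure. */
int handshake_before_accept(void)
{
    int srv = -1, cli = -1, conn = -1, rc = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port        = htons(0);             /* kernel picks a free port */

    if ((srv = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
        goto out;
    if (bind(srv, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(srv, 5) < 0 ||
        getsockname(srv, (struct sockaddr *) &addr, &len) < 0)  /* learn the port */
        goto out;

    if ((cli = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
        goto out;
    /* No accept() yet: the kernel still completes the 3-way handshake. */
    if (connect(cli, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto out;

    len = sizeof(addr);
    if ((conn = accept(srv, (struct sockaddr *) &addr, &len)) < 0)  /* dequeue it */
        goto out;
    rc = 0;
out:
    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return rc;
}
```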

While the server is running, point a Web browser on the client at port 8080, access it repeatedly, and run a command such as netstat -tnc on the server side to watch these state transitions.

As an aside, when I first tried an arbitrary port number, the browser refused to access it as a suspicious port; it seems that browser vendors block connections to certain port numbers on remote hosts. Of course, the corresponding port also has to be opened on the server side (for example, in the firewall).

Lines 49-58

To send and receive actual data to and from the client, accept takes an ESTABLISHED socket structure from the queue, assigns a descriptor to it, and returns it to the user process.

A socket descriptor is an ordinary file descriptor: an integer used as an index into an array that holds pointers to the process's open I/O objects. The kernel uses this array for all kinds of I/O objects, which is what lets them share an abstract I/O interface. I wrote about this in Hacking Linux File Descriptors, so please take a look if you are interested.

Normally this descriptor would be used to exchange data with the client, but this time the program closes it as soon as it becomes available.
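If the server did send something before closing, one detail matters: send can transfer fewer bytes than requested, so robust code loops until everything is written. The helpers below are a hypothetical sketch of that pattern (send_all and reply_and_close are not part of tcpd.c); a call such as reply_and_close(clitSock, "hello\n") could take the place of close(clitSock) in the accept loop.

```c
#include <errno.h>       /* errno, EINTR */
#include <string.h>      /* strlen() */
#include <sys/socket.h>  /* send() */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* close() */

/* Keep calling send() until all len bytes have been handed to the kernel.
 * Returns 0 on success, -1 on error (errno is set by send()). */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)   /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        p   += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Send a short text reply, then close the connection. */
int reply_and_close(int fd, const char *text)
{
    int rc = send_all(fd, text, strlen(text));
    close(fd);
    return rc;
}
```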

By the way, clitSockAddr and clitLen receive the client's information, which this time is used to display the client's IP address. inet_ntoa converts an in_addr structure into a string in dotted-decimal notation and returns a pointer to the start of the buffer holding it.
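inet_ntoa may return a pointer to static storage that the next call overwrites, and it only handles IPv4. The POSIX function inet_ntop writes into a caller-supplied buffer instead. The sketch below formats the client address and port that accept stored in clitSockAddr; format_client is a hypothetical name.

```c
#include <arpa/inet.h>   /* inet_ntop(), ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in, INET_ADDRSTRLEN */
#include <stdio.h>       /* snprintf() */
#include <string.h>      /* strlen() */

/* Format a client address as "a.b.c.d:port" into buf.
 * Returns buf on success, NULL if buf is too small. */
char *format_client(const struct sockaddr_in *sa, char *buf, size_t buflen)
{
    size_t used;

    /* inet_ntop() writes the dotted-decimal address into buf */
    if (inet_ntop(AF_INET, &sa->sin_addr, buf, (socklen_t) buflen) == NULL)
        return NULL;
    used = strlen(buf);
    /* append ":port"; snprintf() returns the length it wanted to write */
    if (snprintf(buf + used, buflen - used, ":%u",
                 (unsigned) ntohs(sa->sin_port)) >= (int) (buflen - used))
        return NULL;   /* truncated */
    return buf;
}
```

A buffer of INET_ADDRSTRLEN + 6 bytes is enough for any IPv4 address plus ":65535".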

Assuming the server's domain is hogehoge.com, accessing http://hogehoge.com:8080 with a browser closes the connection immediately, so Chrome only shows something like ERR_EMPTY_RESPONSE, while the server prints something like connected from xxx.xxx.xxx.xxx.

Interestingly, Chrome and Safari produced 3 of these lines per access, while Firefox produced 13. It might be interesting to analyze with tcpdump what exchanges a single request triggers.

This time I used a Web browser as the TCP client, but writing my own client makes it easier to experiment, so in Part 2 I would like to build TCP client software.

