[Linux] Command collection (server edition) you should know for isolating the cause of failures

System commands

Show Linux uptime (uptime)

Displays how long the system has been running. For example, when ping temporarily fails, it is used to check whether the server has restarted.

$ uptime
 12:16:05 up 48 days, 17:53,  2 users,  load average: 0.00, 0.02, 0.00

The three load average values are the averages over the last 1, 5, and 15 minutes. Ideally, the load average should stay below the number of CPU cores.
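To make that comparison concrete, here is a minimal POSIX shell sketch. It assumes a Linux host where /proc/loadavg (whose first field is the 1-minute load average) and the nproc command from GNU coreutils are available; the function name load_exceeds_cores is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when LOAD is greater than CORES.
# awk does the comparison because POSIX sh cannot compare decimal numbers.
load_exceeds_cores() {
  awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load + 0 > cores + 0) }'
}

# /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
if [ -r /proc/loadavg ]; then
  load1=$(cut -d ' ' -f 1 /proc/loadavg)
  cores=$(nproc)
  if load_exceeds_cores "$load1" "$cores"; then
    echo "1-minute load $load1 exceeds $cores core(s)"
  else
    echo "1-minute load $load1 is within $cores core(s)"
  fi
fi
```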

Check process status (top command, ps command)

List running process states (snapshot)

Use the ps command to list the states of running processes as a snapshot. It is used to answer questions such as "Which process is causing the heavy load?"

$ ps aux

It is often combined with the grep command to check the processes of a specific service. The following example checks the Nginx processes; the extra "-e %CPU" pattern keeps the header line, which contains "%CPU".

$ ps aux | grep -e nginx -e %CPU
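When you filter ps output with grep this way, the grep process itself usually shows up as well, because its own command line contains "nginx". A common workaround is a bracket expression such as [n]ginx: the regular expression still matches "nginx", but the literal text "[n]ginx" on the filter's own command line does not. The POSIX shell sketch below uses awk rather than grep so that the header (line 1) is kept without a second pattern; the name ps_filter is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep the header line (NR == 1) plus the lines matching the regex PATTERN.
ps_filter() {
  awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# "[n]ginx" matches "nginx", but not the literal text "[n]ginx" on the awk command line.
if command -v ps >/dev/null 2>&1; then
  ps aux | ps_filter '[n]ginx'
fi
```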

List (real-time) running processes

Use the top command to list the states of running processes in real time. Adding the "-c" option is recommended because it shows the full command line instead of just the program name. Like the ps command, it is used to check "Which process is causing the heavy load?"

$ top -c

By looking up the process ID with the ps command and passing it to the "-p" option as shown below, you can monitor only that process. The following example monitors the Nginx master process.

$ ps aux | grep -e nginx -e %CPU
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2142  0.0  0.0  53788    68 ?        Ss   Jun06   0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx     2143  0.0  0.1  85396  3048 ?        S    Jun06   0:03 nginx: worker process
root     13439  0.0  0.0  12108  1088 pts/0    S+   16:53   0:00 grep --color=auto -e nginx -e %CPU

//Execute by specifying the process ID of nginx
$ top -cp 2142
top - 16:54:05 up 36 days, 22:31,  1 user,  load average: 0.06, 0.23, 0.16
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1829.2 total,     80.2 free,   1558.7 used,    190.4 buff/cache
MiB Swap:   2048.0 total,    854.2 free,   1193.8 used.    120.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2142 root      20   0   53788     68     68 S   0.0   0.0   0:00.01 nginx

How to read the output result

Line 1: the current time, uptime, number of logged-in users, and load average.

top - 16:45:36 up 36 days, 22:23,  1 user,  load average: 0.01, 0.02, 0.00

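Lines 2-3: task counts by state, and CPU usage broken down by type (us: user, sy: system, ni: nice, id: idle, wa: I/O wait, hi: hardware interrupts, si: software interrupts, st: time stolen by the hypervisor).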

Tasks: 116 total,   1 running, 115 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.5 us,  0.0 sy,  0.0 ni, 87.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

Lines 4-5: memory and swap usage. "avail Mem" is an estimate of the memory available for starting new applications without swapping.

MiB Mem :   1829.2 total,     74.0 free,   1558.9 used,    196.3 buff/cache
MiB Swap:   2048.0 total,    854.2 free,   1193.8 used.    119.9 avail Mem
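Because the %Cpu(s) line reports idle time as "id", overall CPU utilization can be estimated as 100 minus id. The POSIX shell sketch below extracts that value from top in batch mode ("-b", with "-n" for the number of iterations and "-d" for the delay). It assumes the comma-separated procps-ng layout shown above and forces the C locale so that decimals use a dot; cpu_busy is a hypothetical name. The first iteration of top may reflect averages since boot, so the sketch takes two samples and keeps the last.

```shell
#!/bin/sh
# Hypothetical helper: read top output on stdin and print 100 - idle for each %Cpu(s) line.
# Assumes the layout "%Cpu(s): 12.5 us,  0.0 sy, ... 87.5 id, ..." shown above.
cpu_busy() {
  awk -F ',' '/^%Cpu/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ / id$/) {        # the field ending in " id" holds the idle percentage
        split($i, part, " ")    # part[1] is the number, part[2] is "id"
        print 100 - part[1]
      }
    }
  }'
}

# Two batch iterations, 1 second apart; keep the value from the last one.
if command -v top >/dev/null 2>&1; then
  LC_ALL=C top -b -n 2 -d 1 | cpu_busy | tail -n 1
fi
```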

Check CPU, memory, and disk I/O statistics (vmstat)

Displays virtual memory, CPU, and disk I/O statistics. Specifying "m" with the "--unit" option shows the memory values in megabytes, which makes them easier to read.

$ vmstat --unit m
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0    153    228    115    962    0    0   234    16    3    3  1  0 99  0  0

Specifying "1" as an argument prints a new line every second until you press Ctrl+C. Note that the first line shows averages since the last boot, not current values.

$ vmstat --unit m 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    153    228    115    962    0    0   234    16    3    3  1  0 99  0  0
 0  0    153    227    115    962    0    0     0     0  150  168  0  0 100  0  0
 0  0    153    227    115    962    0    0     0     0  151  188  0  0 100  0  0
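The "si" and "so" columns show memory swapped in from and out to disk; non-zero values that persist while the problem is occurring suggest memory pressure. The POSIX shell sketch below samples vmstat a fixed number of times ("vmstat DELAY COUNT") so that it terminates, and prints only the rows with swap activity. It skips the first data row because that row holds averages since the last boot. The name swap_rows is hypothetical, and the column positions assume the procps-ng layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: from vmstat output on stdin, print data rows where si ($7) or so ($8) is non-zero.
# NR > 3 skips the two header lines and the first data row (averages since boot).
swap_rows() {
  awk 'NR > 3 && ($7 > 0 || $8 > 0)'
}

# 5 samples at 1-second intervals, memory values in megabytes.
if command -v vmstat >/dev/null 2>&1; then
  vmstat --unit m 1 5 | swap_rows
fi
```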

Check CPU and disk I/O statistics (iostat)

Displays CPU usage and I/O device usage. The "-h" option makes the device report easier to read.

$ iostat -h
Linux 4.18.0-147.8.1.el8_1.x86_64 (XXX-XX-XX-XXX)       07/01/2020      _x86_64_        (3 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.6%    0.0%    0.4%    0.0%    0.0%   98.9%

      tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn Device
    11.96       700.4k        48.5k       2.7T     194.7G vda
     0.00         0.0k         0.0k     416.0k       0.0k scd0

Check memory usage status (free command)

Displays memory usage.

//Display in kilobytes (default)
$ free

//Display in megabytes
$ free -m

//Display in gigabytes
$ free -g
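When judging whether memory is really short, the "available" column (an estimate of the memory usable for starting new applications without swapping) is usually more telling than "free", because the kernel uses otherwise idle memory for buff/cache. A minimal POSIX shell sketch, assuming the procps-ng layout where the Mem: row is total, used, free, shared, buff/cache, available (older versions of free print different columns); the name mem_avail_pct is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read free output on stdin and print available memory as a percentage of total.
# Mem: row fields: $2 total, $3 used, $4 free, $5 shared, $6 buff/cache, $7 available.
mem_avail_pct() {
  awk '/^Mem:/ { printf "%.1f%% available\n", $7 / $2 * 100 }'
}

if command -v free >/dev/null 2>&1; then
  free -m | mem_avail_pct
fi
```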

Check disk usage status (df command)

Displays disk usage for each mounted filesystem. The "-h" option is recommended because it shows sizes in human-readable units.

$ df -h
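When a full disk is suspected, it helps to list only the filesystems above a usage threshold. In the POSIX shell sketch below, the "-P" option asks df for the POSIX output format so that each filesystem stays on one line, and Use% is the fifth column. The 90% threshold and the name full_filesystems are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: from df -P style output on stdin, print filesystems whose Use% >= LIMIT (default 90).
full_filesystems() {
  awk -v limit="${1:-90}" 'NR > 1 && $5 + 0 >= limit + 0'   # "93%" + 0 evaluates to 93 in awk
}

df -hP | full_filesystems 90
```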

Check disk usage (in a specific directory) (du command)

Displays the disk usage of each directory under the current directory. Again, the "-h" option is recommended; "sort -hr" then sorts the human-readable sizes from largest to smallest.

$ du -h | sort -hr

The "--max-depth" option limits how deep the output goes. The following example shows the usage of each directory directly under the current directory.

$ du --max-depth=1 -h | sort -hr
772M    .
743M    ./.vscode-server
13M     ./.local
11M     ./.cache
44K     ./.vscode-remote
8.0K    ./.vscode
8.0K    ./.ssh
8.0K    ./.pylint.d
8.0K    ./.config

Check the mount status of the volume (mount command)

Checks which volumes are mounted. It is used, for example, when data cannot be read, to confirm that the disk is mounted correctly. Running mount with no arguments only lists the mounted filesystems, and the "-l" option additionally shows their labels. Be careful not to pass a device or directory as an argument when you only want to check the status, because mount will then try to actually mount it.

$ mount -l

A file closely related to the mount command is "/etc/fstab". It describes the volumes to mount when the server boots or when the system manager reloads its configuration. If a volume is listed in fstab but does not appear in the "mount -l" output above, something may be wrong.

$ cat /etc/fstab
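The fstab comparison described above can be automated. The following POSIX shell sketch reads the current mount table from /proc/mounts and prints every mount point listed in fstab that is not mounted, skipping comments, swap entries, and entries marked noauto (which are not mounted at boot). The function name unmounted_fstab_entries is hypothetical, and it compares mount point paths only.

```shell
#!/bin/sh
# Hypothetical helper: print fstab mount points that are not currently mounted.
# Optional arguments: fstab path, mount table path (/proc/mounts lists the current mounts).
unmounted_fstab_entries() {
  fstab=${1:-/etc/fstab}
  mounts=${2:-/proc/mounts}
  # First file (mount table): remember every mount point in column 2.
  # Second file (fstab): skip comments, blank lines, swap and noauto entries, then report the rest.
  awk 'NR == FNR { mounted[$2] = 1; next }
       $1 ~ /^#/ || NF < 4 || $3 == "swap" || $4 ~ /(^|,)noauto(,|$)/ { next }
       !($2 in mounted) { print "not mounted: " $2 }' "$mounts" "$fstab"
}

if [ -r /etc/fstab ] && [ -r /proc/mounts ]; then
  unmounted_fstab_entries
fi
```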

Check the message output by the Linux kernel (dmesg)

Checks messages output by the Linux kernel (the core of the OS, responsible for process management, memory management, scheduling, and so on). The "-T" option prints human-readable timestamps, and the "-x" option shows the facility and level of each message. For example, if an application cannot read its data after a server restart, the dmesg output may reveal that the disk is damaged.

$ dmesg -Tx

Check past resource usage trends (sar)

Checks past usage trends of resources such as CPU and memory. Note that sar is part of the sysstat package, which is not installed on every server.

//Check past CPU usage trends
$ sar -u

//Check past memory usage trends
$ sar -r

Network commands

Check network communication (ping)

Checks network connectivity. The IP address "8.8.8.8" is Google Public DNS. If ping fails, use the "traceroute" command introduced below to find where along the route communication stops.

//Check communication by IP address
$ ping 8.8.8.8

//Check communication by host name (this also tests name resolution)
$ ping google.com

//Send packets at 1-second intervals (1 second is the default; stop with Ctrl+C)
$ ping -i 1 8.8.8.8

//Check communication with IPv4
$ ping -4 google.com

//Check communication with IPv6
$ ping -6 google.com
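For scripted checks, it is easier to send a fixed number of packets and read the summary line. This POSIX shell sketch assumes the iputils ping shipped with RHEL: "-c" sets the packet count, "-w" sets a deadline in seconds, and ping exits with a non-zero status when it receives no replies, or, when both "-c" and "-w" are given, fewer replies than requested before the deadline. The helper packet_loss is hypothetical and only extracts the percentage from the summary line.

```shell
#!/bin/sh
# Hypothetical helper: print the number in front of "% packet loss" in ping's summary line.
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Send 4 packets (-c 4) and stop after at most 5 seconds (-w 5).
if command -v ping >/dev/null 2>&1; then
  result=$(ping -c 4 -w 5 8.8.8.8 2>&1)
  status=$?   # 0 only if all 4 replies arrived within the deadline
  echo "exit status: $status"
  printf '%s\n' "$result" | packet_loss
fi
```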

Trace network route

Checks the network route. Note that traceroute may not be installed on every server. It is often used to find where communication fails.

$ traceroute 8.8.8.8

Confirm name resolution (nslookup, dig)

Checks name resolution. The difference between nslookup and dig is that nslookup formats the name server's response for readability, while dig displays the response largely as is. If you suspect a DNS problem, it's a good idea to start with these commands to check whether name resolution is working.

//Check forward lookup
$ nslookup google.com

//Check reverse lookup
$ nslookup 8.8.8.8

//Check forward lookup
$ dig google.com

//Check reverse lookup
$ dig -x 8.8.8.8

Check the TCP socket connection status (ss, netstat)

Checks the connection status of TCP sockets. It is used, for example, to see which services are running based on which ports are in use. netstat is deprecated in RHEL 7 and later, and ss is recommended as its replacement. The commands below show all IPv4 TCP sockets. That said, netstat output is easier to read, so until it disappears completely, it may be better to use netstat rather than ss.

//Recommended
$ ss -at4

//Not recommended (deprecated)
$ netstat -at4
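When connections pile up (for example, many sockets stuck in TIME-WAIT), a count per state is a quick first look. This POSIX shell sketch assumes the output of "ss -tan" (all TCP sockets, numeric addresses), whose first column is the state; the name count_states is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count TCP sockets per state from ss -tan output on stdin.
# Line 1 is the header; column 1 is the state (ESTAB, LISTEN, TIME-WAIT, ...).
count_states() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

if command -v ss >/dev/null 2>&1; then
  ss -tan | count_states
fi
```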

Check network status (ip, ifconfig)

Checks the status of the network interfaces. ifconfig is deprecated in RHEL 7 and later, and ip is recommended as its replacement. The commands below display the settings of every interface. They are used to check whether each network interface is link-up.

//Recommended
$ ip a

//Not recommended (deprecated)
$ ifconfig -a
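To spot interfaces that are not link-up at a glance, the brief mode of iproute2 ("ip -br link", available in reasonably recent iproute2 versions) prints one line per interface with its state in the second column. This POSIX shell sketch assumes that format; the loopback interface normally reports UNKNOWN, so it is skipped. The name down_interfaces is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: from ip -br link output on stdin, print interfaces whose state is not UP (lo skipped).
down_interfaces() {
  awk '$1 != "lo" && $2 != "UP" { print $1, $2 }'
}

if command -v ip >/dev/null 2>&1; then
  ip -br link | down_interfaces
fi
```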

Checking the routing table (ip, route, netstat)

Checks the routing table. route is deprecated in RHEL 7 and later, and ip is recommended as its replacement. Although deprecated, route output is easier to read, so until it disappears completely, it may be better to use route rather than ip.

//Recommended
$ ip r

//Not recommended (deprecated)
$ route

//Not recommended (deprecated)
$ netstat -nr

Check ARP cache table

Checks the ARP cache table. arp is deprecated in RHEL 7 and later, and ip is recommended as its replacement. For example, after a router is replaced, the previous device's MAC address may remain in the ARP cache and cause communication failures; this command helps isolate such cases.

//Recommended
$ ip n

//Not recommended (deprecated)
$ arp -a
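In the router-replacement case, the entry that matters is usually the one for the default gateway. This POSIX shell sketch takes the gateway address from the default route (the word after "via" in "ip r") and shows its neighbor entry with "ip n show", so that the MAC address can be compared with the new router's. The name default_gateway is hypothetical, and only the first default route is considered.

```shell
#!/bin/sh
# Hypothetical helper: print the gateway address of the first default route in ip r output on stdin.
# Example input line: "default via 192.168.1.1 dev eth0 proto dhcp metric 100"
default_gateway() {
  awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

if command -v ip >/dev/null 2>&1; then
  gw=$(ip r | default_gateway)
  # Show the ARP (neighbor) entry, including the MAC address, recorded for the gateway.
  [ -n "$gw" ] && ip n show "$gw"
fi
```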

Other

Check logs in real time

Follows a log file in real time. This is useful for watching what is being logged while a problem is actually occurring.

$ tail -f access.log

When a large amount of log output is written in a short time, tail is often combined with grep. The following example prints only the lines that contain the string "404".

$ tail -f access.log | grep 404
125.197.158.68 - - [01/Jul/2020:13:45:11 +0900] "GET /404 HTTP/1.1" 301 162 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" "-"
125.197.158.68 - - [01/Jul/2020:13:45:11 +0900] "GET /404 HTTP/2.0" 404 548 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" "-"
125.197.158.68 - - [01/Jul/2020:13:45:11 +0900] "GET /favicon.ico HTTP/2.0" 404 548 "https://example.com/404" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" "-"
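Because grep matches "404" anywhere in the line, the first line of the output above (a 301 response for the path /404) is printed too. To match only the status code, one option is to compare a single field with awk. This POSIX shell sketch assumes nginx's default "main" log format, where the status code is the ninth whitespace-separated field; a custom log_format would shift it. The name status_filter is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print access-log lines whose status code (field 9) equals CODE.
# Assumes nginx's default "main" log format; fflush() keeps output flowing when used with tail -f.
status_filter() {
  awk -v code="$1" '$9 == code { print; fflush() }'
}

# Usage (follows the log until Ctrl+C):
# tail -f access.log | status_filter 404
```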

Search for files that match (various) conditions

Searches for files that match various conditions. You often don't know the directory structure or where the file you are looking for is; in such cases, find is the command to use.

//Search for files with the specified name under the search path
$ find <search path> -name <file name>

//Execution example
$ find /var/log/ -name "access.log"
/var/log/nginx/access.log

//Example using wildcards
$ find /var/log/ -name "*.log"
/var/log/letsencrypt/letsencrypt.log
/var/log/dnf.rpm.log
/var/log/dnf.librepo.log
/var/log/boot.log
/var/log/nginx/error.log
/var/log/nginx/access.log
/var/log/audit/audit.log
/var/log/cloud-init-output.log
/var/log/cloud-init.log
/var/log/dnf.log
/var/log/sssd/sssd_nss.log
/var/log/sssd/sssd_implicit_files.log
/var/log/sssd/sssd.log
/var/log/hawkey.log
/var/log/tuned/tuned.log

//Example of specifying the file type (file only)
$ find /opt/docker/ -name "*.conf" -type f
/opt/docker/nginx/conf/default.conf
/opt/docker/nginx/conf/nginx.conf

//Example of specifying the file type (directory only)
$ find /opt/docker/ -name "nginx" -type d
/opt/docker/nginx
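During an incident it is often useful to narrow the search to files that changed recently or grew large. The sketch below uses two more standard find tests: "-mmin -N" matches files modified less than N minutes ago, and "-size +100M" matches files larger than 100 MiB. The paths, thresholds, and the helper name recent_logs are arbitrary examples.

```shell
#!/bin/sh
# Hypothetical helper: list regular *.log files under DIR modified within the last MINUTES minutes.
recent_logs() {
  find "$1" -name "*.log" -type f -mmin "-$2" 2>/dev/null
}

recent_logs /var/log/ 60

# Regular files under /var/log larger than 100 MiB (useful when a disk is filling up).
find /var/log/ -type f -size +100M 2>/dev/null
```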

Extract lines containing specific characters from the file

Extracts the lines that contain a specific string from a file. grep is used very often; in troubleshooting, it is typically used to check whether a log contains errors. Here are some examples I use myself.

//Extract lines containing specific characters from the specified file
$ grep access.log nginx.conf
    access_log  /var/log/nginx/access.log  main;

//Use a wildcard to search multiple files (each match is prefixed with its file name)
$ grep http *.conf
default.conf:    return       301 https://$host$request_uri;
default.conf:    listen       443 ssl http2;
default.conf:    listen       [::]:443 ssl http2;
default.conf:        proxy_pass                             http://localhost:5601;
nginx.conf:http {
nginx.conf:                      '$status $body_bytes_sent "$http_referer" '
nginx.conf:                      '"$http_user_agent" "$http_x_forwarded_for"';

// -v option: print only the lines that do NOT match
$ echo -e "test\ntast\ntest" > test.txt
$ cat test.txt
test
tast
test
$ grep test test.txt
test
test
$ grep -v test test.txt
tast

// -e option: OR search for multiple patterns
$ grep -e test -e tast test.txt
test
tast
test

// -i option: case-insensitive search (without it, "Access.log" matches nothing)
$ grep Access.log nginx.conf
$ grep -i Access.log nginx.conf
    access_log  /var/log/nginx/access.log  main;

//Performs a search including subdirectories, and also targets the destination of symbolic links
$ grep -R http
nginx/conf/default.conf:    return       301 https://$host$request_uri;
nginx/conf/default.conf:    listen       443 ssl http2;
nginx/conf/default.conf:    listen       [::]:443 ssl http2;
nginx/conf/default.conf:        proxy_pass                             http://localhost:5601;
nginx/conf/nginx.conf:http {
nginx/conf/nginx.conf:                      '$status $body_bytes_sent "$http_referer" '
nginx/conf/nginx.conf:                      '"$http_user_agent" "$http_x_forwarded_for"';
nginx/html/index.html:<a href="http://nginx.org/">nginx.org</a>.<br/>
nginx/html/index.html:<a href="http://nginx.com/">nginx.com</a>.</p>
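When reading a log, the lines around a match and their line numbers often matter as much as the match itself. The sketch below uses three more standard grep options: "-n" prints line numbers, "-C N" prints N lines of context before and after each match, and "-F" treats the pattern as a fixed string (without it, the dot in access.log matches any character). The file name app.log is hypothetical.

```shell
#!/bin/sh
# -n: show line numbers, -i: ignore case, -C 1: show 1 line of context before and after each match.
# Matching lines use ":" after the line number; context lines use "-".
grep -n -i -C 1 error app.log

# -F: treat the pattern as a fixed string, so the dot only matches a literal ".".
grep -F access.log nginx.conf
```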

List files

Lists files. Here are some examples I use myself.

//Simply list files
$ ls
conf  docker-compose.yml  html

//Also display file information
$ ls -l
total 12
drwxr-xr-x 2 root root 4096 May 22 20:52 conf
-rw-r--r-- 1 root root  655 May 22 20:37 docker-compose.yml
drwxr-xr-x 2 root root 4096 May 20 16:02 html

//Also display dot files (hidden files)
$ ls -la
total 20
drwxr-xr-x 4 root root 4096 May 22 20:37 .
drwxr-xr-x 5 root root 4096 May 19 13:47 ..
drwxr-xr-x 2 root root 4096 May 22 20:52 conf
-rw-r--r-- 1 root root  655 May 22 20:37 docker-compose.yml
drwxr-xr-x 2 root root 4096 May 20 16:02 html

//Sort by modification time (-t) in reverse order (-r), so the newest file appears last
$ ls -latr
total 20
drwxr-xr-x 5 root root 4096 May 19 13:47 ..
drwxr-xr-x 2 root root 4096 May 20 16:02 html
-rw-r--r-- 1 root root  655 May 22 20:37 docker-compose.yml
drwxr-xr-x 4 root root 4096 May 22 20:37 .
drwxr-xr-x 2 root root 4096 May 22 20:52 conf

//Display file size in a human-readable format
$ ls -latrh
total 20K
drwxr-xr-x 5 root root 4.0K May 19 13:47 ..
drwxr-xr-x 2 root root 4.0K May 20 16:02 html
-rw-r--r-- 1 root root  655 May 22 20:37 docker-compose.yml
drwxr-xr-x 4 root root 4.0K May 22 20:37 .
drwxr-xr-x 2 root root 4.0K May 22 20:52 conf

//Display recursively
$ ls -latrhR
.:
total 20K
drwxr-xr-x 5 root root 4.0K May 19 13:47 ..
drwxr-xr-x 2 root root 4.0K May 20 16:02 html
-rw-r--r-- 1 root root  655 May 22 20:37 docker-compose.yml
drwxr-xr-x 4 root root 4.0K May 22 20:37 .
drwxr-xr-x 2 root root 4.0K May 22 20:52 conf

./html:
total 16K
-rw-r--r-- 1 root root  612 Apr 14 23:19 index.html
-rw-r--r-- 1 root root  494 Apr 14 23:19 50x.html
drwxr-xr-x 2 root root 4.0K May 20 16:02 .
drwxr-xr-x 4 root root 4.0K May 22 20:37 ..

./conf:
total 16K
-rw-r--r-- 1 root root  670 May 20 19:32 nginx.conf
drwxr-xr-x 4 root root 4.0K May 22 20:37 ..
-rw-r--r-- 1 root root 1.3K May 22 20:52 default.conf
drwxr-xr-x 2 root root 4.0K May 22 20:52 .

Scroll text

Displays the contents of a text file page by page. It is used, for example, to look for errors in a log.

$ more access.log
$ less access.log
$ view access.log

Differences between the "more", "less", and "view" commands

The table below summarizes the main differences you should know (there are other minor differences).

Command  Display speed  How to exit
more     Fast           Exits automatically at the end of the file, or press q / Ctrl+C
less     Fast           q
view     Slow           :q
