[LINUX] [Shell script] Calculate and output the elapsed time from the START and END entries in log files

I want to extract the start and end times from each log file and summarize them in a table.

Suppose the following log file is generated for each job that loads a table.

TABLENAME1_load.log


# (other log output)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:03:00 DATA LOAD NORMAL END !!!
# (other log output)

TABLENAME2_load.log


# (other log output)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:10:00 DATA LOAD NORMAL END !!!
# (other log output)

I want to summarize these in a single table like the following.

result


JOBNAME      START     END       TIME(s)  TIME(m)
TABLENAME1   00:01:00  00:03:00  120      2
TABLENAME2   00:01:00  00:10:00  540      9
...

Method

Overview

I will do my best to complete everything with shell!

INPUT

- Date (yyyymmdd)
- File with the list of load jobs (jobs.txt)

jobs.txt


TABLENAME1
TABLENAME2
TABLENAME3

- Log file for each load job (TABLENAME_load.<timestamp>.log)

TABLENAME1_load.20200828000100.log


# (other log output)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:10:00 DATA LOAD NORMAL END !!!
# (other log output)

TABLENAME2_load.20200828000100.log


# (other log output)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:13:00 DATA LOAD NORMAL END !!!
# (other log output)

TABLENAME3_load.20200828000200.log


# (other log output)
2020-08-28 00:01:00 DATA LOAD START !!!
2020-08-28 00:20:00 DATA LOAD NORMAL END !!!
# (other log output)

OUTPUT

- Table of JOBNAME, START, END, TIME(s), TIME(m)
- Sorted by TIME so that the bottleneck job can be identified

Process flow

P0. Create a temporary text file (with JOBNAME, START, END, TIME(s), TIME(m) as the header)
P1. For each load job listed one per line in jobs.txt
    P1.1. Check whether a log file exists
    P1.2. Extract the lines with the START and END times from the log file
    P1.3. Extract the time part and assign it to a variable in the form START\tEND
    P1.4. Store start and end in variables
    P1.5. Store the elapsed time (seconds) in a variable
    P1.6. Store the elapsed time (minutes) in a variable
    P1.7. Append the values from P1.4 to P1.6 as one line of the text file
P1.8. When all load jobs have been processed, output a table sorted in descending order of elapsed time
P1.9. Delete the temporary text file

What to use

terminal


$ ls -l
sample.sh
jobs.txt
log/

$ cat jobs.txt
TABLENAME1
TABLENAME2
TABLENAME3

$ ls -l log/
20200828/
20200829/

$ ls -l log/20200828/
TABLENAME1_load.20200828000100.log
TABLENAME2_load.20200828000100.log
TABLENAME3_load.20200828000100.log
TABLENAME3_load.20200828000200.log

Whole code

sample.sh


#!/bin/bash
# Usage: bash sample.sh <yyyymmdd> <jobs.txt>
# bash is required: arrays and here-strings (<<<) are not POSIX sh features.

# Get the job names from the txt file
jobs=($(cat "$2"))

# Create the tmp file with the table header (">" overwrites any stale file)
echo JOBNAME START END 'TIME(s)' 'TIME(m)' > tmp_result.txt

for x in "${jobs[@]}"
do
	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		# Extract START and END from the latest log file as "START END"
		result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
		start=$(cut -d' ' -f 1 <<< "$result")
		end=$(cut -d' ' -f 2 <<< "$result")
		# Calculate the processing time in seconds, then in whole minutes
		time_s=$(expr `date -d$end +%s` - `date -d$start +%s`)
		time_m=$((time_s/60))
		# Append one row to the tmp file
		echo $x $start $end $time_s $time_m >> tmp_result.txt
	else
		echo 'there is no log file'
	fi
done 2> /dev/null

# Display the result: header first, remaining rows sorted by TIME(m) descending
(head -n +1 tmp_result.txt && tail -n +2 tmp_result.txt | sort -n -r -k 5) | column -t

# Remove tmp file
rm tmp_result.txt

result

- The elapsed time of each job is now visible, which makes it clear which job should be improved.

terminal


$ bash sample.sh 20200828 jobs.txt
JOBNAME     START     END       TIME(s)  TIME(m)
TABLENAME3  00:01:00  00:20:00  1140     19
TABLENAME2  00:01:00  00:13:00  720      12
TABLENAME1  00:01:00  00:10:00  540      9

Processing details

Read jobs.txt and loop over it with a for statement

sample.sh


#!/bin/bash
# Get the job names from the txt file (arrays require bash)
jobs=($(cat "$1"))
for x in "${jobs[@]}"
do
        echo $x
done

terminal


$ bash sample.sh jobs.txt
TABLENAME1
TABLENAME2
TABLENAME3

P0. Create a temporary text file (with JOBNAME, START, END, TIME(s), TIME(m) as the header)

sample.sh


echo JOBNAME START END 'TIME(s)' 'TIME(m)' > tmp_result.txt

P1.1. Check whether a log file exists

- Changed the script so that the first argument is the date (yyyymmdd) and the second argument is jobs.txt
- The log file name includes a timestamp, so the existence check needs a wildcard
- Testing the wildcard with -e and the like produces an "unexpected operator" error (the pattern can expand to several file names)
- So the check uses the output of ls instead
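
The wildcard can also be tested without calling ls. The following is a minimal sketch, not the method used in this article: the helper name `has_log` is hypothetical, and the demo directory only imitates the `<job>_load.<timestamp>.log` naming described above. It expands the pattern into the function's positional parameters and tests only the first match, so `[ -e ]` always receives exactly one argument.

```shell
#!/bin/bash
# has_log DIR JOB  (hypothetical helper name)
# Succeeds if at least one file matches DIR/JOB_load.*
has_log() {
    # Expand the glob into the positional parameters of the function.
    set -- "$1"/"$2"_load.*
    # If nothing matched, the pattern stays literal (or is empty), so -e fails.
    [ -e "$1" ]
}

# Demo in a throwaway directory (assumed layout: <dir>/<job>_load.<timestamp>.log)
demo_dir=$(mktemp -d)
touch "$demo_dir/TABLENAME1_load.20200828000100.log"

for job in TABLENAME1 TABLENAME9; do
    if has_log "$demo_dir" "$job"; then
        echo "log file: $job"
    else
        echo "there is no log file: $job"
    fi
done

rm -rf "$demo_dir"
```

Compared with the ls-based check, this form prints nothing to standard error when no file matches, so no redirection to /dev/null is needed for this step.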

sample.sh


#!/bin/bash
# ... (reading jobs.txt is omitted)
for x in ${jobs[@]};
do
	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		echo 'log file: ' $x
	else
		echo 'there is no log file'
	fi
done

terminal


$ bash sample.sh 20200828 jobs.txt
log file:  TABLENAME1
log file:  TABLENAME2
log file:  TABLENAME3

P1.2. Extract the lines with the START and END times from the log file

- List the log files
- If there are multiple log files, take the latest one
  - tail -n 1
- Output the contents
  - xargs cat
- Extract only the "DATA LOAD" lines
  - grep "DATA LOAD"
- Extract the second column separated by " "
  - cut -d' ' -f2
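
To try this step in isolation, here is a small sketch under stated assumptions. The function names `latest_log` and `load_times` are hypothetical, and the contents of the older demo log are made up for illustration. It relies on the timestamp in the file name sorting lexically in time order, which is why the last entry of the ls output is the newest log.

```shell
#!/bin/bash
# latest_log DIR JOB: print the newest log for JOB (hypothetical helper).
# Assumes the timestamp in the file name sorts lexically in time order.
latest_log() {
    ls "$1"/"$2"_load.* 2>/dev/null | tail -n 1
}

# load_times FILE: print the time field (2nd column) of each "DATA LOAD" line.
load_times() {
    grep "DATA LOAD" "$1" | cut -d' ' -f2
}

# Demo data: two logs for the same job; only the newer one should be used.
demo_dir=$(mktemp -d)
printf '%s\n' '# other log lines' \
    '2020-08-28 00:01:00 DATA LOAD START !!!' \
    '2020-08-28 00:05:00 DATA LOAD NORMAL END !!!' \
    > "$demo_dir/TABLENAME3_load.20200828000100.log"
printf '%s\n' '# other log lines' \
    '2020-08-28 00:01:00 DATA LOAD START !!!' \
    '2020-08-28 00:20:00 DATA LOAD NORMAL END !!!' \
    > "$demo_dir/TABLENAME3_load.20200828000200.log"

# Prints the START time and the END time of the newer log, one per line.
load_times "$(latest_log "$demo_dir" TABLENAME3)"

rm -rf "$demo_dir"
```

Passing the file name to grep directly replaces the `xargs cat` stage; the rest is the same pipeline split into two named steps.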

sample.sh


ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 

terminal


$ bash sample.sh 20200828 jobs.txt
00:01:00 # TABLE1 START
00:10:00 # TABLE1 END
00:01:00 # TABLE2 START
00:13:00 # TABLE2 END
00:01:00 # TABLE3 START
00:20:00 # TABLE3 END

Now the HH:mm:ss values of START and END are available for each job.

P1.3. Extract the time part and assign it to a variable in the form START\tEND

- Convert line breaks to tabs
  - tr '\n' '\t'
- Assign the result of the pipeline to a variable
  - result=$(process | process | process)
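
One detail is worth seeing in isolation. tr produces a tab-separated string, but when an unquoted command substitution is passed to echo, the shell splits it into words and echo joins them with single spaces. That is why the next step can split the value on a space. A small sketch with hard-coded sample times (the variable names are illustrative only):

```shell
#!/bin/bash
# Two lines of sample input, as produced by the cut step
times=$(printf '00:01:00\n00:10:00\n')

# tr alone: each newline becomes a tab, including the trailing one
tabbed=$(printf '%s\n' "$times" | tr '\n' '\t')

# Unquoted $(...) is word-split, and echo rejoins the words with one space each
spaced=$(printf '%s\n' "$times" | echo $(tr '\n' '\t'))

printf '[%s]\n' "$tabbed"   # tab-separated, with a trailing tab
printf '[%s]\n' "$spaced"   # space-separated, no trailing separator
```

If the tab is wanted literally, the tr output would have to be kept quoted and the later cut would need a tab delimiter instead of a space.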

sample.sh


for x in ${jobs[@]};
do
	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
		echo $result
	else
		echo 'there is no log file'
	fi
done

terminal


$ bash sample.sh 20200828 jobs.txt
00:01:00 00:10:00 # TABLENAME1
00:01:00 00:13:00 # TABLENAME2
00:01:00 00:20:00 # TABLENAME3

P1.4. Store start and end in variables

- Assign the first field of result, separated by ' ', to start and the second to end
  - start=$(cut -d' ' -f 1 <<< $result)
  - end=$(cut -d' ' -f 2 <<< $result)
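
The same split can be done without starting cut twice. This is only a sketch of an alternative, assuming result holds exactly two space-separated fields as in the step above; it uses shell parameter expansion, which also works in plain sh.

```shell
#!/bin/bash
# Sample value in the "START END" form produced by the previous step
result="00:01:00 00:10:00"

# ${var%% *} removes everything from the first space to the end
start=${result%% *}
# ${var##* } removes everything up to and including the last space
end=${result##* }

echo "start=$start end=$end"
```

With exactly two fields both expansions are unambiguous; with more fields, start would still be the first field and end the last one.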

sample.sh


	if [ "$(ls log/$1/${x}_load.*)" != '' ]; then
		result=$(ls log/$1/${x}_load.* | tail -n 1 | xargs cat | grep "DATA LOAD" | cut -d' ' -f2 | echo $(tr '\n' '\t'))
		start=$(cut -d' ' -f 1 <<< $result)
		end=$(cut -d' ' -f 2 <<< $result)
	else
		echo 'there is no log file'
	fi

P1.5. Store the elapsed time (seconds) in a variable, P1.6. Store the elapsed time (minutes) in a variable

- Convert to UNIX time to calculate the difference between the HH:mm:ss values
  - date -d$start +%s
  - date -d$end +%s
- Evaluate the expression
  - expr
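
The arithmetic can also be written with shell arithmetic expansion instead of expr. The sketch below is a variation, not the article's code: the helper name `to_epoch` is hypothetical, and pinning the time to 1970-01-01 in UTC is an assumption added here so that the result does not depend on the current day or time zone. It requires GNU date for the -d option.

```shell
#!/bin/bash
# to_epoch HH:MM:SS: seconds since 1970-01-01 00:00:00 UTC for that time of day.
# Requires GNU date (-d). Hypothetical helper name.
to_epoch() {
    date -u -d "1970-01-01 $1" +%s
}

start=00:01:00
end=00:10:00

time_s=$(( $(to_epoch "$end") - $(to_epoch "$start") ))
time_m=$(( time_s / 60 ))   # integer division: the remainder is dropped

echo "$time_s $time_m"   # prints "540 9"
```

Like the original, this only compares times of day, so a job that starts before midnight and ends after it would yield a negative value. Because the log lines also carry the date, passing the date and time together to date -d would be one way to cover that case.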

sample.sh


time_s=$(expr `date -d$end +%s` - `date -d$start +%s`)
time_m=$((time_s/60))

terminal


$ expr `date -d'00:01:01' +%s` - `date -d'00:00:01' +%s`
60

P1.7. Append the values from P1.4 to P1.6 as one line of the text file

sample.sh


echo $x $start $end $time_s $time_m >> tmp_result.txt

P1.8. When all load jobs have been processed, output a table sorted in descending order of elapsed time

- The first line is the header, so it is excluded from the sort
  - head -n +1 tmp_result.txt &&
- From the second line onward, sort in descending order with the 5th column, TIME(m), as the key
  - tail -n +2 tmp_result.txt | sort -n -r -k 5
- Display in table format
  - column -t
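
The header-preserving sort can be tried on its own with a few sample rows. This is a minimal sketch: the function name `sort_report` is hypothetical, the rows are copied from the result shown earlier, and `-k5,5nr` is used here as a stricter spelling of "numeric, descending, 5th column only".

```shell
#!/bin/bash
# sort_report FILE: print the header line unchanged, then the remaining
# lines sorted by the 5th column (numeric, descending). Hypothetical helper.
sort_report() {
    head -n 1 "$1"
    tail -n +2 "$1" | sort -k5,5nr
}

report=$(mktemp)
cat > "$report" <<'EOF'
JOBNAME START END TIME(s) TIME(m)
TABLENAME1 00:01:00 00:10:00 540 9
TABLENAME2 00:01:00 00:13:00 720 12
TABLENAME3 00:01:00 00:20:00 1140 19
EOF

sort_report "$report"
rm -f "$report"
```

Piping the output of `sort_report` into `column -t` aligns the columns as in the article. Sorting on the 4th column, TIME(s), instead would also rank jobs that differ by less than a minute.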

sample.sh


(head -n +1 tmp_result.txt && tail -n +2 tmp_result.txt | sort -n -r -k 5) | column -t

P1.9. Delete the temporary text file

sample.sh


rm tmp_result.txt 

In conclusion

I am still new to shell scripting, so I would appreciate any suggestions on how to write this better.

Reference

- I want to use wildcards to determine the existence of files in a shell script
- About standard output, standard error output, and /dev/null
- When comparing strings in the shell, I get the error "=: unary operator expected"
