[LINUX] Note about awk

I use awk from time to time. But since months can pass between uses, every time I come back to it I think "Wait, how did this work again?" and end up searching the web. So here is a list of the things I use most often.

Knowing just the following is enough to get by. Each item is explained in detail below.

Super basic

Suppose you have the following file:

sample.txt


No data1 data2 data3
1  101   102   103
2  201   202   203
3  301   302   303
4  401   402   403

Prepare an awk script as below

sample.awk


{
  print $1 " " $3
}

From the terminal, pass the script with the -f option, followed by sample.txt.

$ awk -f sample.awk sample.txt

The output is as follows.

No data2
1 102
2 202
3 302
4 402

Commentary

print

Writes its arguments to standard output, followed by a newline.

Variables $1, $2, ...

Hold the fields of the current line, in order. $0 holds the whole line.

sample.txt


No data1 data2 data3 <----- $1 = "No", $2 = "data1", $3 = "data2", $4 = "data3" 
1  101   102   103
2  201   202   203
3  301   302   303
4  401   402   403
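
For a script this short, the program can also be passed directly on the command line instead of through a file. The following is a minimal sketch, assuming a POSIX-compatible awk. The input is supplied with printf so that the example does not depend on sample.txt existing, and the variable name out is used only for illustration. Separating the arguments of print with a comma inserts the output field separator (a single space by default), which gives the same result as $1 " " $3.

```shell
# Feed two lines of the sample data to awk and keep the 1st and 3rd fields.
# "out" is a hypothetical variable name used only in this sketch.
out=$(printf 'No data1 data2 data3\n1  101   102   103\n' | awk '{ print $1, $3 }')
echo "$out"
```

With a real file, the same program would be run as awk '{ print $1, $3 }' sample.txt.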

Get the sum for each column

Suppose you have the following file:

sample.txt


No data1 data2 data3
1  101   102   103
2  201   202   203
3  301   302   303
4  401   402   403

Prepare an awk script as below

sample.awk


{
    if(NR > 1){
        sum1 += $2;
        sum2 += $3;
        sum3 += $4;
    }
}

END {
    print sum1 " " sum2 " " sum3;
}

From the terminal, pass the script with the -f option, followed by sample.txt.

$ awk -f sample.awk sample.txt

The output is as follows.

1004 1008 1012

Commentary

Variable NR

The number of the line currently being processed, counted from 1. In this example the first line is the header, so it is skipped with NR > 1.

END

A block that runs once after all input has been read. Here it prints the totals.
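
The NR > 1 test can also be written as a pattern in front of the action, which is a common awk idiom. Here is a sketch of that form, assuming a POSIX-compatible awk; the input is a shortened version of the sample data supplied inline, and the variable name sums is hypothetical.

```shell
# NR > 1 is a pattern: the action runs only for lines after the header.
# "sums" is a hypothetical variable name used only in this sketch.
sums=$(printf 'No data1 data2 data3\n1 101 102 103\n2 201 202 203\n' |
  awk 'NR > 1 { sum1 += $2; sum2 += $3; sum3 += $4 }
       END    { print sum1, sum2, sum3 }')
echo "$sums"
```

The pattern form avoids one level of nesting and keeps the per-line work and the END block visually separate.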

Get the total for each row

Suppose you have the following file:

sample.txt


No data1 data2 data3
1  101   102   103    <----- I want to find 101 + 102 + 103.
2  201   202   203
3  301   302   303
4  401   402   403

Prepare an awk script as below

sample.awk


{
    sum = 0;
    for(i=2; i<=NF; i++) {
        sum += $i;
    }
    print sum;
}

From the terminal, pass the script with the -f option, followed by sample.txt.

$ awk -f sample.awk sample.txt

The output is as follows. The fields of the header line are not numbers, so awk treats them as 0 and the first total is 0.

0
306
606
906
1206

Commentary

Variable NF

The number of fields in the current line.
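
If the 0 produced by the header line is unwanted, the header can be skipped with NR > 1, and the row number in $1 can be printed next to each total. This is only a sketch, assuming a POSIX-compatible awk; the inline input is a shortened version of the sample data and the variable name totals is hypothetical.

```shell
# Skip the header, then add up fields 2..NF of every remaining line.
# "totals" is a hypothetical variable name used only in this sketch.
totals=$(printf 'No data1 data2 data3\n1 101 102 103\n2 201 202 203\n' |
  awk 'NR > 1 {
         sum = 0
         for (i = 2; i <= NF; i++) sum += $i
         print $1, sum
       }')
echo "$totals"
```

Because the loop runs up to NF, it keeps working if more data columns are added to the file.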

When fields are separated by commas instead of spaces

Suppose you have the following file:

sample.txt


No,data1,data2,data3
1,101,102,103
2,201,202,203
3,301,302,303
4,401,402,403

Prepare an awk script as below

sample.awk


BEGIN {
    FS = ",";
}

{
  print $1 " " $3
}

From the terminal, pass the script with the -f option, followed by sample.txt.

$ awk -f sample.awk sample.txt

The output is as follows.

No data2
1 102
2 202
3 302
4 402

Commentary

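Variable FS

The field separator. The default is a single space, which makes awk split each line on runs of spaces and tabs.

BEGIN

A block that runs once before any input is read. It is the usual place to set variables such as FS.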
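
For one-off commands, the separator can also be given with the -F option instead of assigning FS inside the script. A minimal sketch, assuming a POSIX-compatible awk; the input is supplied inline and the variable name csv is hypothetical.

```shell
# -F, sets the field separator to a comma, like FS = "," in a BEGIN block.
# "csv" is a hypothetical variable name used only in this sketch.
csv=$(printf 'No,data1,data2,data3\n1,101,102,103\n' | awk -F, '{ print $1, $3 }')
echo "$csv"
```

Setting FS in the script keeps the separator together with the code that depends on it, while -F is handy when typing a command directly.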

When fields are separated by commas and some fields themselves contain a comma

Suppose you have the following file:

sample.txt


No,data1,data2,data3
1,101,"102,101",103
2,201,202,203
3,301,"302,101",303
4,401,402,403

Prepare an awk script as below

sample.awk


BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
}

{
    print $1 " " $3
}

From the terminal, pass the script with the -f option, followed by sample.txt.

$ awk -f sample.awk sample.txt

The output is as follows.

No data2
1 "102,101"
2 202
3 "302,101"
4 402

Commentary

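Variable FPAT

A regular expression that describes what a field looks like, rather than what separates the fields. Here a field is either a run of non-comma characters or a double-quoted string. FPAT is a GNU awk (gawk) extension, so it may not work with other awk implementations.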
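
The quoted fields come out with their double quotes still attached. One way to drop them is to apply gsub to the field before printing. The following is a sketch only: it assumes gawk is installed under the name gawk and falls back to a message otherwise, and the variable name field is hypothetical.

```shell
# FPAT needs GNU awk, so fall back to a message when gawk is missing.
# "field" is a hypothetical variable name used only in this sketch.
if command -v gawk >/dev/null 2>&1; then
  field=$(printf '1,101,"102,101",103\n' |
    gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }
          { gsub(/"/, "", $3); print $3 }')   # strip the quotes from field 3
else
  field="gawk not installed"
fi
echo "$field"
```

This pattern does not handle empty fields or quotes escaped inside a quoted field, so a dedicated CSV parser is a safer choice for files that contain them.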
