[LINUX] Get the sum of each of multiple columns with awk

1. Check the field (item)

Check the field (item name) before awk. The field is usually on the first line of the (?) File, so use less or head to check it.

head -1 File name

I think that field names are often written with commas without line breaks or separated by pipes. Rather than counting visually to see "what number of fields the target column is" It is safe and efficient to change the delimiter to a newline using a Sakura editor. Copy the output field.

2. Align with Sakura Editor

In the Sakura editor, Ctrl + r (replace), check the regular expression in the dialog box, Replace the delimiter with "\ r \ n (line feed)". image.png image.png

You can tell which column is in the row number.

3. (Align in Excel)

For those who do not use Sakura Editor. Introducing how to replace delimiters in Excel. (* Regular expressions cannot be used in Windows Notepad, so they cannot be replaced with line breaks, etc.)

First, paste the character string into Excel. image.png Select Ctrl + h or Replace from Home tab> Edit group to display the dialog. image.png "Search string": (Enter the delimiter) "Character string after replacement": Press ** Ctrl + j ** (a symbol like a dot is displayed, but this means a line break) After the above, click Replace All image.png Since it is output in one cell, select all copy after entering edit mode by F2 or double click. Paste after exiting edit mode image.png In this state, go to Home tab> Edit group> Search and select> Select conditions and jump From the selection options dialog, select a blank cell and OK image.png Right click on the selected blank cell> Delete> Select entire row and OK image.png The data is arranged vertically.

4. awk command

** If you want to get the sum of \ $ 40 and \ $ 50 respectively **

Pipe the cated value to awk. ▼

cat [file name] | awk -F "," '{ print a += $40; b += $50 } END { print a, b }' |tail -1

[Explanation]

  1. Get all the data in the file with cat (don't forget to use zcat for gz files)
  2. Pipe it to the awk command and use the -F option to tell the command what is used as the file delimiter.
  3. The addition operator "+ =" adds the left and right operands and assigns them to the left side, so "a + = \ $ 40" means "a = a + \ $ 40". That is, assign "a + \ $ 40" to a, and repeat until the end of the column. Complete the process once with a semicolon so that b can do the same process. End the processing in {} with END.
  4. Print a and b after END to print the total value.
  5. If this is left as it is, all unnecessary calculations will be output, so pass it to tail -1 and output only the last line so that only the last line is output.

Recommended Posts

Get the sum of each of multiple columns with awk
Get the number of visits to each page with ReportingAPI + Cloud Functions
Compare the sum of each element in two lists with the specified value in Python
Type conversion of multiple columns of pandas DataFrame with astype at the same time
The story of sharing the pyenv environment with multiple users
Try to get the contents of Word with Golang
Get the operation status of JR West with Python
Adjust the ratio of multiple figures with the matplotlib gridspec
Sum of multiple numpy arrays (sum)
Get the number of digits
Get UNIXTIME at the beginning of today with a command
Get the number of occurrences for each element in the list
Get the index of each element of the confusion matrix in Python
Get the host name of the host PC with Docker on Linux
Get the source of the page to load infinitely with python.
Get the number of PVs of Qiita articles you posted with API
Get the number of views of Qiita
How to get the ID of Type2Tag NXP NTAG213 with nfcpy
Get the weather with Python requests
Get the weather with Python requests 2
Get the attributes of an object
Get the first element of queryset
Get the URL of a JIRA ticket created with the jira-python library
Extract specific multiple columns with pandas
Get the number of Youtube subscribers
PhytoMine-I tried to get the genetic information of plants with Python
Get the width of the div on the server side with Selenium + PhantomJS + Python
Get the number of searches with a regular expression. SeleniumBasic VBA Python
I tried to get the authentication code of Qiita API with Python.
Read the graph image with OpenCV and get the coordinates of the final point of the graph
Get the number of articles accessed and likes with Qiita API + Python
I tried to get the movie information of TMDb API with Python
Get and estimate the shape of the head using Dlib and OpenCV with python
Get the average salary of a job with specified conditions from indeed.com
Get the return value of an external shell script (ls) with python3
Align the size of the colorbar with matplotlib
Get the column list & data list of CASTable
Check the existence of the file with python
[Python] Get the variable name with str
Get the minutes of the Diet via API
Get started with the documentation tool Sphinx
The third night of the loop with for
Get out of multiple loops at once
Get the value of the middle layer of NN
Get holidays with the Google Calendar API
The second night of the loop with for
Get the last day of the specified month
[Python] Get the character code of the file
Get the filename of a directory (glob)
[PowerShell] Get the reading of the character string
Get lots of your tweets with Tweepy
Count the number of characters with echo
I wanted to know the number of lines in multiple files, so I tried to get it with a command
Get only the source code of the PyPI package with pip from the command line
Note: How to get the last day of the month with python (added the first day of the month)
Get the stock price of a Japanese company with Python and make a graph
How to get a list of files in the same directory with python
[Introduction to Python] How to get the index of data with a for statement
The story of doing deep learning with TPU
[Python] Get the files in a folder with Python
Note: Prepare the environment of CmdStanPy with docker