[LINUX] Convert only date serial numbers in CSV files with awk

Introduction

When I called a certain Web API, the date column in the CSV data it returned was a serial number (why?). Judging from the values, it is a Unix timestamp in milliseconds. So I wrote a script that converts one specific column of a CSV file with awk. The execution environment was Kali on WSL on Windows 10 Pro.

Shell script I made

test.sh


#~ Omitted

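# Convert column 2 to YYYY-MM-DD and leave the header row (NR==1) untouched.
# FS/OFS are set in BEGIN so they apply from the first record; substr starts at 1 because awk strings are 1-indexed.
awk 'BEGIN{FS=OFS=","} NR>1{ $2=strftime("%Y-%m-%d", substr($2, 1, 10)) } 1' "${output_dir}/tmp.csv" > "${output_dir}/result.csv"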

#Omitted ~

Result

It seems to have worked, at least for now.

tmp.csv


No,Date,description,
1,1273645330000,TEST,
~

result.csv


No,Date,description,
1,2010-05-12,TEST,
~

awk notes

WSL


$ echo 1 2 | awk '{print $1}'
1
$ echo 1 2 | awk '{print $1} 1'
1
1 2
$ echo 1 2 | awk '{print $1} 0'
1
$ echo 1 2 | awk '{print $0}'
1 2
$ echo 1 2 | awk '{print}'
1 2
$ echo 1 2 | awk '1'
1 2
$ echo 1 2 | awk '{print $1} true'
1
$ echo 1 2 | awk '{print $1} false'
1
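
The outputs above are consistent with awk's "pattern { action }" model: a pattern with no action prints the record when the pattern is true, so a trailing 1 prints every line and a trailing 0 prints nothing. The words true and false are not keywords in awk. They are read as ordinary variable names, and an unset variable is empty, which counts as false. The following is a minimal sketch of that behavior; the shell variable names are only for illustration.

```shell
# A bare word in pattern position is a variable name, not a boolean literal.
unset_result=$(echo 1 2 | awk 'true')                # variable "true" is unset, so the pattern is false
literal_result=$(echo 1 2 | awk '1')                 # the number 1 is true, so the record is printed
assigned_result=$(echo 1 2 | awk -v true=1 'true')   # once "true" holds 1, the pattern is true

printf 'unset true:  [%s]\n' "$unset_result"
printf 'literal 1:   [%s]\n' "$literal_result"
printf '-v true=1:   [%s]\n' "$assigned_result"
```

Only the first line prints empty brackets; the other two print "1 2".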

awk built-in function strftime

A function that converts a timestamp into a human-readable string and returns it. It appears to accept the same conversion specifications as strftime in the ANSI C standard library.

strftime function


strftime([format [, timestamp]])

Conversion specification   Meaning
%Y                         Four-digit year, including the century.
%m                         Month as a decimal number (01 to 12).
%d                         Day of the month as a decimal number (01 to 31).
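
One point worth checking is that strftime formats the timestamp in the local time zone, so the same serial number can print differently on machines with different settings. The sketch below assumes an awk that provides strftime (it is an extension found in gawk, for example, and is not part of POSIX awk). It uses the sample value from tmp.csv with the milliseconds dropped, and TZ=JST-9 is a POSIX-style TZ string standing for UTC+9.

```shell
ts=1273645330   # sample value from tmp.csv, in seconds

# The TZ environment variable decides which local time strftime reports.
utc=$(TZ=UTC awk -v ts="$ts" 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S", ts) }')
jst=$(TZ=JST-9 awk -v ts="$ts" 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S", ts) }')

echo "UTC: $utc"   # 2010-05-12 06:22:10
echo "JST: $jst"   # 2010-05-12 15:22:10
```

The date part happens to be the same here, but a timestamp close to midnight would land on different days depending on TZ.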

awk built-in function substr

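A function that returns the substring of string that begins at character position start and is length characters long. Positions are counted from 1. If length is omitted, the rest of the string is returned.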

substr(string, start [, length])
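
As a small check of the 1-based indexing, the sketch below takes the first 10 characters of the sample value from tmp.csv, which turns milliseconds into seconds as long as the value has 13 digits. Dividing by 1000 is shown as an alternative that does not depend on the number of digits; it is my own variation, not what the script in this article does.

```shell
ms=1273645330000   # sample value from tmp.csv

# Start at position 1 (awk strings are 1-indexed) and keep 10 characters.
by_substr=$(echo "$ms" | awk '{ print substr($1, 1, 10) }')

# Alternative: treat the field as a number and drop the milliseconds arithmetically.
by_division=$(echo "$ms" | awk '{ print int($1 / 1000) }')

echo "substr:   $by_substr"     # 1273645330
echo "division: $by_division"   # 1273645330
```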

Other

In this script, I redirected the converted data to a new CSV file. Writing the result back over the original CSV file, as in the two attempts below, did not work. Is the original CSV file locked by awk, so that it cannot be overwritten?

test.sh


#1 Redirect straight back to the input file
awk 'BEGIN{FS=OFS=","} NR>1{ $2=strftime("%Y-%m-%d", substr($2, 1, 10)) } 1' ${output_dir}/tmp.csv > ${output_dir}/tmp.csv

#2 Wrap the awk call in $() and redirect
$(awk 'BEGIN{FS=OFS=","} NR>1{ $2=strftime("%Y-%m-%d", substr($2, 1, 10)) } 1' ${output_dir}/tmp.csv) > ${output_dir}/tmp.csv

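$() is called "command substitution" (POSIX spec). It runs the enclosed command in a subshell and substitutes that command's output into the command line, so in #2 the shell tries to execute awk's output as a command. Enclosing the command in backticks (``) does the same thing, but $() is easier to nest.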
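
As far as I can tell, no lock is involved in attempt #1. The shell opens and truncates the redirect target before awk starts, so awk reads a file that is already empty. A common workaround is to write to a temporary file and rename it over the original only if awk succeeds. The sketch below assumes that approach; the directory created with mktemp -d and the convert.XXXXXX file name are placeholders for illustration, and TZ=UTC is set only to make the output reproducible.

```shell
output_dir=$(mktemp -d)    # hypothetical working directory for this sketch
printf 'No,Date,description,\n1,1273645330000,TEST,\n' > "${output_dir}/tmp.csv"

# Write the converted data to a temporary file in the same directory,
# then replace the original only when awk exits successfully.
tmp_out=$(mktemp "${output_dir}/convert.XXXXXX")
TZ=UTC awk 'BEGIN{FS=OFS=","} NR>1{ $2=strftime("%Y-%m-%d", substr($2, 1, 10)) } 1' \
    "${output_dir}/tmp.csv" > "$tmp_out" && mv "$tmp_out" "${output_dir}/tmp.csv"

cat "${output_dir}/tmp.csv"
```

Keeping the temporary file in the same directory means mv is a rename on the same file system, so the original is never left half written.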

at the end

In the end, the questions below remain, but I achieved the goal of converting the serial numbers in the CSV. I got the processing I wanted, yet it feels like I won the match but lost the game lol. If you know the answers, I would appreciate it if you could tell me m(_ _)m

  1. Why can't the CSV file be overwritten by combining awk with a redirect?
  2. Why don't true / false work when placed between the final } and the closing single quote?
  3. Does the Web API return date information as a serial number to avoid bugs that depend on the client's environment?
