[PYTHON] One-liner processing of quagmire memory (CSV)

I had kept a record of what I did over the past month, and this is a memo from when I tabulated it. As shown below, the data is recorded in CSV format, and the goal is to total each column. (Even if the numbers give you a clue about what they are, please gracefully let it slide.)

4000,2000,5000,7000,20
4000,2000,5000,7000,20
4000,2000,5000,7000,20
4000,3500,6010,6000,20
4000,3500,6010,6000,20
4000,3500,6010,6000,20
4000,3500,6010,6000,20
4000,3500,6010,6000,20
4000,3500,6010,6000,20
4000,3500,6010,6000,20
4000,3500,6010,6000,20
4000,2000,5210,5200,20
4000,2000,5210,5200,20
4000,2000,5210,5200,20
4000,2000,5210,5200,20
4000,2000,5210,5200,20
6000,5000,7000,2000,20
4000,6000,6000,2000,20
1500,1500,2000,1000,20
1500,1500,2000,1000,1
1500,1500,2000,1000,1
1500,1500,2000,1000,1
1500,1500,2000,1000,1
1500,1500,2000,1000,1
1500,1500,2000,1000,1
1500,1500,2000,1000,1

The one-liner that does the processing is as follows. It reads the data above from a file named big.txt.

One-liner


python -c "import csv;print([sum([int(m) for m in materials]) for materials in zip(*csv.reader(open('big.txt')))])"
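For reference, here is the same computation spelled out as a short script. This is an equivalent rewrite for readability, not the original one-liner: the function name column_totals is hypothetical, the file is opened in a with block so it gets closed, and sum receives a generator expression instead of an intermediate list.

```python
import csv

def column_totals(path):
    """Sum every column of a CSV file of integers (same logic as the one-liner)."""
    with open(path, newline='') as f:   # the with block closes the file afterwards
        rows = csv.reader(f)            # yields each row as a list of strings
        columns = zip(*rows)            # regroups the values column by column
        return [sum(int(v) for v in column) for column in columns]
```

Calling print(column_totals('big.txt')) should print the same list as the one-liner.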

Explanation (hopefully easy to follow)

First, read the CSV. Pass the opened file object to csv.reader. The reader object it returns is iterable, so you can loop over it with for; each iteration yields one row as a list of strings.

>>> import csv
>>> for r in csv.reader(open('big.txt')):
...     print(r)
...
['4000', '2000', '5000', '7000', '20']
['4000', '2000', '5000', '7000', '20']
['4000', '2000', '5000', '7000', '20']
['4000', '3500', '6010', '6000', '20']
…
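To see what csv.reader produces without touching a file, it can be fed an io.StringIO (an in-memory text stream). The sample text below is simply the first two lines of the data; the point to notice is that every field comes back as a str, not an int.

```python
import csv
import io

# In-memory stand-in for big.txt: the first two rows of the data.
sample = io.StringIO("4000,2000,5000,7000,20\n4000,2000,5000,7000,20\n")

rows = list(csv.reader(sample))
print(rows)              # [['4000', '2000', '5000', '7000', '20'], ['4000', '2000', '5000', '7000', '20']]
print(type(rows[0][0]))  # <class 'str'>
```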

Next, unpack the rows with * and pass them to zip to group the values by column. The zip object that zip returns is also iterable, so you can loop over it with for in the same way. As the output shows, each tuple holds all the values of one column.

>>> z = zip(*csv.reader(open('big.txt')))
>>> for entry in z:
...     print(entry)
...
('4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '4000', '6000', '4000', '1500', '1500', '1500', '1500', '1500', '1500', '1500', '1500')
('2000', '2000', '2000', '3500', '3500', '3500', '3500', '3500', '3500', '3500', '3500', '2000', '2000', '2000', '2000', '2000', '5000', '6000', '1500', '1500', '1500', '1500', '1500', '1500', '1500', '1500')
('5000', '5000', '5000', '6010', '6010', '6010', '6010', '6010', '6010', '6010', '6010', '5210', '5210', '5210', '5210', '5210', '7000', '6000', '2000', '2000', '2000', '2000', '2000', '2000', '2000', '2000')
('7000', '7000', '7000', '6000', '6000', '6000', '6000', '6000', '6000', '6000', '6000', '5200', '5200', '5200', '5200', '5200', '2000', '2000', '1000', '1000', '1000', '1000', '1000', '1000', '1000', '1000')
('20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '20', '1', '1', '1', '1', '1', '1', '1')
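zip(*rows) is the usual Python idiom for transposing a list of rows. Below is a minimal illustration on a small hand-written table, plus one caveat worth knowing: zip stops at its shortest input, and csv.reader turns a blank line into an empty row, so a stray blank line in the file would make the whole transpose (and the one-liner's result) empty.

```python
import csv
import io

rows = [['1', '2'], ['3', '4'], ['5', '6']]

# zip(*rows) is zip(['1', '2'], ['3', '4'], ['5', '6']): the i-th tuple collects
# the i-th element of every row, i.e. the i-th column.
print(list(zip(*rows)))  # [('1', '3', '5'), ('2', '4', '6')]

# Caveat: a blank line is read as an empty row [], and zip stops at the
# shortest input, so the transpose comes out empty.
with_blank = list(csv.reader(io.StringIO("1,2\n\n3,4\n")))
print(with_blank)              # [['1', '2'], [], ['3', '4']]
print(list(zip(*with_blank)))  # []

# Skipping empty rows before transposing avoids that.
print(list(zip(*(r for r in with_blank if r))))  # [('1', '3'), ('2', '4')]
```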

Finally, pass each tuple to sum to total its values. Because csv.reader returns every field as a string, each value must first be converted with int. Note that z is created again here: a zip object is an iterator, and the loop above has already consumed it.

>>> z = zip(*csv.reader(open('big.txt')))
>>> for entry in z:
...     print(sum([int(e) for e in entry]))
...
86000
67000
118130
107000
387
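As a cross-check of these totals without needing big.txt, the sketch below rebuilds the data from its runs of identical lines into a string (DATA is a hypothetical name) and runs the same zip-and-sum over it. It swaps the list comprehension inside sum for map(int, column), which converts each string lazily so no intermediate list is built.

```python
import csv
import io

# The data listed at the top, rebuilt from its runs of identical lines.
DATA = (
    "4000,2000,5000,7000,20\n" * 3
    + "4000,3500,6010,6000,20\n" * 8
    + "4000,2000,5210,5200,20\n" * 5
    + "6000,5000,7000,2000,20\n"
    + "4000,6000,6000,2000,20\n"
    + "1500,1500,2000,1000,20\n"
    + "1500,1500,2000,1000,1\n" * 7
)

z = zip(*csv.reader(io.StringIO(DATA)))

# map(int, column) yields ints one at a time; sum consumes them directly.
totals = [sum(map(int, column)) for column in z]
print(totals)  # [86000, 67000, 118130, 107000, 387]
```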
