[PYTHON] Put stdout into numpy.loadtxt

Summary

How to feed the standard output of a command directly into numpy.loadtxt, for when you want to analyze data preprocessed by awk with NumPy in Python.

What you want to do

For example, suppose you have name, height, and weight data.

input.dat


Yamada 160 50
Tanaka 170 60
Sakana 180 70

Suppose you want to extract only the numeric columns and compute the correlation between height and weight. You could do all of this in Python, but that gets awkward once the file is large, so extract the columns with awk and read the numeric data with numpy.loadtxt. In other words, the workflow looks like this.

$ cat input.dat | awk '{print $2, $3}' > tmp.dat
$ python analysis.py tmp.dat

analysis.py


import sys
import numpy as np

data = np.loadtxt(sys.argv[1])

# After this, analyze the data however you like
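
The analysis step itself is left open above. As a minimal sketch of the height/weight correlation mentioned earlier, the following uses an in-memory io.StringIO as a stand-in for tmp.dat (an assumption made only to keep the example self-contained) and np.corrcoef for the Pearson correlation coefficient.

```python
import io
import numpy as np

# Stand-in for tmp.dat: the height and weight columns extracted by awk.
sample = io.StringIO("160 50\n170 60\n180 70\n")

data = np.loadtxt(sample)                 # 2-D array, one row per person
height, weight = data[:, 0], data[:, 1]   # split into the two columns

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the correlation between height and weight.
r = np.corrcoef(height, weight)[0, 1]
print(r)
```

In analysis.py, the same two lines after np.loadtxt would apply unchanged to the data read from sys.argv[1].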

However, creating an intermediate file is a nuisance. I want to be able to run it like this instead.

$ python analysis.py input.dat

What to do

First, use subprocess to run the shell commands from within Python. Set the stdout of the last process to subprocess.PIPE and pass that pipe to numpy.loadtxt.

analysis.py


import sys
import subprocess
import numpy as np

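# Equivalent of: cat input.dat | awk '{print $2, $3}'
p1 = subprocess.Popen(["cat", sys.argv[1]], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["awk", "{print $2, $3}"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let cat receive SIGPIPE if awk exits early

# np.loadtxt reads directly from the pipe connected to awk's stdout
data = np.loadtxt(p2.stdout)

# After this, analyze the data however you like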
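
As a variation on the same idea, here is a hedged sketch that wraps the pattern in a small helper. The name loadtxt_from_command is hypothetical, not part of NumPy or subprocess. It uses subprocess.run (Python 3.7 or later for capture_output) to collect the command's whole output as text and hands it to np.loadtxt through io.StringIO. To keep the demo runnable on machines without awk, the child process here is the Python interpreter itself printing two rows; that substitution is an assumption for illustration only.

```python
import io
import subprocess
import sys
import numpy as np

def loadtxt_from_command(argv):
    """Run argv (a list, no shell involved) and parse its stdout with np.loadtxt.

    Hypothetical helper for illustration; check=True raises
    CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return np.loadtxt(io.StringIO(result.stdout))

# Demo: a child Python process stands in for awk and prints two numeric rows.
demo = loadtxt_from_command(
    [sys.executable, "-c", "print('160 50'); print('170 60')"]
)
print(demo)
```

With awk installed, the call would look like loadtxt_from_command(["awk", "{print $2, $3}", sys.argv[1]]); awk can read the file itself, so the cat stage is not needed. Note that this version holds the entire output in memory before parsing, whereas passing p2.stdout lets np.loadtxt read from the pipe as data arrives.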

Other approaches

In the script above, the shell commands are written inside Python, which is inconvenient because the content of the awk script is different every time.

$ cat input.dat | awk '{print $2, $3}' | python analysis.py

If you would rather connect the commands with a pipe like this, use fileinput.

analysis.py


import numpy as np
import fileinput

# With no file arguments, fileinput.input() reads lines from standard input
data = np.loadtxt(fileinput.input())

# After this, analyze the data however you like
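
If the script only ever reads from a pipe, passing sys.stdin to np.loadtxt is a possible alternative to fileinput. The sketch below assumes a hypothetical helper named load_columns and uses io.StringIO in place of real piped input so that it runs on its own. The ndmin=2 argument is added because np.loadtxt otherwise returns a 1-D array when the input has only a single row, which would break code that indexes columns with data[:, 0].

```python
import io
import sys
import numpy as np

def load_columns(stream):
    """Parse whitespace-separated numeric rows from any text stream.

    Hypothetical helper for illustration. ndmin=2 keeps the result
    two-dimensional even when the stream contains a single row.
    """
    return np.loadtxt(stream, ndmin=2)

# In analysis.py this would be: data = load_columns(sys.stdin)
# Here a StringIO stands in for the output piped from awk.
demo = load_columns(io.StringIO("160 50\n170 60\n180 70\n"))
print(demo.shape)
```

The fileinput version has the advantage that the same script also accepts file names as arguments, so the choice depends on how the script will be invoked.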
