[Python] A deeper look at subprocess (Python 3, updated edition)

Introduction

The article I wrote in 2017 was based on Python 2, so I started editing it to update its sloppy information. Because that turned out to require large additions such as subprocess.run(), I created this article as a new one instead.

The goal is to explain the older subprocess functions (which are still supported) alongside their equivalents written with subprocess.run() and subprocess.Popen(). Beyond that, the article covers a wider range of usages built on these two.

If you are in a hurry, it is enough to read the common rules for cmd description and the sections from run() onward.

For what the module does in the first place, please refer to the official documentation or [this article](https://qiita.com/caprest/items/0245a16825789b0263ad).

Conclusion: **Leave everything to subprocess.run() as far as possible. If more complex processing is needed, use subprocess.Popen(), which also underlies subprocess.run().** As far as I can tell, none of the other functions introduced in this article are necessary (all of them can be replaced).

Modules used


import io
import os
import time
import subprocess

Test file

hello.py


print('Hello world!')

Old ways of executing shell commands

I no longer have occasion to use these, but I summarize them because they help when digging through old code.

os.system()

`os.system(command)` (Official) Execute the command (a string) in a subshell. This is implemented by calling the Standard C function system(), and has the same limitations. Changes to sys.stdin, etc. are not reflected in the environment of the executed command. (Omitted) The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

os.system() example


# The result is printed on the terminal
print(os.system('ls')) # 0

# On the terminal → sh: 1: tekito: not found
print(os.system('tekito')) # 32512 (on Unix this is the encoded wait status, exit code 127 << 8; any nonzero value means failure)
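On Unix, the value os.system() returns is the raw wait status rather than the exit code, which is why the failing command above shows 32512. A minimal sketch of recovering the exit code, assuming Python 3.9+ for os.waitstatus_to_exitcode(); the helper name system_exit_code is hypothetical:

```python
import os

def system_exit_code(command):
    """Hypothetical helper: run command with os.system() and return its exit code."""
    status = os.system(command)
    if os.name == 'nt':
        # On Windows, os.system() already returns the exit code itself
        return status
    # On Unix, os.system() returns a wait status; decode it (Python 3.9+)
    return os.waitstatus_to_exitcode(status)

print(system_exit_code('exit 3'))  # 3
```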

The `os.spawn*` functions (`os.spawnl(mode, path, ...)`, `os.spawnle(mode, path, ..., env)`, `os.spawnlp(mode, file, ...)`, `os.spawnlpe(mode, file, ..., env)`, `os.spawnv(mode, path, args)`, `os.spawnve(mode, path, args, env)`, `os.spawnvp(mode, file, args)`, `os.spawnvpe(mode, file, args, env)`) ~~are beyond me~~ seem to carry the nuance of "spawning" (generating) a process.
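For the spawn family, a minimal sketch assuming a POSIX system: with mode os.P_WAIT, os.spawnv() waits for the child and returns its exit code (os.P_NOWAIT returns immediately instead). The child is the current interpreter (sys.executable), chosen only so the example does not depend on other programs:

```python
import os
import sys

# The 'v' variants take the arguments as a list; by convention args[0] is the program name
args = [sys.executable, '-c', 'raise SystemExit(3)']

# P_WAIT: block until the child exits and return its exit code
code = os.spawnv(os.P_WAIT, sys.executable, args)
print(code)  # 3
```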

os.popen()

`os.popen(cmd, mode='r', buffering=-1)` (Official) Open a pipe to or from command cmd. The return value is an open file object connected to the pipe, which can be read or written depending on whether mode is 'r' (default) or 'w'. (Omitted) This is implemented using subprocess.Popen; see that class's documentation for more powerful ways to manage and communicate with subprocesses.

os.popen() example


#Read the output result
print(os.popen('ls h*.py','r').readlines())
'''
['hello.py\n', 'hello2.py\n']
'''

Common rules for cmd description in subprocess

- Given as a string → shell=True
  - Risk of malfunction when quotes get mixed up
- Given as a list of strings → shell=False (default)
  - Less flexible, because wildcards and similar shell features cannot be used

(Official) If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user's home directory. (Official) When shell=True is used explicitly, it is the application's responsibility to ensure that all whitespace and metacharacters are quoted appropriately to avoid shell injection vulnerabilities.
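When a command has to go through the shell, the standard library's shlex module helps with the quoting responsibility described above. A minimal sketch, assuming a POSIX shell (shlex.quote() produces POSIX quoting, not cmd.exe quoting); the file name is a made-up value:

```python
import shlex
import subprocess

# A file name containing a space and a shell metacharacter (hypothetical value)
filename = 'my file; echo INJECTED.txt'

# shlex.quote() wraps the value so the shell treats it as one literal word
cmd = 'echo ' + shlex.quote(filename)
print(cmd)  # echo 'my file; echo INJECTED.txt'

# The shell prints the name instead of running a second echo
out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
print(out.strip())  # my file; echo INJECTED.txt

# shlex.split() turns a command string into a list for shell=False
print(shlex.split('ls -l "my file.txt"'))  # ['ls', '-l', 'my file.txt']
```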

Commands that only work with shell=True


# Giving the command as a list of strings (the following five cannot be written this way)
print(subprocess.call(['ls','-l'], shell=False)) # 0

#Shell pipeline
print(subprocess.call('echo -e "a\nb\nc" | wc -l', shell=True)) # 0
#semicolon
print(subprocess.call('echo Failed.; exit 1', shell=True)) # 1
#Wildcard
print(subprocess.call('ls -l *.py', shell=True)) # 0
#Environment variable
print(subprocess.call('echo $HOME', shell=True)) # 0
#Tilde symbol for HOME
print(subprocess.call('ls -l ~', shell=True)) # 0
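Several of these shell features can also be reproduced with shell=False by doing the expansion in Python (the pipeline case is handled with Popen() later in this article). A minimal sketch, assuming the current interpreter as a stand-in for echo; echo_args is a hypothetical helper:

```python
import glob
import os
import subprocess
import sys

def echo_args(args):
    """Hypothetical helper: run a tiny Python child that prints each argument on its own line."""
    child = [sys.executable, '-c', 'import sys; print("\\n".join(sys.argv[1:]))']
    result = subprocess.run(child + args, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

# Wildcard: glob.glob() expands *.py the way the shell would
py_files = sorted(glob.glob('*.py'))
# Tilde: os.path.expanduser() replaces a leading ~ with the home directory
home = os.path.expanduser('~')
# Environment variable: read it from os.environ instead of writing $HOME
home_env = os.environ.get('HOME', home)

print(echo_args([home_env]))
print(echo_args([home] + py_files))
```

Without a shell, an argument such as '*.py' reaches the child literally, which is exactly why the expansion has to happen on the Python side.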

Functions that predate the unification under subprocess.run()

There are the three functions shown below. They differ mainly in the three aspects listed in the table. Each aspect can be switched on or off in subprocess.run(), so these functions are no longer needed.

| Function name | Exit status | Output | CalledProcessError |
|---|---|---|---|
| (Corresponding run() argument / attribute) | .returncode | .stdout | check |
| subprocess.call() | yes | | |
| subprocess.check_call() | yes | | yes |
| subprocess.check_output() | | yes | yes |

subprocess.call()

subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None) Runs the command. The return value is the exit status (0 means success).

Main options (1) (standard I/O, timeout)

**(Common to the other subprocess functions)**

When specifying a file for standard input/output, pass a file object obtained with `open()`. This is close to a digression, but in Python 3 you will rarely go wrong choosing bytes (mode='rb'/'wb') over str (mode='r'/'w'); if you look it up, in many cases either both are supported or only bytes is.

Usage examples of the main options


print(subprocess.call(['cat', 'hello.py'])) # 0
# Specify standard input / standard output (the second call creates hello2.py)
print(subprocess.call(['cat'], stdin=open('hello.py','rb'))) # 0
print(subprocess.call(['cat', 'hello.py'], stdout=open('hello2.py','wb'))) # 0

#Specify timeout
print(subprocess.call(['sleep', '3'], timeout=1)) # raises TimeoutExpired

# (Supplement) The cwd option does not expand ~ either
print(subprocess.call(['ls','-l'], shell=False, cwd = "~")) # FileNotFoundError
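The examples above leave the files passed to stdin/stdout open, and cwd does not understand ~. A minimal sketch of both fixes, assuming a Python child (sys.executable) as a stand-in for cat; CAT and copy_via_child are hypothetical names:

```python
import os
import subprocess
import sys

# Stand-in for `cat`: a Python child that copies stdin to stdout byte for byte
CAT = [sys.executable, '-c',
       'import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)']

def copy_via_child(src, dst):
    """Feed src to the child's stdin and write its stdout to dst; return the exit status."""
    # The with block closes both files even if call() raises (e.g. TimeoutExpired)
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        return subprocess.call(CAT, stdin=fin, stdout=fout, timeout=10)

# cwd does not expand ~, so expand it in Python first
home = os.path.expanduser('~')
print(subprocess.call([sys.executable, '-c', 'pass'], cwd=home))  # 0
```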

subprocess.check_call()

subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None) Runs the command. The return value is the exit status (0 means success). On a nonzero exit status, it raises CalledProcessError.

check_call() usage example


#Successful completion
print(subprocess.call(['cat', 'hello.py'])) # 0
print(subprocess.check_call(['cat', 'hello.py'])) # 0

# Abnormal termination: raises an exception instead of returning the error status
print(subprocess.call(['cat', 'undefined.py'])) # 1
print(subprocess.check_call(['cat', 'undefined.py'])) # raises CalledProcessError
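The raised CalledProcessError carries the details of the failure. A minimal sketch, using a Python child that exits with status 3 as a stand-in for a failing command:

```python
import subprocess
import sys

status = None
try:
    # A Python child that exits with status 3 stands in for a failing command
    subprocess.check_call([sys.executable, '-c', 'raise SystemExit(3)'])
except subprocess.CalledProcessError as e:
    # returncode is the child's exit status; cmd is the argument list that was run
    status = e.returncode
    print('exit status:', e.returncode)  # exit status: 3
    print('command:', e.cmd[1:])         # command: ['-c', 'raise SystemExit(3)']
```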

subprocess.check_output()

subprocess.check_output(args, *, stdin=None, stderr=None, shell=False, cwd=None, encoding=None, errors=None, universal_newlines=None, timeout=None, text=None) Runs the command and returns its standard output.

(Official) By default, this function returns the data as encoded bytes.

So if you want to handle the output as a string, decode it first.
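Alternatively, check_output() accepts text=True (Python 3.7+; universal_newlines=True in older versions), in which case it decodes the output for you and returns str. A minimal sketch, with the current interpreter as the child so it runs anywhere:

```python
import subprocess
import sys

# text=True decodes stdout with the locale encoding and returns str
out = subprocess.check_output(
    [sys.executable, '-c', 'print("hello.py"); print("hello2.py")'], text=True)
print(out.splitlines())  # ['hello.py', 'hello2.py']
```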

check_output() usage example


o = subprocess.check_output('ls h*.py', shell=True)
print(o) # b'hello.py\nhello2.py\n'
print(o.decode().strip().split('\n')) # ['hello.py', 'hello2.py']

Main options (2) (standard error output, special values)

**(Common to the other subprocess functions)**

You can change where the input/output goes by passing one of the following special values (to stderr and the like).

- subprocess.DEVNULL: use os.devnull (the bit bucket, a black hole) as the standard input/output destination
- subprocess.PIPE: connect a pipe to the standard input/output
- subprocess.STDOUT: send standard error to the same handle as standard output (the equivalent of 2>&1; valid only for stderr)

Standard error output related operations


#Discard all output
o = subprocess.call(['cat', 'undefined.py'], stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)

#Take out standard output
try:
    o = subprocess.check_output(['cat', 'hello.py'])
    print(o) # b"print('Hello world!')\n"
except subprocess.CalledProcessError as e:
    print('ERROR:', e.output)

#Extract standard error output
try:
    o = subprocess.check_output(['cat', 'undefined.py'], stderr=subprocess.PIPE)
    print(o)
except subprocess.CalledProcessError as e:
    print('ERROR:', e.stderr) # ERROR: b'cat: undefined.py: No such file or directory\n'

#Get standard errors integrated into standard output
try:
    o = subprocess.check_output(['cat', 'undefined.py'], stderr=subprocess.STDOUT)
    print(o)
except subprocess.CalledProcessError as e:
    # Note: read e.stdout, not e.stderr (e.output also works)
    print('ERROR:', e.stdout) # ERROR: b'cat: undefined.py: No such file or directory\n'

Replacing with subprocess.run()

(Official) The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

So let's rewrite the code above using run(). See the table above for the corresponding options.

subprocess.run()

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, capture_output=False, shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None, text=None, env=None, universal_newlines=None)
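run() returns a CompletedProcess object whose args, returncode, stdout, and stderr attributes take over the roles of the older functions' return values. A minimal sketch, using the current interpreter as the child:

```python
import subprocess
import sys

# run() waits for the child and returns a CompletedProcess
cp = subprocess.run(
    [sys.executable, '-c', 'import sys; print("out"); print("err", file=sys.stderr)'],
    capture_output=True, text=True)

# returncode replaces call(), stdout replaces check_output()
print(cp.returncode, repr(cp.stdout), repr(cp.stderr))  # 0 'out\n' 'err\n'
```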

subprocess.call() replacement

Append `.returncode`. Besides the stdin used when introducing subprocess.call(), run() has an option called `input`. It feeds data to standard input through `subprocess.Popen.communicate()` (described later) and takes the data itself rather than a file: bytes by default, or str when text mode is enabled.

run() replacement: main option usage examples


print(subprocess.run('ls -l h*.py', shell=True).returncode) # 0
# Specify standard input / standard output (hello2.py is created)
print(subprocess.run(['cat'], stdin=open('hello.py','rb'), stdout=open('hello2.py','wb')).returncode) # 0
# (Using input instead; almost equivalent to the above)
print(subprocess.run(['cat'], input=open('hello.py','rb').read(), stdout=open('hello2.py','wb')).returncode) # 0

# Specify timeout
print(subprocess.run(['sleep', '3'], timeout=1).returncode) # raises TimeoutExpired
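A minimal sketch of input in text mode and of handling the timeout, assuming a Python child as a stand-in for cat and sleep. Per the documentation, run() kills the child before re-raising TimeoutExpired:

```python
import subprocess
import sys

# Stand-in for `cat`: echo stdin back to stdout
CAT = [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.read())']

# With text=True, input is a str and stdout comes back as a str
cp = subprocess.run(CAT, input='line 1\nline 2\n', capture_output=True, text=True, timeout=10)
print(cp.stdout.splitlines())  # ['line 1', 'line 2']

timed_out = False
try:
    # run() kills the child and raises TimeoutExpired once the timeout elapses
    subprocess.run([sys.executable, '-c', 'import time; time.sleep(3)'], timeout=1)
except subprocess.TimeoutExpired as e:
    timed_out = True
    print('timed out after', e.timeout, 'seconds')
```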

subprocess.check_call() replacement

Add check=True. Alternatively, you can check afterwards with the .check_returncode() method.

run() replacement: check_call() usage example


#Success status
print(subprocess.run(['cat', 'hello.py']).returncode) # 0
print(subprocess.run(['cat', 'hello.py'], check=True).returncode) # 0

# Raises an exception instead of returning the error status
print(subprocess.run(['cat', 'undefined.py']).returncode) # 1
print(subprocess.run(['cat', 'undefined.py'], check=True).returncode) # raises CalledProcessError

# (Supplement) Raising the error afterwards with check_returncode()
p = subprocess.run(['cat', 'undefined.py'], check=False)
print(p.returncode) # 1
print(p.check_returncode()) # raises CalledProcessError

subprocess.check_output() replacement

Use the stdout option together with the .stdout attribute.

run() replacement: check_output() usage example


o = subprocess.run('ls h*.py', shell=True, stdout=subprocess.PIPE, check=True).stdout
print(o) # b'hello.py\nhello2.py\n'
print(o.decode().strip().split('\n')) # ['hello.py', 'hello2.py']

Thanks to its newer options, run() offers more freedom than check_output() when handling standard error output.

Standard error output related operations


# With check=False (the default), you can inspect stderr without the call raising an error
o = subprocess.run(['cat', 'undefined.py'], check=False, capture_output=True)
print((o.stdout, o.stderr)) # (b'', b'cat: undefined.py: No such file or directory\n')

#Integrate standard error output into standard output.
o = subprocess.run(['cat', 'undefined.py'], check=False, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
print(o.stdout) # b'cat: undefined.py: No such file or directory\n'

# capture_output=True sets both stdout and stderr to PIPE
try:
    o = subprocess.run(['cat', 'undefined.py'], check=True, capture_output=True)
    print(o.stdout)
except subprocess.CalledProcessError as e:
    print('ERROR:',e.stderr) # ERROR: b'cat: undefined.py: No such file or directory\n'

Highly flexible processing by subprocess.Popen ()

(Official) The underlying process creation and management in this module is handled by the Popen class. It offers a lot of flexibility so that developers are able to handle the less common cases not covered by the convenience functions.

So here is an introduction to that flexibility, focusing on things that run() cannot reproduce. The key feature is that subprocess.Popen() only creates the process **and does not wait for it to finish**. Use .poll() to check whether it has finished, and call .wait() when you want to proceed only after it has ended.
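A minimal sketch of that behavior, using a short Python sleep as the child. Popen is used as a context manager here, which waits for the child and closes its pipes on exit; calling p.wait() directly would block instead of polling:

```python
import subprocess
import sys
import time

# Popen() returns as soon as the child has started; it does not wait for it
with subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(0.5)']) as p:
    # poll() returns None while the child is still running
    while p.poll() is None:
        print('still running...')
        time.sleep(0.1)
    # After the child ends, poll() returns the exit status, also kept in returncode
    print('finished with', p.returncode)  # finished with 0
```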

Replacement example (call())

**Many arguments and attributes of Popen() are shared with run().** To reproduce subprocess.call(), use .returncode as with subprocess.run(), after waiting for the process to finish. Other code can be reproduced the same way as with run().

Popen() replacement: call() usage example


# Run a shell command and wait for it to finish
p = subprocess.Popen('ls -l h*.py', shell=True)
p.wait()
print(p.returncode) # 0

# Specify standard input / standard output (hello2.py is created)
p = subprocess.Popen(['cat'], stdin=open('hello.py','rb'), stdout=open('hello2.py','wb'))
p.wait()
print(p.returncode) # 0

#Specify timeout
p = subprocess.Popen(['sleep', '3'])
p.wait(timeout=1) # raises TimeoutExpired (unlike run(), the process is not killed)

Deadlocks and Popen.communicate()

Popen.communicate() is a method that (after optionally sending data to standard input) returns a tuple of the form (standard output, standard error output).

(Official) Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate. (Omitted) communicate() returns a tuple (stdout_data, stderr_data). (Official) Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

[This article](https://qiita.com/caprest/items/0245a16825789b0263ad#%E3%83%91%E3%82%A4%E3%83%97%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6) also explains this in detail. If there is a risk that too much I/O data will pile up in a buffer, use communicate(), which handles it all at once. Someone has also written about actually resolving a deadlock with communicate() (article).

The drawback is that **you cannot exchange input/output multiple times within one process**. communicate() therefore cannot be used for the real-time output management and interactive processing introduced below.

Another difference is that there is no timeout option for reading through Popen().stdout and the like, whereas communicate() does have one. There are workarounds for this, as introduced in this article.
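The official Popen.communicate() documentation shows how to pair communicate(timeout=...) with killing the child, because communicate() itself does not kill the process when the timeout expires. A sketch of that pattern; the helper name communicate_with_timeout is hypothetical:

```python
import subprocess
import sys

def communicate_with_timeout(cmd, timeout):
    """Hypothetical helper: return (stdout, stderr, timed_out), killing cmd if it runs too long."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        return out, err, False
    except subprocess.TimeoutExpired:
        # communicate() does not kill the child on timeout, so do it here
        proc.kill()
        # Call communicate() again to collect leftover output and reap the process
        out, err = proc.communicate()
        return out, err, True

print(communicate_with_timeout([sys.executable, '-c', 'print("ok")'], 10))
print(communicate_with_timeout([sys.executable, '-c', 'import time; time.sleep(5)'], 0.5)[2])  # True
```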

Reproducing a shell pipeline

This is also introduced in the official documentation. You could pass the pipeline as a string with shell=True, but it can also be reproduced without the shell, as shown below. For more on the SIGPIPE handling along the way, see here or here (.../39190250/under-what-condition-does-a-python-subprocess-get-a-sigpipe).

Popen() example: shell pipeline


p1 = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE) # Send output to a pipe
p2 = subprocess.Popen(['grep', 'python'], stdin=p1.stdout, stdout=subprocess.PIPE) # Use p1's output as input
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
print(p2.communicate()[0].decode().strip().split('\n')) #get stdout
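The same idea extends to any number of stages. A minimal sketch that chains commands like cmd1 | cmd2 | ... without shell=True; run_pipeline is a hypothetical helper, and the usage example uses Python children so it does not depend on ps or grep:

```python
import subprocess
import sys

def run_pipeline(*cmds):
    """Hypothetical helper: connect cmds like `cmd1 | cmd2 | ...` and return the last stdout."""
    procs = []
    prev_stdout = None
    for cmd in cmds:
        p = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Close our copy so the upstream process can get SIGPIPE if downstream exits
            prev_stdout.close()
        prev_stdout = p.stdout
        procs.append(p)
    out, _ = procs[-1].communicate()
    # Reap the earlier stages as well
    for p in procs[:-1]:
        p.wait()
    return out

producer = [sys.executable, '-c', 'for i in range(5): print(i)']
counter = [sys.executable, '-c', 'import sys; print(sum(1 for _ in sys.stdin))']
print(run_pipeline(producer, counter).decode().strip())  # 5
```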

Status checks and real-time output management with .poll()

As mentioned, Popen() does not wait for the process to finish, but .poll() lets you check whether it has: it returns None while the process is still running, and its exit status once it has ended. .stdout.readline() reads the output one line at a time, so you get it **in real time, each time a line is output**. On the other hand, a method such as .stdout.read() that reads all the output at once **effectively waits for the process to finish**, because it blocks until end-of-file.

Popen() example: real-time output management


cmd = 'echo Start;sleep 0.5;echo 1;sleep 0.5;echo 2;sleep 0.5;echo 3;sleep 0.5;echo Finished'
p = subprocess.Popen(cmd, shell=True, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
#Get in real time
while p.poll() is None:
    print('status:',p.poll(), p.stdout.readline().decode().strip())
print('status:',p.poll())
print()

p = subprocess.Popen(cmd, shell=True, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
#No output until 2 seconds have passed
print('status:',p.poll(), p.stdout.read().decode().strip())
'''
status: None Start
status: None 1
status: None 2
status: None 3
status: None Finished
status: 0

status: None Start
1
2
3
Finished
'''

A more proper real-time output management code template is introduced in this article.
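One commonly used alternative, sketched under the assumption that the child flushes each line (the -u flag makes a Python child unbuffered): iterate over p.stdout until end-of-file and let the with block wait for the process. Unlike the poll() loop above, the loop ends at EOF, so it does not depend on when the process happens to exit:

```python
import subprocess
import sys

# A child that prints a few lines with pauses (stand-in for the shell command above)
code = 'import time\nfor s in ["Start", "1", "2", "3", "Finished"]:\n    print(s)\n    time.sleep(0.2)'
cmd = [sys.executable, '-u', '-c', code]

lines = []
with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) as p:
    # Each line is yielded as it arrives; iteration stops at EOF
    for line in p.stdout:
        lines.append(line.rstrip('\n'))
        print('line:', lines[-1])
# Leaving the with block waited for the child, so returncode is set
print('status:', p.returncode)  # status: 0
```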

Interactive input/output

Sending input in several rounds and getting output each time can be done as follows (I referred to this article). Because this cannot be done with communicate(), which sends and receives all the data at once, there is a risk of deadlock when large amounts of data need to be exchanged.

First, as preparation, create a Python script that reads several inputs and produces output after each one.

(Preparation) calc.py


a = int(input())
print('n    =', a)
b = int(input())
print(' + {} ='.format(b), a+b)
c = int(input())
print(' - {} ='.format(c), a-c)
d = int(input())
print(' * {} ='.format(d), a*d)
e = int(input())
print(' / {} ='.format(e), a/e)

Run it as follows. Note that if you do not flush, the data stays in the buffer and never reaches the child's input.

Popen() example: interactive I/O


numbers = range(1, 6)

# Non-interactive: send everything at once with communicate()
p = subprocess.Popen(['python', 'calc.py'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Join the numbers with newline characters
str_nums = '\n'.join(map(str, numbers))
o, e = p.communicate(input=str_nums.encode())
print(o.decode())

'''
n    = 1
 + 2 = 3
 - 3 = -2
 * 4 = 4
 / 5 = 0.2
'''

# Interactive: send line by line with .stdin.write()
p = subprocess.Popen(['python', 'calc.py'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for n in numbers:
    # Send a number
    p.stdin.write(str(n).encode())
    # Send a newline (required because the child reads with input())
    p.stdin.write('\n'.encode())
    # Flush the buffer (important)
    p.stdin.flush()
    #Read one line of the result (use a while statement if it spans multiple lines)
    print(p.stdout.readline().decode().strip())

'''
n    = 1
 + 2 = 3
 - 3 = -2
 * 4 = 4
 / 5 = 0.2
'''

Parallel execution (incomplete information)

Since Popen() does not wait for the process to finish, several processes can run at the same time. Asynchronous processing is handled by the `asyncio` module, part of which supports subprocesses (Official: asyncio-subprocess.html#asyncio-subprocess). There are also many articles on various kinds of asynchronous processing, such as this article and this article. When choosing among these methods, I think you need to know "**which has the lowest latency and the least risk**", but I do not yet know enough, so I will not go further here. For now, here is a simplified version.

Parallel execution


# Sequential
print('start')
s = time.time()
running_procs = [
    subprocess.run(['sleep', '3']),
    subprocess.run(['sleep', '2']),
    subprocess.run(['sleep', '1']),
    ]
# With run(), each call waits for the previous one, so the total is about 6 seconds
print('finish [run()]', time.time() - s)
# ----------------------------
#Parallel
print('start')
s = time.time()
procs = [
    subprocess.Popen(['sleep', '3']),
    subprocess.Popen(['sleep', '2']),
    subprocess.Popen(['sleep', '1']),
    ]
# With Popen(), merely starting the processes returns almost immediately
print('finish [run Popen()]', time.time() - s)

#Waiting for each process to finish
for p in procs:
    p.wait()
#Processing is completed in about 3 seconds, which is equivalent to the longest process
print('finish [wait Popen()]', time.time() - s)
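For reference, the asyncio subprocess API mentioned above can express the same parallel waiting. A minimal sketch, assuming Python 3.8 or later and using short Python sleeps as stand-ins for the sleep command; it is not a claim about which approach has the lowest latency:

```python
import asyncio
import sys
import time

async def run_sleep(seconds):
    # create_subprocess_exec starts the child without blocking the event loop
    proc = await asyncio.create_subprocess_exec(
        sys.executable, '-c', f'import time; time.sleep({seconds})')
    # wait() is a coroutine here, so the other children keep running meanwhile
    return await proc.wait()

async def main():
    # gather() runs all three concurrently and returns their exit codes in order
    return await asyncio.gather(run_sleep(0.3), run_sleep(0.2), run_sleep(0.1))

s = time.time()
codes = asyncio.run(main())
# Elapsed time is close to the longest sleep plus startup, not the sum
print(codes, time.time() - s)
```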

Closing remarks

We have seen that most tasks can be handled right away with run(), and that Popen() allows much more flexible applications such as asynchronous and interactive processing.
