An experienced PHP user on day 4 of starting Python (subprocess)

Introduction

I meant to continue straight on from day 3, but that post was getting long, so I moved this material into a separate day-4 post. The main topic this time is replacing PHP's exec with Python's subprocess module. It should be useful when Python is used to drive Linux commands: it is close to batch processing, but you also get the benefits of running it from Python.
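As a rough map from PHP to Python, the sketch below wraps subprocess.run so that it hands back the same two things PHP's exec fills in: the output lines and the exit code. The helper name php_like_exec is hypothetical, not part of any library. It assumes Python 3 and a command given as a list, so no shell is involved.

```python
import subprocess


def php_like_exec(args):
    """Hypothetical helper mirroring PHP's exec($command, $output, $result_code).

    Returns (output_lines, return_code). As with PHP's exec, trailing
    whitespace is stripped from each line. Because args is a list,
    the command is started directly, without a shell.
    """
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    lines = [line.rstrip() for line in proc.stdout.decode('utf8').splitlines()]
    return lines, proc.returncode
```

For example, php_like_exec(['file', '-i', '-b', 'data/default.xlsx']) plays the role of exec('file -i -b data/default.xlsx', $out, $ret).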

A note on why the topics jump around

People who study (or research) programming languages in depth write systematically organized articles, and I think that is wonderful. In my case, I use computers and programming languages as tools, so I look up and put together only what I need each time, and the flow differs from the usual one. Once you start something similar, you will need these pieces too, so please pick out only the parts you need as a reference.

1. Examine the file format (identify Word, Excel, PowerPoint, and PDF files that have no extension)

Files downloaded from the web sometimes arrive without an extension, so their format has to be checked. In PHP I ran the following command, checked its output, and added the matching extension. Many programs still decide what a file is from its extension, so this step is needed. It may become unnecessary in the future, but for now it is still required.

file.php


exec('file -i -b data/default.xlsx', $out, $ret);
echo $out[0];
//Output:
// application/vnd.ms-excel; charset=binary

file.py



import subprocess
# -i prints the MIME type, -b omits the file name from the output
args = ['file', '-i', '-b', 'data/default.xlsx']
proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
string = proc.stdout.decode("utf8").strip()
# Use "in" rather than find() > 0, which misses a match at position 0
if 'excel' in string:
    print(string)
else:
    print('no')
# Output:
# application/vnd.ms-excel; charset=binary

Naturally, PHP and Python give the same result, since both simply run the same file command.
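The point of this step is to add the right extension once the format is known. Here is a minimal sketch of that mapping, assuming the MIME strings in the table below. The exact strings can vary between versions of the file command, so check its output on your own samples first. MIME_TO_EXT, mime_to_extension, and detect_extension are hypothetical names.

```python
import subprocess

# Hypothetical lookup table: MIME type printed by `file -i -b` -> extension.
# Verify these strings against your own version of `file`.
MIME_TO_EXT = {
    'application/pdf': '.pdf',
    'application/msword': '.doc',
    'application/vnd.ms-excel': '.xls',
    'application/vnd.ms-powerpoint': '.ppt',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': '.xlsx',
    'application/vnd.openxmlformats-officedocument.presentationml.presentation': '.pptx',
}


def mime_to_extension(file_output):
    """Turn output like 'application/pdf; charset=binary' into '.pdf' ('' if unknown)."""
    mime = file_output.split(';', 1)[0].strip()
    return MIME_TO_EXT.get(mime, '')


def detect_extension(path):
    """Run `file -i -b` on path and map the reported MIME type to an extension."""
    proc = subprocess.run(['file', '-i', '-b', path],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
    return mime_to_extension(proc.stdout.decode('utf8'))
```

Renaming could then be done with os.rename(path, path + ext) whenever ext is not empty.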

2. Convert a PDF to a text file

I was going to do this with subprocess as well, but then I found an article on using the pdftotext module, which builds on the Linux tool, and wondered about it: "Convert PDF to text with pdftotext". Its code is rather complicated, though, since it even prepares a display screen, so I will keep it as a reference for later and simply use subprocess here. Let's compare it with the PHP code.

pdftotext.php



$command ="pdftotext -layout -nopgbrk data/*.pdf";
shell_exec($command);

pdftotext.py



import subprocess
# Works: an explicit file name
args = ['pdftotext', '-layout', '-nopgbrk', 'data/sjd23d_mn.pdf']
# Does not work: without a shell nobody expands the asterisk, so pdftotext looks
# for a file literally named "data/*.pdf" (single vs. double quotes make no difference)
# args = ['pdftotext', '-layout', '-nopgbrk', "data/*.pdf"]
proc = subprocess.run(args)

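It didn't work right away. In particular, at first I could not convert files by picking up their names with an asterisk. The asterisk is expanded by the shell: PHP's shell_exec hands the whole command line to a shell, but subprocess.run with a list of arguments starts pdftotext directly, so "data/*.pdf" arrives as a literal file name. In Python, single and double quotes make no difference here; to use a wildcard, expand it yourself with the glob module, or pass a command string with shell=True. Also note that pdftotext takes only one input PDF (a second name is treated as the output text file), so even in PHP the wildcard is only safe when the folder contains a single PDF. This difference is important. (This note is for me.)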
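To see where the asterisk gets expanded, the sketch below passes the same pattern to echo three ways: as a list (no shell), as a string with shell=True (like PHP's shell_exec), and through glob.glob. compare_wildcard is a hypothetical name, and the demo assumes a Unix-like system with /bin/sh and echo, plus a temporary directory whose path contains no spaces.

```python
import glob
import os
import shlex
import subprocess
import tempfile


def compare_wildcard(directory):
    """Show how 'directory/*.pdf' reaches a command with and without a shell."""
    pattern = os.path.join(directory, '*.pdf')
    # List form (no shell): echo receives the pattern literally, asterisk and all
    literal = subprocess.run(['echo', pattern], stdout=subprocess.PIPE,
                             text=True).stdout.split()
    # String form with shell=True: /bin/sh expands the wildcard, as with PHP's shell_exec
    expanded = subprocess.run('echo ' + shlex.quote(directory) + '/*.pdf', shell=True,
                              stdout=subprocess.PIPE, text=True).stdout.split()
    # glob performs the same expansion in Python without starting a shell
    globbed = sorted(glob.glob(pattern))
    return literal, expanded, globbed


if __name__ == '__main__':
    # Demo on a throwaway folder containing two empty .pdf files
    with tempfile.TemporaryDirectory() as d:
        for name in ('a.pdf', 'b.pdf'):
            open(os.path.join(d, name), 'w').close()
        print(compare_wildcard(d))
```

glob is usually the better choice: it avoids shell quoting problems, and you can then call pdftotext once per file.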

3. Convert a PDF to images

pdftoppm.php



$command ="pdftoppm -jpeg data/*.pdf data/";
shell_exec($command);

pdftoppm.py



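import subprocess
# Works: an explicit file name (page images are written with the prefix "data/")
args = ['pdftoppm', '-jpeg', 'data/sjd23d_mn.pdf', 'data/']
# Does not work: no shell expands "data/*.pdf", so pdftoppm receives it literally
# args = ['pdftoppm', '-jpeg', "data/*.pdf", "data/"]
proc = subprocess.run(args)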

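Here too, converting a PDF to images only works when you specify the file name. This is where it behaves differently from PHP: shell_exec runs the command through a shell, which expands data/*.pdf, while subprocess.run with a list does not. There may be a way to use pdftoppm from Python through a library instead, but that is for another time.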
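If you do want every PDF in a folder converted, one option is to expand the wildcard with glob and call pdftoppm once per file. Giving each PDF its own output prefix (its name without .pdf) keeps the page images of different files from overwriting each other. This is a sketch under those assumptions; pdftoppm_commands and convert_all are hypothetical names.

```python
import glob
import os
import subprocess


def pdftoppm_commands(pdf_paths, fmt='-jpeg'):
    """Build one pdftoppm command per PDF, using the PDF's own name as the output prefix."""
    commands = []
    for pdf in sorted(pdf_paths):
        # data/a.pdf -> prefix data/a, so its pages become data/a-<page>.jpg
        root = os.path.splitext(pdf)[0]
        commands.append(['pdftoppm', fmt, pdf, root])
    return commands


def convert_all(pattern='data/*.pdf'):
    """Expand the wildcard in Python (no shell involved) and convert each PDF."""
    for args in pdftoppm_commands(glob.glob(pattern)):
        subprocess.run(args, check=True)  # raise if pdftoppm reports an error
```

Splitting command building from running keeps the argument lists easy to print and check before anything is executed.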

4. Extract archives (to avoid garbled Japanese file names)

unar.php



$command = 'unar -f data/data/selenium-master.zip -D -o data';
shell_exec($command);

unar.py



import subprocess
args = ['unar', '-f', 'data/selenium-master.zip', '-D', '-o', 'data/']
# If the archive needs a password, use this instead:
# args = ['unar', '-f', '-p', 'passw', 'data/selenium-master.zip', '-D', '-o', 'data/']
proc = subprocess.run(args)
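The two unar command lines above differ only in the password, so a small helper can build either one. This is a sketch: unar_args and extract are hypothetical names, and the optional encoding parameter assumes your unar build supports -e for naming the file-name encoding when auto-detection fails (check unar's help output before relying on it).

```python
import subprocess


def unar_args(archive, out_dir, password=None, encoding=None):
    """Build an unar command line in the same shape as the examples above.

    -f overwrites existing files, -D avoids creating an extra containing
    folder, -o sets the output folder. password and encoding are optional.
    """
    args = ['unar', '-f']
    if password is not None:
        args += ['-p', password]
    if encoding is not None:
        # Assumption: -e names the encoding of file names inside the archive
        args += ['-e', encoding]
    args += [archive, '-D', '-o', out_dir]
    return args


def extract(archive, out_dir, password=None, encoding=None):
    """Run unar and raise CalledProcessError if it fails."""
    return subprocess.run(unar_args(archive, out_dir, password, encoding), check=True)
```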

This seems to handle extracting archives well.

5. selenium

This doesn't look easy, so I'll put it off for a while.

6. Bulk deletion of folders and files

This was essential for erasing the files extracted by unar. First, here is how I do it in PHP. I want to do the same in Python, but there seem to be several ways, so I will look into them from here.

delete.php



foreach (glob($relative.'/data/*') as $file) {
  //Delete folders together with their contents, and plain files directly
  if (is_dir($file)) {
    remove_directory($file);
  } else {
    unlink($file);
  }
}

function remove_directory($dir) {
    $files = array_diff(scandir($dir), array('.','..'));
    foreach ($files as $file) {
        //Separate processing by file or directory
        if (is_dir("$dir/$file")) {
            //If it is a directory, call the same function again
            remove_directory("$dir/$file");
        } else {
            //Delete file
            unlink("$dir/$file");
            //echo "File:" . $dir . "/" . $file . "Delete\n";
        }
    }
    //Delete the specified directory
    return rmdir($dir);
}

--In Python, shutil.rmtree removes a directory together with everything inside it in one shot; you only need to pass the directory name. Note that the directory itself is deleted too.

delete.py



import pathlib
import shutil
p = pathlib.Path('selenium-master')
shutil.rmtree(p)

--Delete all the files and directories inside a given directory at once, while keeping the directory itself (rather neat, if I say so myself)

all_del.py



import glob
import os
import shutil
#Collect every file and folder directly inside ./data into my_list
my_list = glob.glob("./data/*")
#Walk through my_list from start to end
for value in my_list:
    #Use a different delete call for files (or links) and for folders
    if os.path.isfile(value) or os.path.islink(value):
        os.remove(value)
    elif os.path.isdir(value):
        shutil.rmtree(value)

I could not find another site that publishes this method and code. By changing the glob pattern (for example "./data/*.txt"), you can also restrict the deletion to files with a given extension. Above all, I wanted a way to erase the extracted files. If you know a more elegant method, please let me know.
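The same idea can be written with pathlib, with the glob pattern as a parameter so that the extension filter mentioned above is built in. This is a sketch; clear_directory is a hypothetical name. It keeps the target directory, removes matching folders with shutil.rmtree, and unlinks everything else (files and symbolic links).

```python
import pathlib
import shutil
import tempfile


def clear_directory(directory, pattern='*'):
    """Delete entries in `directory` matching `pattern`, keeping the directory itself.

    pattern='*' removes all entries; pattern='*.txt' removes only .txt entries.
    Returns the sorted names of the removed entries.
    """
    removed = []
    # list() so that we do not delete entries while the glob is still iterating
    for entry in list(pathlib.Path(directory).glob(pattern)):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # folder: remove it with its contents
        else:
            entry.unlink()  # file or symbolic link: remove just this entry
        removed.append(entry.name)
    return sorted(removed)


if __name__ == '__main__':
    # Demo on a throwaway folder rather than a real ./data
    with tempfile.TemporaryDirectory() as d:
        pathlib.Path(d, 'a.txt').write_text('x')
        pathlib.Path(d, 'sub').mkdir()
        print(clear_directory(d, '*.txt'))  # only a.txt is removed
        print(clear_directory(d))  # then everything that is left
```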

7. Sanitizing

PHP provides a standard function for this, but in Python it seems to be rarely discussed.

sanitizing.php



$wfull = htmlspecialchars($wtarget, ENT_QUOTES);

Python seems to have an equivalent, shown below, but I would like to look into it a little more.

sanitizing.py



import html  # cgi.escape was removed in Python 3.8; html.escape replaces it
inlist = 'https://www.yahoo.co.jp/'
transform = html.escape(inlist)
print(transform)
# https://www.yahoo.co.jp/
inlist = '"><script>alert(document.cookie);</script>'
transform = html.escape(inlist)
print(transform)
#Like htmlspecialchars($s, ENT_QUOTES), quotes are escaped too
# &quot;&gt;&lt;script&gt;alert(document.cookie);&lt;/script&gt;
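html.escape takes a quote flag, and it matters for the same reason ENT_QUOTES does in PHP. With quote=False only &, <, and > are escaped (this was also the default behavior of the old cgi.escape), so a " in user input can still close an HTML attribute. The sketch below compares the two settings on the same payload; the variable names are illustrative.

```python
import html

payload = '"><script>alert(document.cookie);</script>'

# Default quote=True: & < > " ' are all escaped, like htmlspecialchars($s, ENT_QUOTES)
safe = html.escape(payload)
# quote=False: only & < > are escaped, so the leading " survives unchanged
unsafe_in_attr = html.escape(payload, quote=False)
# Single quotes are escaped too when quote=True
apostrophe = html.escape("It's <b>")

print(safe)
print(unsafe_in_attr)
print(apostrophe)
```

For text that will be placed inside attribute values, keep the default quote=True.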
