Read files in parallel with Python

What is this

This is a personal memo.

Update record

2020/5/21 Added a note about Python 3.8

Environment

What I want to do

sample.py


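import os

def hoge(filename):
    # Read the file and do some processing.
    # Placeholder body: here we simply read the raw bytes.
    with open(filename, "rb") as f:
        data = f.read()
    return data

# Skip subdirectories, which open() cannot read.
filenames = [f for f in os.listdir(".") if os.path.isfile(f)]
data = [hoge(filename) for filename in filenames]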

I would like to do something like this, but run as is, the code uses only one core, which wastes time. I want to process the files in parallel.

Parallel processing

sample.py


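from multiprocessing import Pool
import os

def hoge(filename):
    # Read the file and do some processing.
    # This function can be left as it is.
    with open(filename, "rb") as f:  # placeholder: read the raw bytes
        data = f.read()
    return data

# Skip subdirectories, which open() cannot read.
filenames = [f for f in os.listdir(".") if os.path.isfile(f)]
with Pool() as p:
    data = p.map(hoge, filenames)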

With this, the files are processed in parallel.
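
As a slightly fuller sketch of the same idea, the version below hands only regular files to the pool and sets the worker count and chunk size explicitly. The helper names `list_files` and `read_all`, the `processes` and `chunksize` values, and the placeholder body of `hoge` are my own assumptions for illustration, not part of the original memo.

```python
import os
import tempfile
from multiprocessing import Pool


def hoge(filename):
    # Placeholder processing (an assumption): read the file's raw bytes.
    with open(filename, "rb") as f:
        return f.read()


def list_files(directory):
    # Hypothetical helper: full paths of regular files only, sorted for a stable order.
    paths = (os.path.join(directory, name) for name in os.listdir(directory))
    return sorted(p for p in paths if os.path.isfile(p))


def read_all(filenames, processes=None, chunksize=1):
    # processes=None lets Pool pick a worker count based on the available CPUs.
    # A larger chunksize reduces overhead when there are many small files.
    with Pool(processes=processes) as p:
        return p.map(hoge, filenames, chunksize=chunksize)


if __name__ == "__main__":
    # Demo on a throwaway directory so the sketch does not depend on your files.
    with tempfile.TemporaryDirectory() as d:
        for i in range(8):
            with open(os.path.join(d, f"file{i}.txt"), "w") as f:
                f.write("x" * i)
        data = read_all(list_files(d), processes=2, chunksize=4)
        print([len(b) for b in data])
```

`p.map` returns results in the same order as the input list, so `data[i]` always belongs to `filenames[i]`.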

Adding a progress bar

sample.py


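from tqdm import tqdm
from multiprocessing import Pool
import os

# hoge is the same function as in the previous listing.
# List the files once so that the total matches what is actually processed.
filenames = [f for f in os.listdir(".") if os.path.isfile(f)]
with Pool() as p:
    imap = p.imap(hoge, filenames)
    data = list(tqdm(imap, total=len(filenames)))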
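
If the order of the results does not matter, `imap_unordered` lets the bar advance as soon as any worker finishes, rather than waiting for the next file in input order. The following is a minimal sketch under my own assumptions: `hoge` here returns a `(filename, size)` pair so each result can be matched back to its file, `read_all_with_progress` is a hypothetical helper name, and the fallback `tqdm` is only there so the sketch still runs without tqdm installed.

```python
import os
import tempfile
from multiprocessing import Pool

try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback (an assumption for this sketch): no progress bar, just iterate.
        return iterable


def hoge(filename):
    # Placeholder processing: return the name together with the file's size in bytes.
    with open(filename, "rb") as f:
        return filename, len(f.read())


def read_all_with_progress(filenames):
    filenames = list(filenames)  # materialize once so total matches the work done
    with Pool() as p:
        results = p.imap_unordered(hoge, filenames)
        # Results arrive in completion order, so key them by filename.
        # The iterator must be consumed before the pool is closed.
        return dict(tqdm(results, total=len(filenames)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(20):
            path = os.path.join(d, f"file{i}.txt")
            with open(path, "w") as f:
                f.write("x" * i)
            paths.append(path)
        sizes = read_all_with_progress(paths)
        print(sizes[paths[3]])
```

With plain `imap`, a single slow file early in the list holds back the bar even when later files are already done; keying the results by filename is the price of dropping that ordering.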

About Python 3.8

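On Python 3.8 the code above gets stuck for some reason. One possible cause, depending on the OS, is that Python 3.8 changed the default process start method on macOS from fork to spawn, and spawn re-imports the main module in every worker. As a provisional workaround, I put the pool inside an `if __name__ == "__main__":` guard: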

sample.py



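from multiprocessing import Pool
import os

def hoge(filename):
    # Read the file and do some processing.
    # This function can be left as it is.
    with open(filename, "rb") as f:  # placeholder: read the raw bytes
        data = f.read()
    return data

if __name__ == "__main__":
    # Skip subdirectories, which open() cannot read.
    filenames = [f for f in os.listdir(".") if os.path.isfile(f)]
    with Pool() as p:
        data = p.map(hoge, filenames)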

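Written this way, it seems to work. However, the `data` variable created with so much effort is only assigned when the `if __name__ == "__main__":` block actually runs. An `if` block does not create a new scope in Python, so in the main script `data` is still available after the block, but it does not exist in the worker processes or when the file is imported as a module.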
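
A small experiment can make this concrete. The sketch below is my own illustration, not the author's code: the helper `assigned_in_if_block` is hypothetical, and it uses `exec` to simulate running the same guarded source once as a script and once as a re-imported module, then checks whether `data` ended up in the module namespace.

```python
GUARDED_SOURCE = '''
if __name__ == "__main__":
    data = [1, 2, 3]
'''


def assigned_in_if_block(module_name):
    # Run the guarded source in a fresh namespace that pretends to be a module
    # with the given __name__, then report whether `data` was assigned.
    namespace = {"__name__": module_name}
    exec(GUARDED_SOURCE, namespace)
    return "data" in namespace


print(assigned_in_if_block("__main__"))     # run as a script → True
print(assigned_in_if_block("__mp_main__"))  # re-imported by a spawned worker → False
```

The guard decides whether the assignment happens at all, not where the name is visible: when it does run, `data` is an ordinary module-level variable.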

Also, parallel execution used to work in Jupyter, but it no longer does on Python 3.8.
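
The Python 3.8 behaviour may be related to the process start method, so it can help to check which one is in use. This is a sketch under assumptions, not a confirmed diagnosis: `read_all` and its `start_method` parameter are hypothetical names of mine, and the body of `hoge` is a placeholder. `multiprocessing.get_start_method`, `get_all_start_methods`, and `get_context` are standard library functions.

```python
import multiprocessing as mp
import os
import tempfile


def hoge(filename):
    # Placeholder processing (an assumption): read the file's raw bytes.
    with open(filename, "rb") as f:
        return f.read()


def read_all(filenames, start_method=None):
    # start_method=None keeps the platform default; "spawn" starts each worker
    # in a fresh interpreter that re-imports the main module.
    ctx = mp.get_context(start_method)
    with ctx.Pool() as p:
        return p.map(hoge, filenames)


if __name__ == "__main__":
    print("default start method:", mp.get_start_method())
    print("available:", mp.get_all_start_methods())
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(4):
            path = os.path.join(d, f"file{i}.txt")
            with open(path, "w") as f:
                f.write(str(i))
            paths.append(path)
        print(read_all(paths))
```

Saving this as a script and calling `read_all(paths, start_method="spawn")` should reproduce the macOS default of Python 3.8 on other platforms. With spawn, workers have to import the worker function from a module, which a function typed into a notebook cell cannot satisfy; that would be consistent with the Jupyter behaviour above, though I have not confirmed it for this case.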
