[PYTHON] What to do if `Object arrays cannot be loaded when allow_pickle=False` occurs in numpy.load()

- numpy 1.16.3 or later

Symptom

Python code example

import numpy as np

np.load('/path/to/file.npy')

Examples of errors that occur

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-1db66562b57b> in <module>
----> 1 np.load('tmp.npy')

~/venv/aep/lib/python3.7/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
    451             else:
    452                 return format.read_array(fid, allow_pickle=allow_pickle,
--> 453                                          pickle_kwargs=pickle_kwargs)
    454         else:
    455             # Try a pickle

~/venv/aep/lib/python3.7/site-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
    720         # The array contained Python objects. We need to unpickle the data.
    721         if not allow_pickle:
--> 722             raise ValueError("Object arrays cannot be loaded when "
    723                              "allow_pickle=False")
    724         if pickle_kwargs is None:

ValueError: Object arrays cannot be loaded when allow_pickle=False

Cause

Since numpy v1.16.3, the behavior of the `numpy.load()` function has changed.

| Before the change | After the change |
| --- | --- |
| The default value of the `allow_pickle` option is `True` | The default value of the `allow_pickle` option is `False` |

Solution

After confirming that none of the **security concerns** described later apply, specify the `allow_pickle` option as shown below.

np.load('/path/to/file.npy', allow_pickle=True)

Commentary

NumPy arrays and dtype

A NumPy array (`np.ndarray`) can store strings and Python objects as well as numbers. The type of the stored values is reflected in the `dtype` attribute.
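As a minimal sketch of how the stored values determine `dtype`, the snippet below builds three small arrays. The variable names and sample values are illustrative assumptions, not taken from the original article.

```python
import numpy as np

# Numeric data gets a numeric dtype.
numbers = np.array([1.0, 2.5, 3.0])
print(numbers.dtype)

# Strings get a fixed-width Unicode dtype (kind "U").
words = np.array(["spam", "eggs"])
print(words.dtype.kind)

# Arbitrary Python objects are stored with dtype=object (kind "O").
# Only arrays like this one need pickle when they are saved and loaded.
mixed = np.array([{"a": 1}, None, "text"], dtype=object)
print(mixed.dtype)
```

Only the last array is an "object array" in the sense of the error message.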

Vulnerability in numpy v1.16.0

A vulnerability has been reported in which malicious code could be executed when `np.load()` loads a file containing a serialized NumPy array of Python objects. (There is, however, a counterargument regarding this vulnerability.)

Therefore, from v1.16.3, the default behavior of `np.load()` was changed as described above: if the dtype is a Python object and `allow_pickle=False`, a `ValueError` is raised. This can be seen as a specification change that errs on the side of safety.
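The behavior described here can be reproduced with a short sketch, assuming numpy 1.16.3 or later. The temporary file names and sample data are made up for illustration.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    numeric_path = os.path.join(tmp_dir, "numeric.npy")
    object_path = os.path.join(tmp_dir, "objects.npy")

    np.save(numeric_path, np.arange(3))
    # Saving an object array pickles its elements.
    np.save(object_path, np.array([{"a": 1}, None], dtype=object))

    # Numeric arrays load fine with the default allow_pickle=False.
    numeric = np.load(numeric_path)

    # Object arrays are rejected unless pickle is explicitly allowed.
    try:
        np.load(object_path)
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

    # Opting in restores the pre-1.16.3 behavior for this one call.
    restored = np.load(object_path, allow_pickle=True)

print(error_message)
print(restored)
```

Arrays with a numeric or string dtype are unaffected by the change, so only files that actually contain Python objects need the option.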

Security concerns

Needless to say, do not call `np.load(allow_pickle=True)` on **untrusted files**. As mentioned in the previous section, doing so can allow arbitrary code to be executed.

For ad hoc code, such as data wrangling in Jupyter or machine learning work, this is usually not a problem [^1]. Application developers who use Python, however, should take note.

NG example
`np.load(allow_pickle=True)` on files uploaded by users
OK example
`np.load(allow_pickle=True)` on files serialized within the system itself

[^1]: Unless, that is, the *.npy file was handed to you by a malicious colleague (?).
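For the NG case, one possible approach is to keep `allow_pickle=False` and treat the resulting `ValueError` as a rejection. The helper name `load_untrusted_npy` and the in-memory buffers standing in for uploaded files are assumptions for illustration, not part of NumPy or the original article.

```python
import io

import numpy as np


def load_untrusted_npy(file_like):
    """Hypothetical helper: load .npy data without ever unpickling it."""
    try:
        return np.load(file_like, allow_pickle=False)
    except ValueError:
        # Object arrays are rejected instead of being unpickled.
        return None


# A plain numeric payload is accepted.
buffer = io.BytesIO()
np.save(buffer, np.array([1, 2, 3]))
buffer.seek(0)
safe = load_untrusted_npy(buffer)

# A payload containing Python objects is refused.
buffer = io.BytesIO()
np.save(buffer, np.array([{"a": 1}], dtype=object))
buffer.seek(0)
rejected = load_untrusted_npy(buffer)

print(safe, rejected)
```

This sketch only covers the pickle issue. A real upload handler would likely need further validation, such as size limits.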

Isn't this a Breaking Change?

Yes, I think it is a Breaking Change, because it changes the behavior of applications.

Python's numerical libraries may have a tendency to treat a change of default value as acceptable. [^2] If you assume an upgrade is safe just because only the revision number changed, you will get hurt. Application engineers who have come from other languages should be careful.

[^2]: Another example is the default value of `n_estimators` in sklearn.ensemble.RandomForestClassifier.
