- numpy 1.16.3 or later
Python code example
np.load('/path/to/file.npy')
Example of the error that occurs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-37-1db66562b57b> in <module>
----> 1 np.load('tmp.npy')
~/venv/aep/lib/python3.7/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
451 else:
452 return format.read_array(fid, allow_pickle=allow_pickle,
--> 453 pickle_kwargs=pickle_kwargs)
454 else:
455 # Try a pickle
~/venv/aep/lib/python3.7/site-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
720 # The array contained Python objects. We need to unpickle the data.
721 if not allow_pickle:
--> 722 raise ValueError("Object arrays cannot be loaded when "
723 "allow_pickle=False")
724 if pickle_kwargs is None:
ValueError: Object arrays cannot be loaded when allow_pickle=False
Since numpy v1.16.3, the behavior of the `numpy.load()` function has changed.
| Before the change | After the change |
|---|---|
| The default value of the `allow_pickle` option is `True` | The default value of the `allow_pickle` option is `False` |
After confirming that none of the **security concerns** described later apply, specify the `allow_pickle` option as follows.
np.load('/path/to/file.npy', allow_pickle=True)
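As a minimal sketch of that fix, the snippet below saves a small object array to a temporary file and reads it back with `allow_pickle=True`. The file name `objects.npy` and the sample dictionaries are made up for illustration, and the file is one the script itself just wrote, so it is trusted.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: an object array holding two dicts.
original = np.array([{"a": 1}, {"b": 2}], dtype=object)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "objects.npy")  # illustrative file name
    np.save(path, original)  # object arrays are stored via pickle
    # Opt in to unpickling explicitly; only do this for files you trust.
    restored = np.load(path, allow_pickle=True)

print(restored.dtype)    # object
print(restored[0]["a"])  # 1
```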
A numpy array (`np.ndarray`) can store strings and Python objects as well as numbers. The type of the stored values is reflected in the `dtype` attribute.
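To make the distinction concrete, here is a small sketch that builds one array of each kind and inspects its `dtype`. The variable names and sample values are illustrative only. Only the object kind needs pickle when it is saved and loaded.

```python
import numpy as np

numbers = np.array([1.0, 2.5])                      # floating-point values
strings = np.array(["a", "bc"])                     # fixed-width unicode strings
objects = np.array([{"k": 1}, None], dtype=object)  # arbitrary Python objects

# dtype.kind is a one-letter code for the general category of the dtype.
print(numbers.dtype.kind)  # f
print(strings.dtype.kind)  # U
print(objects.dtype.kind)  # O

# hasobject is True only when the array holds Python objects.
print(numbers.dtype.hasobject)  # False
print(objects.dtype.hasobject)  # True
```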
In numpy v1.16.0, a vulnerability was reported that could allow malicious code to be executed when a serialized numpy array file containing Python objects is deserialized with `np.load()`. (There is, however, a counterargument regarding this vulnerability.)
Therefore, from v1.16.3, the default behavior of `np.load()` was changed as described above: if the dtype is a Python object and `allow_pickle=False`, a `ValueError` is raised.
This can be seen as a specification change that pushes the default toward the safer side.
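The following sketch illustrates that behavior, assuming numpy 1.16.3 or later is installed. It writes one numeric array and one object array to temporary files (the file names are made up) and loads both with the default arguments: the numeric file loads normally, while the object file raises `ValueError`.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    numeric_path = os.path.join(tmp_dir, "numeric.npy")  # illustrative names
    object_path = os.path.join(tmp_dir, "objects.npy")

    np.save(numeric_path, np.arange(3))
    np.save(object_path, np.array([{"a": 1}], dtype=object))

    # Plain numeric data does not need pickle, so the default is fine.
    numeric = np.load(numeric_path)

    # Object data needs pickle, so the default (allow_pickle=False) refuses it.
    try:
        np.load(object_path)
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

print(numeric)
print(error_message)
```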
Needless to say, do not call `np.load(..., allow_pickle=True)` on **untrusted files**. As mentioned in the previous section, doing so can execute arbitrary code.
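Why unpickling is risky can be sketched with the standard `pickle` module alone. The `Payload` class below is hypothetical and deliberately harmless: it asks the loader to call `sorted`, but a hostile file could name any callable instead. This illustrates the general pickle mechanism, not the specific exploit from the vulnerability report.

```python
import pickle


class Payload:
    """Hypothetical class whose pickled form asks the loader to call a function."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call sorted([3, 1, 2])".
        # An attacker could put any importable callable and arguments here.
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Payload())

# Loading runs the callable chosen by whoever produced the bytes.
result = pickle.loads(data)
print(result)  # [1, 2, 3]
```

Note that the loaded value is not a `Payload` instance at all; it is whatever the chosen function returned.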
For ad hoc code, such as data wrangling in Jupyter or machine learning experiments, this is usually not a problem [^1]. Application developers who use Python, however, should take note.
[^1]: Unless the *.npy file was handed to you by a malicious colleague (?).
Of course, I consider this a breaking change, because it changes the behavior of applications.
Python's numerical libraries seem to have a tendency to change default values when the change is toward the safer side. [^2] If you assume an upgrade is harmless because it is only a revision bump, you will get hurt. Application engineers coming from other languages should be especially careful.
[^2]: Another example is the default value of `n_estimators` in `sklearn.ensemble.RandomForestClassifier`.
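One way to reduce exposure to this kind of default change is to pass default-sensitive arguments explicitly. The helper below is a hypothetical sketch (the name `load_array` and the `trusted` parameter are not part of numpy): it always states `allow_pickle`, so the call behaves the same before and after v1.16.3.

```python
import numpy as np


def load_array(path, trusted=False):
    """Hypothetical helper: load a .npy file without relying on numpy's default.

    Pickle is only enabled when the caller explicitly marks the file as trusted.
    """
    return np.load(path, allow_pickle=trusted)
```

With this, `load_array(path)` refuses object arrays on any numpy version that has the `allow_pickle` parameter, and `load_array(path, trusted=True)` makes the opt-in visible at the call site.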