One way to do parallel computing in Python is Dask's dataframe apply (so Professor Google taught me yesterday). Dask can be installed with pip as follows.
$ pip install dask
With that installed, import it as usual:
import dask.dataframe as dd
But I got an error like this:
ModuleNotFoundError: No module named 'toolz'
So
$ pip install toolz
Still
ImportError: fsspec is required to use any file-system functionality.
Grudgingly, one more install:
$ pip install fsspec
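Just to be safe, here is a quick sanity check of my own (not part of the original steps) that the import now succeeds and which Dask version got picked up:

import dask
import dask.dataframe as dd  # should no longer complain about toolz or fsspec

print(dask.__version__)  # confirm the installed Dask version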
Now that it's finally ready for use, let's move on to the main subject.
I converted a pandas DataFrame to a Dask DataFrame and then tried to apply a function with no return value to each row, like this:
import pandas as pd
import dask.dataframe as dd

# A function that prints the sum of the values in columns A and B to standard output
def print_sum(pd_series):
    print(pd_series['A'] + pd_series['B'])

A = pd.DataFrame({'A': [1.0, 1.5, 2.0], 'B': [5.0, 2.0, 1.2]}, index=[1, 2, 3])
A_dd = dd.from_pandas(A, npartitions=2)
A_dd.apply(print_sum, axis=1).compute(scheduler='processes')
Running this produces the following warning:
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
Apparently, you have to use the meta argument to tell Dask what kind of data the function you pass to .apply() returns.
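As an aside, for a function that actually returns something, meta looks roughly like this (the row_sum example below is my own, just to illustrate the keyword; Dask accepts a (name, dtype) tuple for a Series result):

def row_sum(pd_series):
    return pd_series['A'] + pd_series['B']

# meta=('x', 'float64') declares the result as a float64 Series named 'x',
# so Dask can skip the guessing step (and the warning).
A_dd.apply(row_sum, axis=1, meta=('x', 'float64')).compute(scheduler='processes')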
But my function has no return value... In C++ you would write void; is there anything like that in Python? I looked into it, and the answer is 'None'!
import pandas as pd
import dask.dataframe as dd

# A function that prints the sum of the values in columns A and B to standard output
def print_sum(pd_series):
    print(pd_series['A'] + pd_series['B'])

A = pd.DataFrame({'A': [1.0, 1.5, 2.0], 'B': [5.0, 2.0, 1.2]}, index=[1, 2, 3])
A_dd = dd.from_pandas(A, npartitions=2)
A_dd.apply(print_sum, axis=1, meta='None').compute(scheduler='processes')  # meta='None'
And that's the solution! It took a while to write all this up, but that's really all there is to it.
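For what it's worth, the (name, dtype) tuple form of meta should also quiet the warning here, since a function that returns nothing yields an object-dtype Series of None values. This variant is my own untested tweak:

# Result is a Series of Nones, so declare it as object dtype (name 'x' is arbitrary).
A_dd.apply(print_sum, axis=1, meta=('x', 'object')).compute(scheduler='processes')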