[PYTHON] The value of meta when specifying a function with no return value in Dask dataframe apply

First, install Dask

One way to run row-wise work in parallel in Python is the apply method of a Dask DataFrame. (I learned this from a Google search yesterday.) Dask can be installed with pip as follows.

$ pip install dask 

Next I tried the usual import:

import dask.dataframe as dd

but it failed with this error:

ModuleNotFoundError: No module named 'toolz'

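So I installed toolz:

$ pip install toolz

That still was not enough:

ImportError: fsspec is required to use any file-system functionality.

Close to tears, I installed fsspec as well:

$ pip install fsspec

Now that it's finally ready for use, let's move on to the main subject. (Installing with pip install "dask[dataframe]" should pull in these DataFrame dependencies in one step.)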

Main subject

I converted a pandas DataFrame to a Dask DataFrame and then tried to apply a function with no return value to each row, like this:

import pandas as pd
import dask.dataframe as dd

# Print the sum of the values in columns A and B to standard output
def print_sum(pd_series):
    print(pd_series['A'] + pd_series['B'])

A = pd.DataFrame({'A': [1.0, 1.5, 2.0], 'B': [5.0, 2.0, 1.2]}, index=[1, 2, 3])
A_dd = dd.from_pandas(A, npartitions=2)

A_dd.apply(print_sum, axis=1).compute(scheduler='processes')

Running this produces the following warning.

You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.

Apparently, the meta argument has to describe the data type returned by the function passed to .apply(). But this function has no return value... In C++ I would just declare it void, but what is the equivalent in Python?
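To see what such a function actually hands back, here is a minimal sketch in plain pandas, with no Dask involved. It reuses print_sum and the sample data from above; the variable name result is only for illustration. A Python function without a return statement returns None, so apply collects one None per row, which pandas typically stores with object dtype.

```python
import pandas as pd

def print_sum(pd_series):
    # No return statement, so Python implicitly returns None
    print(pd_series['A'] + pd_series['B'])

A = pd.DataFrame({'A': [1.0, 1.5, 2.0], 'B': [5.0, 2.0, 1.2]}, index=[1, 2, 3])

# Plain pandas apply gathers the per-row return values into a Series
result = A.apply(print_sum, axis=1)

# Every entry is missing, because each call returned None
print(result.isna().all())
print(len(result))
```

This is the output that meta is supposed to describe: a Series with one empty entry per row.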

I looked it up, and the answer is 'None'!

import pandas as pd
import dask.dataframe as dd

# Print the sum of the values in columns A and B to standard output
def print_sum(pd_series):
    print(pd_series['A'] + pd_series['B'])

A = pd.DataFrame({'A': [1.0, 1.5, 2.0], 'B': [5.0, 2.0, 1.2]}, index=[1, 2, 3])
A_dd = dd.from_pandas(A, npartitions=2)

A_dd.apply(print_sum, axis=1, meta='None').compute(scheduler='processes')  # meta='None'

That solves it! It took a long write-up to get here, but that's all there is to it.
