Background

Scala's collection operations are cool, aren't they? You can write a whole pipeline neatly without creating extra intermediate variables.

`Scala`


val result = (0 to 10000)
      .filter(_ % 3 == 0)
      .map(_ + 1)
      .groupBy(_ % 10)
      .map { it =>
        val k = it._1
        val v = it._2.sum
        (k, v)
      }.toList

This code takes the numbers from 0 to 10000, keeps only the multiples of 3, adds 1 to each, groups them by the remainder when divided by 10, and sums each group. The calculation itself has no particular meaning, but it shows that a data processing flow can be written in a very easy-to-follow (cool) way, in exactly the order you think about it.

When I try to do this in Python ...

`Python`


import itertools
result = range(0, 10001)
result = filter(lambda x: x % 3 == 0, result)
result = map(lambda x: x + 1, result)
result = map(lambda x: (x % 10, x), result)
result = sorted(result)
result = itertools.groupby(result, lambda x: x[0])
result = map(lambda x: (x[0], sum(map(lambda _: _[1], x[1]))), result)
result = list(result)

It's hard to read, to the point where I can barely follow it. By the way, if you write it in one go without intermediate variables:

`Python`


result = list(
    map(lambda x: (x[0], sum(map(lambda _: _[1], x[1]))),
        itertools.groupby(
            sorted(
                map(lambda x: (x % 10, x),
                    map(lambda x: x + 1,
                        filter(lambda x: x % 3 == 0,
                               range(0, 10001)
                        )
                    )
                )
            ), lambda x: x[0]
        )
    )
)

This code has zero readability. If you can read it smoothly, you may have awakened to something. The reason is that when you want to process in the order f -> g -> h, you have to write it in the reverse order, as h(g(f(x))).
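To make the ordering problem concrete, here is a minimal sketch using three hypothetical functions `f`, `g`, and `h` (names chosen only for illustration). It contrasts the nested call with a small hand-rolled helper, here called `apply_in_order`, that applies functions in reading order. This is an illustration under those assumptions, not any library's implementation.

```python
from functools import reduce


# Hypothetical stand-ins for three processing steps.
def f(x):
    return x + 1


def g(x):
    return x * 2


def h(x):
    return x - 3


# Nested style: written as h, g, f but executed as f, g, h.
nested = h(g(f(10)))


def apply_in_order(value, *funcs):
    """Apply funcs to value from left to right."""
    return reduce(lambda acc, func: func(acc), funcs, value)


# Pipeline style: written and executed in the same order.
piped = apply_in_order(10, f, g, h)

print(nested, piped)  # 19 19
```

Both styles compute the same value; only the second one reads in the order the steps actually run.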

In fact, there are libraries that solve this: toolz, scalafunctional, and fn.py. For this article, opinions such as **just write it in Scala** are off-limits.

Toolz, CyToolz

toolz is a library that extends Python's standard `itertools` and `functools` so that code can be written in a more functional style. `cytoolz` is a faster version of it reimplemented in Cython. The `pipe` function and the curried functions they provide are very useful. The terrible code I showed earlier can be written as:

from cytoolz.curried import *
import operator as O
result = pipe(range(0, 10001),
    filter(lambda x: x % 3 == 0),
    map(lambda x: x + 1),
    reduceby(lambda x: x % 10, O.add),
    lambda d: d.items(),
    list
)

How about that? You pass the data you want to process as the first argument, then the functions you want to apply, one after another, as the second and subsequent arguments. If you are familiar with R, this may remind you of dplyr. By the way, the `filter`, `map`, and `reduceby` used here are all curried, so `map(f, data)` can also be written as `map(f)(data)`. That is what lets you connect them with `pipe` like this. If you don't use the curried versions, replace `pipe` with `thread_last`, and the data processed by the previous function is passed as the last argument of each function in turn.
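As a rough illustration of what currying and `thread_last` do, the following sketch re-creates both ideas with only the standard library. The helpers `curried_map`, `curried_filter`, `my_pipe`, and `my_thread_last` are hypothetical names written for this example. They mimic the behavior described above and are not toolz's actual implementation.

```python
from functools import partial, reduce


def curried_map(func):
    """map(func, data) becomes curried_map(func)(data)."""
    return partial(map, func)


def curried_filter(pred):
    """filter(pred, data) becomes curried_filter(pred)(data)."""
    return partial(filter, pred)


def my_pipe(data, *funcs):
    """Feed data through one-argument functions, left to right."""
    return reduce(lambda acc, func: func(acc), funcs, data)


def my_thread_last(data, *forms):
    """Each form is a callable or a tuple (func, arg1, ...).

    For tuples, the running value is appended as the LAST argument.
    """
    for form in forms:
        if callable(form):
            data = form(data)
        else:
            func, *args = form
            data = func(*args, data)
    return data


def is_multiple_of_3(x):
    return x % 3 == 0


def add_one(x):
    return x + 1


with_curry = my_pipe(
    range(10),
    curried_filter(is_multiple_of_3),
    curried_map(add_one),
    list,
)

with_thread_last = my_thread_last(
    range(10),
    (filter, is_multiple_of_3),  # calls filter(is_multiple_of_3, data)
    (map, add_one),              # calls map(add_one, data)
    list,
)

print(with_curry)        # [1, 4, 7, 10]
print(with_thread_last)  # [1, 4, 7, 10]
```

In toolz itself, non-curried steps are likewise passed to `thread_last` as tuples such as `(map, f)`.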

ScalaFunctional

As the name implies, scalafunctional is a library that lets you operate on collections the way you do in Scala. ~~If you're going that far, just use Scala (etc.)~~ With this library, you wrap a list, dict, etc. in a dedicated class called `seq` and process it with a chain of dot calls.

import operator as O
from functional import seq
result = seq(range(0, 10001)) \
    .filter(lambda x: x % 3 == 0) \
    .map(lambda x: x + 1) \
    .map(lambda x: (x % 10, x)) \
    .reduce_by_key(O.add) \
    .to_list()

This is the closest to Scala. However, Python requires a backslash at the end of each line here, which is a bit annoying. Also, Python's lambda expressions are not as flexible as Scala's functions, so for complicated processing you may need to define the function with `def` first. Either way, it's very simple and beautiful.
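The idea of wrapping a collection in a class and chaining methods can be sketched in a few lines. The `MiniSeq` class below is a hypothetical toy written for this article, not scalafunctional's implementation, and it supports only the four methods used above. The sketch also shows two workarounds for the annoyances just mentioned: wrapping the chain in parentheses avoids the trailing backslashes, and a named function defined with `def` (here the made-up `key_by_last_digit`) stands in for a lambda when the logic grows.

```python
import operator


class MiniSeq:
    """Toy chainable wrapper (hypothetical; not the real seq class)."""

    def __init__(self, items):
        self._items = list(items)

    def filter(self, pred):
        return MiniSeq(x for x in self._items if pred(x))

    def map(self, func):
        return MiniSeq(func(x) for x in self._items)

    def reduce_by_key(self, func):
        # Expects (key, value) pairs; combines values that share a key.
        acc = {}
        for key, value in self._items:
            acc[key] = func(acc[key], value) if key in acc else value
        return MiniSeq(acc.items())

    def to_list(self):
        return list(self._items)


def key_by_last_digit(x):
    """Named function used in place of a longer lambda."""
    return (x % 10, x)


# Parentheses let the chain span several lines without backslashes.
result = (
    MiniSeq(range(0, 10001))
    .filter(lambda x: x % 3 == 0)
    .map(lambda x: x + 1)
    .map(key_by_last_digit)
    .reduce_by_key(operator.add)
    .to_list()
)
```

Each method returns a new wrapper, which is what makes the dot chain possible.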

fn.py

fn.py is also a library for functional programming in Python. Its most distinctive feature is that you can write functions with a Scala-style placeholder, `_`.

from fn import _
result = map(_ + 1, range(10))

You can simply use it instead of lambda.

f = _ + 1
f(10)

>>>
11
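How can an expression like `_ + 1` become a function? One plausible mechanism is operator overloading: the placeholder object defines `__add__` and similar methods so that each operator returns a callable. The `Placeholder` class and the name `it` below are hypothetical and cover only three operators. This is a sketch of the idea, not fn.py's actual source.

```python
class Placeholder:
    """Toy placeholder (hypothetical): each operator returns a function."""

    def __add__(self, other):
        return lambda x: x + other

    def __mul__(self, other):
        return lambda x: x * other

    def __mod__(self, other):
        return lambda x: x % other


it = Placeholder()

add_one = it + 1  # same as: lambda x: x + 1
print(add_one(10))                  # 11
print(list(map(it * 2, range(5))))  # [0, 2, 4, 6, 8]
```

Chained expressions such as `it % 3 == 0` would need the returned object to overload operators as well; that part is omitted from this sketch.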

It goes well with toolz and scalafunctional.

`toolz`


result = pipe(range(10),
    map(_ + 1),
    list
)

`scalafunctional`


result = seq(range(10)) \
    .map(_ + 1) \
    .to_list()

By the way, in IPython and similar shells, `_` is already reserved for the last output, so when using fn.py there you need to `import` it under another name.

from fn import _ as it

Summary

toolz and scalafunctional make functional programming in Python go much more smoothly. scalafunctional lets you write code that looks just like Scala collection operations. With `pipe` from toolz, on the other hand, you can write more general data processing flows, not only collection operations. Combine these with fn.py and enjoy your Functional Python Life[^pandas].

All the libraries used here are published on GitHub. They are also registered on PyPI, of course, so you can install them with pip.

Toolz : https://github.com/pytoolz/toolz
CyToolz : https://github.com/pytoolz/cytoolz/
ScalaFunctional : https://github.com/EntilZha/ScalaFunctional
fn.py : https://github.com/kachayev/fn.py

[^pandas]: Even with plain Pandas, you can chain operations on a DataFrame with dots to some extent.

Perform Scala-like collection operations in Python
