Background
Scala's collection operations are cool, aren't they? You can write a pipeline neatly without creating extra intermediate variables.
Scala
val result = (0 to 10000)
  .filter(_ % 3 == 0)
  .map(_ + 1)
  .groupBy(_ % 10)
  .map { it =>
    val k = it._1
    val v = it._2.sum
    (k, v)
  }.toList
This code takes the numbers from 0 to 10000, keeps only the multiples of 3, adds 1 to each, groups them by the remainder when divided by 10, and sums each group. The calculation itself has no particular meaning, but it is an example of how a data processing flow can be written in a very easy-to-understand (cool) way, in exactly the order you think about it.
When I try to do this in Python ...
Python
import itertools
result = range(0, 10001)
result = filter(lambda x: x % 3 == 0, result)
result = map(lambda x: x + 1, result)
result = map(lambda x: (x % 10, x), result)
result = sorted(result)
result = itertools.groupby(result, lambda x: x[0])
result = map(lambda x: (x[0], sum(map(lambda _: _[1], x[1]))), result)
result = list(result)
It is hard to read, to the point that I can't bear to look at it. By the way, if you write it in one shot without using intermediate variables:
Python
import itertools
result = list(
    map(lambda x: (x[0], sum(map(lambda _: _[1], x[1]))),
        itertools.groupby(
            sorted(
                map(lambda x: (x % 10, x),
                    map(lambda x: x + 1,
                        filter(lambda x: x % 3 == 0,
                               range(0, 10001)
                        )
                    )
                )
            ), lambda x: x[0]  # key function belongs to groupby, not sorted
        )
    )
)
This code has zero readability. If you can read it smoothly, you may have awakened to something. The reason is that to process data in the order f -> g -> h, you have to write the calls in the reverse order, as h(g(f(x))).
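To make the f -> g -> h idea concrete, here is a minimal sketch using only the standard library. The names pipe_sketch and sum_by_last_digit are hypothetical helpers made up for this illustration, not functions from any library; the sketch only shows that a left-to-right pipeline is a fold over a list of functions.

```python
from collections import defaultdict
from functools import reduce


def pipe_sketch(data, *funcs):
    """Apply funcs left to right: pipe_sketch(x, f, g, h) == h(g(f(x)))."""
    return reduce(lambda acc, func: func(acc), funcs, data)


def sum_by_last_digit(numbers):
    """Group numbers by their remainder modulo 10 and sum each group."""
    totals = defaultdict(int)
    for n in numbers:
        totals[n % 10] += n
    return sorted(totals.items())


# The steps now read in the same order in which they are executed.
result = pipe_sketch(
    range(0, 10001),
    lambda xs: filter(lambda x: x % 3 == 0, xs),  # f: keep multiples of 3
    lambda xs: map(lambda x: x + 1, xs),          # g: add 1 to each
    sum_by_last_digit,                            # h: group and sum
)
```

The wrapping lambdas are still noisy, which is exactly the part that curried functions clean up.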
Actually, there are libraries that solve this: toolz, scalafunctional, and fn.py. In this article, opinions such as **Write it in Scala** are off limits.
Toolz, CyToolz
toolz is a library that extends Python's built-in itertools and functools so that code can be written in a more functional style. cytoolz is a faster version of it, reimplemented in Cython. The pipe function and the curried functions these libraries provide are very useful. The terrible code I mentioned earlier can be written as:
from cytoolz.curried import *
import operator as O
result = pipe(range(0, 10001),
filter(lambda x: x % 3 == 0),
map(lambda x: x + 1),
reduceby(lambda x: x % 10, O.add),
lambda d: d.items(),
list
)
How is it? Pass the data you want to process as the first argument, and the functions you want to apply, in order, as the second and subsequent arguments. If you are familiar with the R language, this may remind you of dplyr. By the way, the filter, map, and reduceby imported from cytoolz.curried are all curried, so map(f, data) can be written as map(f)(data). That is what lets you connect them with pipe like this. If you don't use the curried versions, replace pipe with thread_last and pass each step as a tuple such as (map, f); the data produced by the previous function is then passed as the last argument of each function in turn.
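The difference between the curried style and the thread_last style can be sketched with the standard library alone. Note that curried_map and thread_last_sketch below are hypothetical stand-ins written for this illustration; they only mimic the calling conventions described above and are not how toolz is implemented.

```python
from functools import partial


def curried_map(func):
    """Stand-in for a curried map: curried_map(f)(data) == map(f, data)."""
    return partial(map, func)


def thread_last_sketch(value, *forms):
    """Mimic the thread_last convention.

    Each form is either a callable or a (function, arg1, ...) tuple; for a
    tuple, the running value is appended as the last argument.
    """
    for form in forms:
        if callable(form):
            value = form(value)
        else:
            func, *args = form
            value = func(*args, value)
    return value


def add_one(x):
    return x + 1


def is_even(x):
    return x % 2 == 0


# Curried style: supply the function first, the data later.
curried_result = list(curried_map(add_one)(range(5)))

# thread_last style: plain (uncurried) functions, one tuple per step.
threaded_result = thread_last_sketch(
    range(5),
    (map, add_one),     # becomes map(add_one, range(5))
    (filter, is_even),  # becomes filter(is_even, <previous result>)
    list,
)
```

Either way, each step receives the output of the previous one; the only difference is who fills in the data argument.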
ScalaFunctional
As the name implies, scalafunctional is a library that lets you operate on collections the way you would in Scala. ~~If you want to go that far, just use Scala (omitted)~~ With this library, you wrap a list, dict, etc. in a dedicated class called seq and process it with a chain of dots.
import operator as O
from functional import seq
result = seq(range(0, 10001)) \
.filter(lambda x: x % 3 == 0) \
.map(lambda x: x + 1) \
.map(lambda x: (x % 10, x)) \
.reduce_by_key(O.add) \
.to_list()
This is the closest to Scala. However, Python needs a backslash at the end of each line (or parentheses around the whole expression), which is a bit annoying. Also, Python's lambda expressions are not as flexible as Scala's functions, so for complicated processing you may need to def a function first. Either way, it's very simple and beautiful.
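As a rough illustration of both points, the sketch below uses a tiny hypothetical Chain class (not the real seq, which has many more features) to show a dot chain wrapped in parentheses instead of backslashes, with one step written as a def because it needs more than a single expression.

```python
class Chain:
    """Hypothetical minimal wrapper, only to illustrate dot chaining."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, func):
        return Chain(map(func, self._items))

    def filter(self, pred):
        return Chain(filter(pred, self._items))

    def to_list(self):
        return list(self._items)


def describe(x):
    """Uses several statements, so it cannot be a single lambda."""
    kind = "even" if x % 2 == 0 else "odd"
    return f"{x} is {kind}"


# Parentheses around the whole expression remove the need for backslashes.
labels = (
    Chain(range(10))
    .filter(lambda x: x % 3 == 0)
    .map(describe)
    .to_list()
)
```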
fn.py
fn.py is also a library for functional programming in Python. Its most distinctive feature is that you can write anonymous functions in the style of Scala's placeholder syntax.
from fn import _
result = map(_ + 1, range(10))
You can simply use it in place of a lambda.
f = _ + 1
f(10)  # returns 11
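For intuition on how an expression like _ + 1 can act as a function, here is a toy sketch based on operator overloading. The Placeholder class is hypothetical and supports only + and * with a constant; it is an assumption about the general idea, not fn.py's actual implementation, which is far more general.

```python
class Placeholder:
    """Toy placeholder: arithmetic on it returns a one-argument function."""

    def __add__(self, other):
        # it + 1  ->  lambda x: x + 1
        return lambda x: x + other

    def __radd__(self, other):
        # 1 + it  ->  lambda x: 1 + x
        return lambda x: other + x

    def __mul__(self, other):
        # it * 3  ->  lambda x: x * 3
        return lambda x: x * other


it = Placeholder()

add_one = it + 1
incremented = list(map(add_one, range(5)))
tripled = list(map(it * 3, range(5)))
```

The name it is used here instead of _ so the sketch also behaves in an interactive session.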
The placeholder goes well with toolz and scalafunctional.
toolz
result = pipe(range(10),
map(_ + 1),
list
)
scalafunctional
result = seq(range(10)) \
.map(_ + 1) \
.to_list()
By the way, in interactive environments such as IPython, _ is reserved for the last output, so when using fn.py there you need to import the placeholder under another name.
from fn import _ as it
Use toolz and scalafunctional to speed up functional programming in Python. With scalafunctional you can write code just like Scala collection operations. With pipe from toolz, on the other hand, you can write more general data processing flows, not only collection operations. Combine these with fn.py and enjoy the Functional Python Life [^pandas].
All the libraries used here are published on GitHub. They are of course also registered on PyPI, so you can install them with pip.
[^pandas]: Even with plain Pandas, you can chain operations on a DataFrame with dots to some extent.