Perform Scala-like collection operations in Python

Background

Scala's collection operations are cool, aren't they? You can write a whole pipeline neatly without creating extra intermediate variables.

Scala


val result = (0 to 10000)
  .filter(_ % 3 == 0)
  .map(_ + 1)
  .groupBy(_ % 10)
  .map { it =>
    val k = it._1
    val v = it._2.sum
    (k, v)
  }.toList

This code takes the numbers from 0 to 10000, keeps only the multiples of 3, adds 1 to each, groups them by their remainder when divided by 10, and sums each group. The calculation itself has no particular meaning, but it shows how a data processing flow can be written in a very easy-to-understand (cool) way, in exactly the order you think about it.

When I try to do this in Python ...

Python


import itertools
result = range(0, 10001)
result = filter(lambda x: x % 3 == 0, result)
result = map(lambda x: x + 1, result)
result = map(lambda x: (x % 10, x), result)
result = sorted(result)
result = itertools.groupby(result, lambda x: x[0])
result = map(lambda x: (x[0], sum(map(lambda _: _[1], x[1]))), result)
result = list(result)

It's hard to read, and you can't follow the flow at a glance. By the way, if you write it in one shot without intermediate variables, it looks like this:

Python


result = list(
    map(lambda x: (x[0], sum(map(lambda _: _[1], x[1]))),
        itertools.groupby(
            sorted(
                map(lambda x: (x % 10, x),
                    map(lambda x: x + 1,
                        filter(lambda x: x % 3 == 0,
                               range(0, 10001)
                        )
                    )
                )
            ), lambda x: x[0]
        )
    )
)

The code has zero readability. If you can read this smoothly, you may have awakened to something. The reason is that when you want to process in the order f -> g -> h, you have to write it in the reverse order, like h(g(f(x))).
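To make the ordering problem concrete, here is a minimal sketch using only the standard library. The helper name apply_in_order and the toy functions f, g, and h are made up for illustration; they are not part of any library discussed in this article.

```python
from functools import reduce


def apply_in_order(data, *funcs):
    """Apply funcs to data from left to right (hypothetical helper)."""
    return reduce(lambda acc, func: func(acc), funcs, data)


def f(x):
    return x + 1


def g(x):
    return x * 2


def h(x):
    return x - 3


# Nested calls read inside-out: the first step (f) is written last.
nested = h(g(f(10)))

# The helper lets the steps be listed in the order they run.
piped = apply_in_order(10, f, g, h)

print(nested, piped)
```

Both expressions compute the same value; only the reading order differs.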

Actually, there are libraries that solve this: toolz, scalafunctional, and fn.py. In this article, opinions such as **just write it in Scala** are off limits.

Toolz, CyToolz

toolz is a library that extends Python's standard `itertools` and `functools` so that code can be written in a more functional style. `cytoolz` is a faster version of it, reimplemented in Cython. The `pipe` function and the curried functions these libraries provide are very useful. The terrible code above can be written as:

from cytoolz.curried import *
import operator as O
result = pipe(range(0, 10001),
    filter(lambda x: x % 3 == 0),
    map(lambda x: x + 1),
    reduceby(lambda x: x % 10, O.add),
    lambda d: d.items(),
    list
)

How about that? You pass the data you want to process as the first argument, and the functions you want to apply, in order, as the second and subsequent arguments. If you are familiar with R, this may remind you of dplyr. By the way, the `filter`, `map`, and `reduceby` used here are all curried, so `map(f, data)` can also be called as `map(f)(data)`. That is why they can be connected with `pipe` like this. If you don't use the curried versions, replace `pipe` with `thread_last` and write each step as a tuple such as `(map, f)`; the data produced by the previous step is then passed as the last argument of each function in turn.
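The two ideas in the paragraph above, currying and passing the data as the last argument, can be sketched with the standard library alone. This is only an illustration of the concept and not how toolz is actually implemented; the names curried_map and thread_last_sketch are made up here.

```python
from functools import partial


def curried_map(func):
    """Fix the function now, supply the data later (hypothetical helper)."""
    return partial(map, func)


def thread_last_sketch(data, *steps):
    """Feed data through steps, appending it as the LAST argument.

    Each step is either a plain callable or a tuple (func, arg1, ...).
    """
    for step in steps:
        if callable(step):
            data = step(data)
        else:
            func, *args = step
            data = func(*args, data)
    return data


# Curried style: map(f)(data) instead of map(f, data).
add_one_to_all = curried_map(lambda x: x + 1)
curried_result = list(add_one_to_all([1, 2, 3]))

# thread_last style: uncurried built-ins written as (func, arg) tuples.
threaded_result = thread_last_sketch(
    range(10),
    (filter, lambda x: x % 3 == 0),
    (map, lambda x: x + 1),
    list,
)

print(curried_result, threaded_result)
```

In both styles the steps are written in the order they run, which is the whole point of `pipe` and `thread_last`.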

ScalaFunctional

As the name implies, scalafunctional is a library that lets you operate on collections the way you do in Scala. ~~If you're going that far, just use Scala (rest omitted)~~ In this library, you wrap a list, dict, and so on in a dedicated wrapper called `seq` and process it with a chain of dots.

import operator as O
from functional import seq
result = seq(range(0, 10001)) \
    .filter(lambda x: x % 3 == 0) \
    .map(lambda x: x + 1) \
    .map(lambda x: (x % 10, x)) \
    .reduce_by_key(O.add) \
    .to_list()

This is the closest to Scala. However, in Python the chain needs a backslash at the end of each line (unless the whole expression is wrapped in parentheses), which is a bit annoying. Also, Python's lambda expressions are not as flexible as Scala's function literals, so for complicated processing you may need to define a function with def first. Either way, it's very simple and beautiful.
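To show roughly how a dot chain like this can work, here is a small sketch of a wrapper class written with the standard library only. The class Chain is hypothetical and far simpler than scalafunctional's real `seq`; it also uses a smaller range so the result is easy to check by hand. Wrapping the chain in parentheses avoids the trailing backslashes.

```python
class Chain:
    """Hypothetical, minimal stand-in for a chainable collection wrapper."""

    def __init__(self, items):
        self._items = items

    def filter(self, pred):
        return Chain(filter(pred, self._items))

    def map(self, func):
        return Chain(map(func, self._items))

    def reduce_by_key(self, func):
        # Expects (key, value) pairs; combines values that share a key.
        grouped = {}
        for key, value in self._items:
            grouped[key] = func(grouped[key], value) if key in grouped else value
        return Chain(grouped.items())

    def to_list(self):
        return list(self._items)


# Parentheses let the chain span several lines without backslashes.
result = (
    Chain(range(0, 31))
    .filter(lambda x: x % 3 == 0)
    .map(lambda x: x + 1)
    .map(lambda x: (x % 10, x))
    .reduce_by_key(lambda a, b: a + b)
    .to_list()
)

print(sorted(result))
```

Each method returns a new wrapper, which is what makes the next dot call possible.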

fn.py

fn.py is also a library for functional programming in Python. Its biggest feature is that you can write expressions with a Scala-like placeholder.

from fn import _
result = map(_ + 1, range(10))

You can simply use it in place of a lambda.

f = _ + 1
f(10)

>>>
11
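For readers wondering how an expression like _ + 1 can become a function, the idea can be sketched with operator overloading. The class Placeholder below is a made-up illustration that supports only a few operators and a single argument; fn.py's real `_` is more capable and is implemented differently.

```python
import operator


class Placeholder:
    """Hypothetical single-argument placeholder built on operator overloading."""

    def __init__(self, func=lambda x: x):
        self._func = func

    def _binary(self, op, other):
        # Return a new placeholder that applies op after the current function.
        return Placeholder(lambda x: op(self._func(x), other))

    def __add__(self, other):
        return self._binary(operator.add, other)

    def __mul__(self, other):
        return self._binary(operator.mul, other)

    def __mod__(self, other):
        return self._binary(operator.mod, other)

    def __call__(self, x):
        return self._func(x)


it = Placeholder()

add_one = it + 1
add_one_then_double = (it + 1) * 2

print(add_one(10), add_one_then_double(10))
print(list(map(it % 3, range(5))))
```

Because each operator returns another callable object, the expression can be passed anywhere a one-argument function is expected.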

It goes well with toolz and scalafunctional.

toolz


result = pipe(range(10),
    map(_ + 1),
    list
)

scalafunctional


result = seq(range(10)) \
    .map(_ + 1) \
    .to_list()

By the way, in IPython and similar shells, `_` is already reserved to represent the last output, so when using fn.py there you need to `import` it under another name.

from fn import _ as it

Summary

Use toolz and scalafunctional to make functional programming in Python more productive. With scalafunctional you can write code just like Scala collection operations. With toolz's `pipe`, on the other hand, you can write more general data processing flows, not just collection operations. Combine these with fn.py and enjoy your Functional Python Life [^pandas].

All the libraries used here are published on GitHub. They are of course also registered on PyPI, so you can install them with pip.

[^pandas]: Even with plain Pandas, you can chain processing on a DataFrame with dots to some extent.
