[PYTHON] Algebraic data types and pattern matching

The title is unclear.

Motivation

[Algebraic Data Type](http://en.wikipedia.org/wiki/%E4%BB%A3%E6%95%B0%E7%9A%84%E3%83%87%E3%83%BC% What I wanted to know about E3% 82% BF% E5% 9E% 8B) [ADTs for Python] I think it is written in (http://stupidpythonideas.blogspot.jp/2014/08/adts-for-python.html). However, I didn't understand well even if I read the explanation of wikipedia (Japanese) at first, and I read some blogs and codes to improve my understanding, so I read the blog article of Kudan first. It cannot be denied that there may have been a shortage.

I haven't found an introductory and easy-to-understand Japanese data type for algebraic data types, but I thought the following explanation was straightforward and easy to accept.

** Has rich data types ** In general, functional languages integrate structures (direct products), unions (direct sums), enums and recursive types to provide rich data types that can be expressed concisely. With this rich data type, for example, tree structures can be defined concisely and safely (without dangerous nulls). Of course, this tree structure is persistent data.

In addition, you can operate data intuitively by using rich data types and pattern matching, which is a function that is integrated on both sides. The richer the types, the easier and safer it is to solve problems, making programming fun.

Algebraic data types in Python

Python is not a functional programming language and does not support algebraic data types.

There was a lot of discussion about the Python type system last summer when there was a suggestion to incorporate the mypy type annotation syntax into Python 3.5. [ADTs for Python] by Andrew Barnert (http://stupidpythonideas.blogspot.jp/2014/08/adts-for-python.html) is also one of the articles written at that time. Basically, as you read through this article, consider it for yourself.

I wrote it as a summary of what I read, but I think that my understanding and interpretation may be wrong, so if there is something wrong, I welcome you.

Cartesian product types

In many programming languages, Cartesian products are represented as structures or objects (as in object-oriented programming).

Note the equivalency to the "implicit namedtuple literals" that people suggest for Python every few months.

[Namedtuple] in Python It is said that (http://docs.python.jp/3/library/collections.html#collections.namedtuple) corresponds to the direct product type as a literal.

3.4


>>> from collections import namedtuple
>>> Stuff = namedtuple('Stuff', ['answer', 'manna'])
>>> type(Stuff)
<class 'type'>
>>> s = Stuff(42, 'spam')
>>> s
Stuff(answer=42, manna='spam')
>>> s.secret = 'cannot assign'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Stuff' object has no attribute 'secret'

Also, as an alternative to namedtuple [_ _ * slots * __] It is supplemented that using (http://docs.python.jp/3/reference/datamodel.html#slots) is also close to the named record type.

3.4


>>> class StuffSlots:
...     __slots__ = ('answer', 'manna')
...     def __init__(self, answer, manna):
...         self.answer = answer
...         self.manna = manna
... 
>>> s2 = StuffSlots(52, 'ham')
>>> s2.answer
52
>>> s2.manna
'ham'
>>> s2.secret = 'cannot assign'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'StuffSlots' object has no attribute 'secret'

However, the fact that it is not a tuple may cause theoretical problems even if it is practically good. It is said that the reason is that there is nothing to force the constructor argument to * slots * and it is hard to imagine how to code anonymously inline expansion.

Let's try expressing * StuffSlots * mentioned above with the constructor of the * type * class.

3.4


>>> StuffSlotsType = type('StuffSlotsType', (object,), dict(__slots__=('answer', 'manna'), __init__=lambda self, answer, manna: setattr(self, 'answer', answer) or setattr(self, 'manna', manna)))
>>> s3 = StuffSlotsType(62, 'egg')
>>> s3.answer
62
>>> s3.manna
'egg'
>>> s3.secret = 'cannot assign'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'StuffSlotsType' object has no attribute 'secret'
>>> s3.__slots__
('answer', 'manna')
>>> s3.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'StuffSlotsType' object has no attribute '__dict__'

I wonder if it's complicated to have to define * __slots__ * and * __ init__ * correctly.

Direct sum types (Sum types)

I think one of the characteristics of functional programming languages is that this can be expressed concisely.

Python doesn't have sum types at all, except implicitly.

I don't know what this "except implicit" refers to, but there is no direct sum equivalent in Python.

And explicit sum types, especially either tagged sums of anonymous products or sums of named products, are exactly what people are missing when they ask for ADTs in Python.

As an "explicit" Cartesian product type, it seems to come up with an algebraic data type that Python lacks when considering an anonymous Cartesian product sum or a named Cartesian product sum.

I think that "explicit" here refers to how to implement it in the existing Python type system. We are considering how to achieve this from two approaches.

MyPy Union

Introducing the definition of * Maybe * like Haskell. In the original article, the type variable * T * is used, and ** | ** (pipe) is used instead of * Union *.

While this only gives us untagged unions, as noted above, tagged unions are equivalent to untagged unions of record types. So, if we wanted to define a Haskell-like Maybe, we could do this:

Just = namedtuple('Just', (('value', T,)))
Nothing = namedtuple('Nothing', ())
Maybe = Just | Nothing

> Also, without pattern matching, the only way to actually use this is with isinstance checks (or EAFP try/except):

> ```py3
    def spam(eggs: Maybe[int]):
        if isinstance(n, Just[int]):
            print('Spam with {} eggs'.format(eggs.value))
        else:
            print('The spam stands alone')

I haven't explicitly stated that this is pseudo code, but this code can't be run in Python (mypy).

So, the actual executable Python code is as follows.

mypy_sum1.py


# -*- coding: utf-8 -*-
from typing import typevar, Generic, Union

T = typevar('T')

class Just(Generic[T]):

    __slots__ = ('value',)

    def __init__(self, value: T) -> None:
        self.value = value

class Nothing:
    """
    mypy claims by defining Nothing as below

    >>> Nothing = namedtuple('Nothing', ())
    List literal expected as the second argument to namedtuple()

    >>> Nothing = type('Nothing', (), dict(__slots__=()))
    Invalid type "__main__.Nothing"
    """
    __slots__ = ()

Maybe = Union[Just, Nothing]

def spam(eggs: Maybe):
    if isinstance(eggs, Just[int]):
        eggs.value += '+1'
        print('Spam with {} eggs[int].'.format(eggs.value))
    elif isinstance(eggs, Just[str]):
        eggs.value += 1
        print('Spam with {} eggs[str].'.format(eggs.value))
    else:
        print('The spam stands alone')

Since mypy's * namedtuple * typechecks behave confusingly, we're using * __slots__ * to define an alternative to the Cartesian product (* Just * and * Nothing *). (To avoid mypy type checker errors).

The syntax of mypy is to define a direct sum type as follows.

3.4


Maybe = Union[Just, Nothing]

Now let's run the type checker.

(mypy)$ mypy sum_mypy1.py 
sum_mypy1.py: In function "spam":
sum_mypy1.py, line 29: Unsupported operand types for + ("int" and "str")
sum_mypy1.py, line 32: Unsupported operand types for + ("str" and "int")

It's confusing because it doesn't list the line numbers, but I was able to catch the type error at the intended ʻeggs.value + = ... `of the * spam () * function.

About the implementation of this * spam () * function by * isinstance *

There's nothing theoretically wrong here, but it certainly is ugly.

It's not wrong in theory, but it's awkward, Andrew says: cry:

As an aside from here, this code has the correct syntax for mypy, but the wrong code for Python.

3.4


>>> from sum_mypy1 import *
>>> spam(Just[int](3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../sum_mypy1.py", line 29, in spam
    eggs.value += '+1'
TypeError: unsupported operand type(s) for +=: 'int' and 'str'

When taking an int type value in the constructor, a type error will occur as with the type checker,

3.4


>>> spam(Just[str]('ham'))
Spam with ham+1 eggs[int].

It can be executed when taking the value of str type to the constructor.

This is because * Just [T] * is a syntactic representation of mypy, and in Python code * isinstance (eggs, Just [T]) * is only * isinstance (eggs, Just) *. Therefore, the code that assumes * Just [int] * of the * spam () * function is executed.

The syntax of mypy is implemented so that it doesn't affect the execution of existing Python code, so this may be the boundary between mypy and Python's dynamic typing.

Swift-style Enum

Another way to represent direct sum types is to introduce Swift style enum types.

In Python's enum module, the following code does not work, but I think the intention of this pseudo code is easy to understand.

3.4


class Barcode(Enum):
    UPCA: (int, int, int, int) = 1
    QRCode: str = 2

A pattern in which each argument passed to the constructor of an enum constant has a name.

3.4


class Barcode(Enum):
    UPCA: (LL: int, L: int, R: int, RR: int) = 1
    QRCode: str = 2

Recursive type

One of the important features of algebraic data types is that they can be handled recursively.

When the list is represented by algebraic data type as follows,

List(Cons(head: T, tail: List[T]) | Nil())

List types consist of Cons and Nil types, * Nil () * is an instance of Nil and List types, and * Cons (1, Nil ()) * is a Cons [int] type and List [int] type. It is also an instance.

At this time, it is necessary to be able to resolve the List type (name) from Cons (constructor).

For Python, either the forward reference (http://mypy.readthedocs.org/en/latest/kinds_of_types.html#class-name-forward-references), as it is done in mypy. It is stated that you must define a forward type or record type that can be treated as if the type were defined.

Are algebraic data types just a subset of classes?

The entity of * namedtuple * and * enum * is a class in Python. And in Python all types are classes.

More importantly, ADTs are a strict subset of classes. The "strict subset" part is what makes them useful: they have a static structure, they have

The important thing is that algebraic data types are * "strict subsets" * of classes. What's useful about this * strict subset * is that it has a static structure ...?

I'm sorry, I don't understand the English translation and intention here. Please let me know. : cry:

Do Python Really Need Algebraic Data Types?

The big question is whether you want to use algebraic data types implemented in * namedtuple * or * enum * instead of classes.

If the language had pattern matching that only worked on ADTs, the answer might well be "quite often".

If the language has pattern matching that deals only with algebraic data types, the answer is * "quite often" *.

But without pattern matching, or with pattern matching that worked on classes that chose to opt in (as described in my previous blog post), I think it wouldn't be that often.

However, Andrew thinks it wouldn't be * "often" * if there is no pattern matching, or if pattern matching works on a class with opt-in selected.

The original article introduces pseudocode that mimics ML or Haskel-like pattern matching that parses JSON to explain algebraic data types and pattern matching paradigms.

class Name((first: str, last: str)):
    def tojson(self):
        return JSON.object({'first': self.first, 'last': self.last})

    @classmethod
    def fromjson(cls, j):
        case j:
            of JSON.object as o:
                return cls(o['first'], o['last'])
        raise ValueError('{} is not a name'.format(j))

class Person((aliases: Sequence[Name], age: float)):
    def tojson(self):
        return JSON.object({'aliases': JSON.array(map(Name.tojson, self.aliases)), 'age': age})

    @classmethod
    def fromjson(cls, j):
        case j:
            of JSON.object as o:
                case o['aliases']:
                    of JSON.array as a:
                        return cls([Name.fromjson(name) for name in a], o['age'])
        raise ValueError('{} is not a person'.format(j))

If there is no pattern matching, it may be common (?) To implement the process of traversing such a tree structure with Visitor pattern. ..

In the introductory article on functional programming introduced at the beginning

Data can be manipulated intuitively by rich data types and pattern matching, which is an integrated function.

You can tell what the sentence is intended for.

Summary

If you learn the concept of functional programming algebraic data types along with pattern matching, you will find it easy to understand the simplicity and effectiveness of their expressiveness.

So, is that a good enough use case to include ADTs in Python?

Recommended Posts

Algebraic data types and pattern matching
Algebraic data types and FizzBuzz
Algebraic data types and object-oriented programming
Understanding data types and beginning linear regression
Python variables and data types learned in chemoinformatics
cv2 functions and data types (OpenCV python bindings)
Point and Figure Data Modeling
Extract csv data and calculate