I was supposed to be looking at JEP 286, but before I knew it, I was implementing Hindley-Milner type inference in Kotlin.

This article is the 17th day article of M3 Advent Calendar 2017.

I'm distracted after @ nishiba's Gachi article, but recently my late-year-old daughter is so cute that her brain melts and her memory and comprehension are quite good. Please forgive it because it is decreasing.

Introduction

Now, at M3, Kotlin is hotter than the rest of the world. Participating members are also hot. There is certainly the potential to do apps, front desk, and server side, yes, with Kotlin.

That said, there are still many Java-based systems, and the need for Alt-Java as an alternative to reducing Java coding stress has been met every few years for more than a decade, with Groovy, Scala, and Kotlin. I've been showing excitement while changing.

Even in the Java world without going to Alt-Java, while Lombok and Play1's black magic give a glimpse of the future of Java, the original Java has gradually evolved, such as the diamond operator, Lambda, and Stream API. The IDE is also working hard. It's ridiculous, but it can be automatically generated, so you don't have to worry about writing a getter setter.

… But, needless to say, the evolution of Java was too slow compared to other languages.

As of Java9, what I find most "hard" in Java code compared to other Alt Java languages is that every time you declare a variable, you have to specify the type before the variable name. It hurts because the IDE cannot complement the flow of writing code.

Optional<Map<String, List<String>>> myMap = Optional.of(new HashMap<>());

…… It's too painful and messy.

But it's finally time for the Java code to be cleaner.

JEP 286: Local-Variable Type Inference.

GA scheduled for 2018/03/20 will enter Java10, so it's not far in the future! This will bring you a little closer to the experience of Kotlin! (Excessive)

When I hear 286, I feel reflexively 16-bit because of my mind.

Overview of JEP 286

--You can use var to initialize local variables --Cannot be used for class fields or method arguments --Can be used for index of for for (var hoge: hoges) [^ 1] [^ 1]: The plural of hoge is hoges?

--The compiler infers the appropriate type and outputs the bytecode replaced with that type, so there is no difference at the bytecode level. -- val is not provided for non-reassignment, but can be disabled with final var --The following initialization is not possible so that the type of the right side cannot be guessed.

Example of not being able to type infer with var


  var list = { 1, 2, 3 };              // new int[] { 1, 2, 3 };Then OK
  var list = Collections.emptyList();  // Collections.<String>emptyList()Then OK
  var func = x -> x + 1;               //I understand that this cannot be inferred
  var func = () -> "abc";              //Good luck Supplier<String>Inference …… It's certainly difficult
  var func = String::trim;             // Function<String, String>It's hard to infer

Thanks to this code

BufferedReader reader = new BufferedReader(new InputStreamReader(errorStream, Charsets.UTF_8));
StringBuilder sb = new StringBuilder();
Optional<Map<String, List<String>>> myMap = Optional.of(new HashMap<>());

It will look like this! !! !!

var reader = new BufferedReader(new InputStreamReader(errorStream, Charsets.UTF_8));
var sb = new StringBuilder();
var myMap = Optional.of(new HashMap<String, List<String>>());

It's too beautiful! !!

...... Well, I might not be so happy if I don't give a great example on purpose like the last line.

However, the fact that the variables appear in the same position is sober, but it greatly contributes to readability.

Previously, when I considered the var of lombok [^ 2] in a real project, I gave up adopting it due to its instability, but it seems that the day will come when I can use var.

[^ 2]: The way to implement var in Lombok was a class called lombok.var. Internally, it follows the AST (Abstract Syntax Tree) tree, and when a node that declares a variable of this class appears, it infers the type of the value on the right side and rewrites it to a node that declares that type. It seems.

Type inference is great!

Type inference Deep Dive

I'm a stray engineer [^ 5] who hasn't learned computer science properly, but I became interested in type inference, so I decided to investigate it.

[^ 5]: @gakuzzzz It was Naisho that I could only read up to Chapter 5 of the TaPL book borrowed from the teacher over two years. For the time being, I saw all the pages

The first step is from Wikipedia

Wikipedia-Type inference

You can't just call var type inference. surely. Hmmmm.

I want a little more information, so I'll take a look at the English page.

Wikipedia English version --Type Inference

I feel like I'm being told that I should study the Hindley Miler Type System. Alright, read through Hindley Milner Type System!

Oops ... softly closed ...

That's why I didn't go, and when I read it while squeezing,

--A type system originally announced by Dr. Hindley in 1969 and Dr. Milner in 1978. Known as HM --Mr. Milner is a great person who created ML (prototype of SML and OCaml) ――It's amazing that I was thinking about this 50 years ago (before the release of UNIX!) ――It is a kind of type system of "Lambda calculus with polymorphism". --Polyphase is a type that contains an unknown type such as List <T>. ――I think I've come to understand a little that HM is parametric polymorphism and not ad hoc polymorphism, and that type classes can be expressed by parametric polymorphism. --Object-oriented, small "inheritance" is not possible --With this type system, you can infer the appropriate type from a lambda expression alone, without any hints about the type. --The type inference algorithm is Algorithm W, which has been proven to always infer the appropriate type.

It's tough, so I searched for some Japanese articles.

-KLabgames Tech Blog-Mechanism of type inference

It was good to get an image of what I was doing.

-Programming Language Scala Japanese Information Site Chapter 16 Hindley / Milner Type Inference

It seems that this is the code of Dr. Odersky, the god of Scala, but it seems to be "for those who originally knew", and I could not understand what the text itself was saying ... [^ 3]

[^ 3]: There was a typographical error (missing) in the source code in the text. The original is Scala by example (PDF), and the source code here is I was fine.

Type inferencers are written in various languages, not just Scala, so I gradually understood them as I rushed.

-uehaj's blog --Hindley-I wrote a Milner type inference algorithm in Groovy [^ 4]

[^ 4]: Uehara-san, how are you? Thank you for your help during waba. 20 years ago ww

Based on these sources, I tried tracing the operation of the actual type inference process one by one by hand, and finally grasped the contents.

It's a big deal, so I ported it to the hot Kotlin now.

https://github.com/reki2000/hyndley-milner-kotlin/blob/master/HindleyMilner.kt

I had a lot of trouble due to lack of Kotlin power, but I managed to complete it! I added in the comments that I understood the role of each function.

Understand how the algorithm works

Preparation part

Definition of elements in the syntax tree

The syntax tree that is the target of this type inference consists of the following elements. I named it Term, but it could have been Syntax.

--Lamda defines the function --Id is a variable or literal --Apply is a function application (function call) --Let is a binding of a value to a variable (like the usual a = 5) --Letrec is a binding with recursion (refers to itself in the definition)

sealed class Term()
data class Lambda(val v: String, val body: Term) : Term()
data class Id(val name: String) : Term()
data class Apply(val fn: Term, val arg: Term) : Term()
data class Let(val v: String, val defn: Term, val body: Term) : Term()
data class Letrec(val v: String, val defn: Term, val body: Term) : Term()

Converting code written in a human-friendly language to this syntax tree is a simple replacement, but a bit tedious.

For example, replace the following expression with a syntax tree.

if (n == 0) 1 else (n - 1)

If you put together the numerical calculation part, you will go to another world, so we will introduce a function called zero that judges zero and prev that subtracts one.

if (zero(n)) 1 else prev(n)

I will write it down like LISP to clarify the tree structure.

(lambda n (if (zero n) 1 (prev n))) 

In this syntax tree, the function can only take one argument, so curry it all.

(lambda n (((if (zero n)) 1) (prev n)))

Except for lambda, all parentheses should be a sequence of only two elements of the function-argument.

At this point, you can convert it to this syntax tree as it is. Set all parentheses to Apply and all literals to Id.

Lambda("n", 
   Apply(Apply(Apply(Id("if"), 
       Apply(Id("zero"),Id("n"))),
       Id("1")),
       Apply(Id("prev"),Id("n")))))

If you create a parser, you can even convert the original description into such a form, but this time you will write such a syntax tree yourself and pass it to the inferencer.

Type representation

Next is the definition of a group of classes (confusing expressions ...) that represent types.

--Type is a class for representing the entire type --TypeVariable is a type variable whose specific type is unknown. The inference result is stored in instance. It has a mechanism to give an appropriate name such as a or b for displaying the variable name. --TypeOperator is a type consisting of one or more types --Function: A type of TypeOperator, a functional type. Have from and to --INTEGER, BOOLEAN: Needless to say, numeric type and true / false type --UNKNOWN: A marker that indicates that the specific type is not yet known. Set to TypeVariable instance


//Definition of elements related to types (type variables, type operators)
sealed class Type()
var nextVariableName: Char = 'a'
class TypeVariable() : Type() {
    override fun toString(): String {
        if (name == "") {
            name = "${nextVariableName++}" //A device that does not wastefully consume the alphabet when displaying variable names
        }
        return name
    }
    var name: String = ""
    var instance: Type = UNKNOWN
}

open class TypeOperator(val name: String, val types: List<Type>) : Type() {
    override fun toString() = name + " " + types.map { it.toString() }.joinToString(" ")
}

//Define some specific types
class Function(name: String, types: List<Type>) : TypeOperator(name, types) {
    constructor(fromType: Type, toType: Type) :  this("->", listOf(fromType, toType))
    override fun toString() = types[0].toString() + "->" + types[1].toString()
}

val INTEGER = TypeOperator("int",  emptyList())
val BOOLEAN = TypeOperator("bool", emptyList())
val UNKNOWN = TypeOperator("unknown", emptyList())

Reasoning part

analyse (): Control part of inference flow

The function name (analyse) is probably French.

Inference processing is recommended according to the elements of the syntax tree.

The environment (env) is a Map to the Type of the Id defined up to that point.

--Id is the type of its contents if its name is registered in the environment --Function does not infer and returns the result "(argument type)-> (return value of function body)" --Let also does not infer, adds the bound variable to the environment and returns the result of evaluating the main body part as it is --Letrec is similar to Let, but you need to infer the type of the variable before evaluating the body part. (Slightly insufficient understanding here)

An important part of inference is the process of Apply.

The next unify sets the inference result to the instance of (result type).

fun analyse(node: Term, env: Map<String, Type>, nonGeneric: Set<Type> = emptySet()): Type {
    return when (node) {
        is Id -> getType(node.name, env, nonGeneric)
        is Apply -> {
            val funType = analyse(node.fn, env, nonGeneric)
            val argType = analyse(node.arg, env, nonGeneric)
            val resultType = TypeVariable()
            unify(Function(argType, resultType), funType)
            resultType
        }
        is Lambda -> {
            val argType = TypeVariable()
            val resultType = analyse(node.body, env + (node.v to argType), nonGeneric + argType)
            Function(argType, resultType)
        }
        is Let -> {
            val defnType = analyse(node.defn, env, nonGeneric)
            analyse(node.body, env + (node.v to defnType), nonGeneric)
        }
        is Letrec -> {
            val newType = TypeVariable()
            val newEnv = env + (node.v to newType)
            val defnType = analyse(node.defn, newEnv, nonGeneric + newType)
            unify(newType, defnType)
            analyse(node.body, newEnv, nonGeneric)
        }
    }
}

unify (): Core part of inference

It's the body part, skipping some auxiliary processing (I'm still not sure why fresh is needed).

Under the condition that the right side t1 and the left side t2 have the same type, if an unknown type variable appears in t1 and t2, the type of the opposite side is assigned.

prune is a process that repeats the process of replacing a type variable with the type of its instance when it appears, until something that cannot be inferred yet appears.

fun unify(t1: Type, t2: Type) {
    val a = prune(t1)
    val b = prune(t2)
    if (a is TypeVariable) {
        if (a != b) {
            if (occursInType(a, b))
            	throw Exception("recursive unification")
            else
            	a.instance = b
        }
    } else if (a is TypeOperator && b is TypeVariable) {
        unify(b, a)
    } else if (a is TypeOperator && b is TypeOperator) {
        if (a.name != b.name || a.types.count() != b.types.count()) {
            throw Exception("Type mismatch ${a} != ${b}")
        }
        a.types.zip(b.types).forEach { (p,q) -> unify(p, q) }
    }
}

Follow the actual operation

Let's infer the type of fn n-> if zero (n) 1 else prev (n) of the previous function.

The LISP-like notation is (lambda n (((if (zero n) 1) (prev n))).

The types of ʻif, zero, and prev` that appear in it need to be defined, so they are as follows.

if: bool -> x -> x -> x
zero: bool -> int
prev: int -> int

The type of this function, the type you want to infer in the end, is the unknown type a ?. Since this section has the function definition Lambda, we add the argument n as another unknown type b? To the environment ʻenv` to proceed with body inference.

(lambda n ((if (zero n)) 1) (prev(n))) : a?
        n: b?

The top-level term for body is ʻApply. The function part is ((if (zero n)) 1)and the argument is(prev n). While setting the result type of this section to c?`, First proceed with the inference of the function part.

(lambda n ((if (zero n)) 1) (prev(n))) : a?
        n: b?
          ((if (zero n)) 1) (prev n)): c?

Next is also ʻApply, the function part is (if (zero n)) , and the argument is 1. We proceed with the inference of the function part, with the result type of this section being d?`.

(lambda n ((if (zero n)) 1) (prev n))) : a?
        n: b?
          ((if (zero n)) 1) (prev n)): c?
           (if (zero n)) 1): d?

Next is ʻApply, the function part is ʻif, and the argument is(zero n). Now that we know the type of the function part, we proceed to evaluate the argument (zero n). Oops, leave the result in this section as ʻe? `.

(lambda n (((if (zero n) 1) (prev(n))) : a?
        n: b?
          (((if (zero n)) 1) (prev n)): c?
           ((if (zero n)) 1): d?
            (if (zero n): e?

It is also ʻApply. The type of the function part zero is ʻint-> bool. The type of the argument n isb?. This is where ʻunify` comes into play.

If the result of this section is f?, then ʻint-> boolis equal tob?-> F?, So we know b: int, f: bool. From now on, the confirmed type will be written without ?`

(lambda n (((if (zero n) 1) (prev(n))) : a?
        n: b?
          (((if (zero n)) 1) (prev n)): c?
           ((if (zero n)) 1): d?
            (if (zero n)): e?
                (zero n): f?

                int -> bool === b? -> f?
                b: int
                n: int
                f: bool

Now that we know the argument type is bool, we go back to the previous(if (zero n)). Since the type bool-> x-> x-> x of ʻif and the type of the whole application this time, bool-> e? , Are equal, it is ʻe: x-> x-> x. I understand this.

(lambda n (((if (zero n) 1) (prev(n))) : a?
        n: int
          (((if (zero n)) 1) (prev n)): c?
           ((if (zero n)) 1): d?
            (if [bool]  ): e?

            bool -> x -> x -> x === bool -> e?
            e: x -> x -> x

Go back up one more ((if (zero n)) 1). The type of the argument 1 is ʻint, so x-> x-> x and ʻint-> d are equal. So you can see that x: int, d: int-> int

(lambda n (((if (zero n) 1) (prev(n))) : a?
        n: int
          (((if (zero n)) 1) (prev n)): c?
           ([x -> x -> x] int): d?

            x -> x -> x === int -> d?
            x: int
            d: x -> x 
            d: int -> int

I will return to the very beginning. The type g? Of the argument (prev n) had prev ʻint-> intandn already n: int. Therefore, the targets of unify are ʻint-> int and ʻint-> g. So g: int`

(lambda n (((if (zero n) 1) (prev(n))) : a?
        n: int
          ([int -> int]     (prev n)): c?
                            (prev n): g?
                         
                             int -> int == int -> g?
                             g: int

Finally you can see the type of body. ʻInt-> intis equal to ʻint-> c?. You can see that it is c: int.

(lambda n (((if (zero n) 1) (prev(n))) : a?
        n: int
          ([int -> int]     [int]): c?

            int -> int == int -> c?
            c: int 

The argument n was ʻint, so I finally found that the type of function I wanted to know was ʻint-> int!

(lambda [int]([int]) : a?

            a: int -> int

No, type inference is hard ...

Operation check

Let's try the code that we actually made.

fun factorial(n) = if (n == 0) 1 else n * factorial(n - 1)
factorial(5)

The result of making an inference by creating a letrec syntax tree equivalent to ...

int

He inferred that it was an int! great

Lambda("n", Lambda("m", Id("5"))

Is `ʻa-> b-> int``! Oh

from now on

I was wondering what happened to inference in a type system that includes inheritance, and I was thinking about studying a little more, but in a timely manner [What is level-based polymorphic inference that is also used in OCaml]( The article http://rhysd.hatenablog.com/entry/2017/12/16/002048) has been publishedLanguage implementation Advent Calendar I didn't know there was anything! !!

Summary

--JEP 286 Fun --Kotlin Fun to write ――Type inference is difficult. Thanks to the compiler and IDE --If you are interested in type inference, you should start with Algorithm W of HM inference device. --The story of JEP 286 is too derailed

Reference article

JEP 286: Local-Variable Type Inference

First Contact With ‘var’ In Java 10

Recommended Posts

I was supposed to be looking at JEP 286, but before I knew it, I was implementing Hindley-Milner type inference in Kotlin.
I want to be eventually even in kotlin
I tried running gRPC's Quick Start (Kotlin version), but it was difficult to read the Gradle script.