General-purpose global optimization with Z3

Overview

This article presents a "numerical" global optimization method built on the Z3 Prover.

What is Z3 Prover?

Roughly speaking, it is an MIT-licensed theorem prover developed by Microsoft Research.

https://github.com/Z3Prover/z3

Quoting the project README: "Z3 is a theorem prover from Microsoft Research. It is licensed under the MIT license. [...] Z3 can be built using Visual Studio, a Makefile or using CMake. It provides bindings for several programming languages."

The appeal of Z3

The description above probably still does not convey what Z3 can actually do, so here is an example.

Code

from z3 import *

x = Real("x")
y = Real("y")
c = Real("c")

s = Solver()

# A point (x, y) in the first quadrant, on the unit circle, lying on the
# line y = -x + c, where c is the positive square root of 2.
s.add(
    x > 0,
    y > 0,
    c > 0,
    c * c == 2.0,
    x * x + y * y == 1,
    y == -x + c
)

print(s.check())
print(s.model())

Output result

$ python z3_test.py 
> sat
> [x = 0.7071067811?, c = 1.4142135623?, y = 0.7071067811?]

The check returns "sat", which means the constraint system is satisfiable, and s.model() prints one combination of variable values that satisfies it.
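For contrast, a contradictory constraint set makes check() report unsat. A tiny sketch (not part of the original example):

from z3 import Solver, Real

s2 = Solver()
t = Real("t")
s2.add(t > 0, t < 0)  # no value can satisfy both constraints
print(s2.check())     # prints "unsat"; calling s2.model() here would raise an error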

The easiest variable to read off is c. Its constraints are

c > 0
c * c == 2.0

so c = √2. The value in s.model(), 1.4142..., confirms that this is correct. For the relationship between x and y, a figure tells the story fastest.

(Figure: the unit circle x * x + y * y = 1 and the line y = -x + c, which touch at a single point in the first quadrant.)

This is the problem of finding the intersection of the unit circle and the line y = -x + c. The answer is simple: x = y = √2 / 2 (and the printed model shows that the result is numerically correct as well). Normally a human would have to substitute y = -x + c into x * x + y * y = 1 and work out the solution; with Z3 none of that is needed.
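If you want to read the values back out of the model programmatically rather than eyeballing the printed output, a minimal self-contained sketch looks like this (as_decimal and evaluate are z3py model methods; the trailing "?" in the output above is as_decimal's marker for a truncated value):

from z3 import *

x, y, c = Reals("x y c")
s = Solver()
s.add(x > 0, y > 0, c > 0, c * c == 2.0, x * x + y * y == 1, y == -x + c)

if s.check() == sat:
    m = s.model()
    for v in (x, y, c):
        # as_decimal(n) prints n decimal digits; "?" at the end means truncated
        print(v, "=", m[v].as_decimal(10))
    # evaluate() substitutes the model values into an arbitrary expression
    print("x*x + y*y =", m.evaluate(x * x + y * y))  # should come out as 1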

From the above, Z3 has three wonderful properties:

- It can handle real (non-integer) numbers.
- It can handle multiplication and division between variables, i.e. nonlinear constraints.
- You do not have to solve the equations yourself; you only write down the constraints.

These three points alone cover most of what you need.
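Since the example above only exercised multiplication, here is one more tiny sketch (not from the original article) showing that division between two unknowns also works:

from z3 import *

p, q = Reals("p q")
s3 = Solver()
s3.add(q != 0, p / q == 3, p + q == 8)  # division between two unknowns
print(s3.check())   # sat
print(s3.model())   # p = 6, q = 2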

Global optimization procedure with Z3

Basic idea

Z3 itself has no notion of optimizing a value: it only tells you whether a given set of constraints is satisfiable and, if so, gives one assignment that satisfies it. So we focus on the error between the model and the data and ask how small that error can be made while the constraints remain satisfiable. By binary search we narrow the gap between error bounds for which the problem becomes unsatisfiable and bounds for which it is still satisfiable, and the smallest satisfiable bound is the optimum. This is how global optimization is achieved with Z3. The example below fits a model a*x + b to data generated from 2x + 3 plus noise.
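The binary search at the heart of both procedures below can be written as a small generic helper. This is only an illustrative sketch (the function name and arguments are mine, not part of the scripts that follow): it takes a solver whose constraints define some error variable, plus an upper bound obtained from a first unconstrained check, and tightens the bound until the interval is small enough.

from z3 import Solver, sat

def minimize_by_bisection(s, error, max_err, min_err=0.0, tol=1.0e-6):
    """Shrink the bound on `error` by binary search over [min_err, max_err]."""
    while max_err - min_err > tol:
        mid = (max_err + min_err) / 2.0
        s.push()
        s.add(error < mid)
        if s.check() == sat:
            max_err = mid   # the tighter bound is achievable: keep it
        else:
            s.pop()         # retract the bound that made the problem unsat
            min_err = mid
    s.check()               # re-check so that model() reflects the kept bounds
    return s.model()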

Test data generation script (data.py)

import sys
from random import gauss, seed

seed(0)
n = int(sys.argv[1])

# Print n samples of y = 2x + 3 with Gaussian noise (sigma = 0.1),
# one "x y" pair per line.
for i in range(n):
    print(i, gauss(2.0 * i + 3.0, 0.1))

Execution procedure

$ python data.py 50 > data.csv

Global optimization by residuals

  1. Prepare the data.
  2. Define the residual between the model formula and each data point.
  3. Define delta as the sum of squared residuals.
  4. Check the model.
  5. Let max_delta be the delta obtained in step 4.
  6. Set min_delta = 0.0.
  7. Set tmp_delta = (max_delta + min_delta) / 2.
  8. Add the constraint delta < tmp_delta.
  9. If the check returns sat, set max_delta = tmp_delta.
  10. If the check returns unsat, retract the constraint added in step 8 and set min_delta = tmp_delta.
  11. Stop when max_delta - min_delta is below the desired accuracy; otherwise go back to step 7.

from z3 import *

if __name__ == "__main__":
    data = [tuple(map(float, line.split())) for line in open("data.csv")]  # 1.

    s = Solver()
    a = Real("a")
    b = Real("b")
    delta = Real("delta")

    deltas = []
    for i, (x, y) in enumerate(data):
        d = Real("d%d" % i)
        deltas.append(d)
        s.add(
            d == y - (a * x + b)  # 2. residual of data point i
        )

    s.add(delta == sum(d * d for d in deltas))  # 3. sum of squared residuals

    print(s.check())  # 4. first check, with no bound on delta yet
    print(s.model())

    # 5. as_decimal() appends "?" when the value is truncated, so strip it
    max_delta = float(s.model()[delta].as_decimal(15).rstrip("?"))
    min_delta = 0.0                                         # 6.

    while True:
        tmp_delta = (max_delta + min_delta) / 2.0           # 7.
        s.push()

        s.add(delta < tmp_delta)   # 8. try a tighter bound

        if s.check() == sat:       # 9. the bound is achievable: keep it
            print("sat:", tmp_delta, min_delta, max_delta)
            max_delta = tmp_delta
        else:                      # 10. too tight: retract it
            print("unsat:", tmp_delta, min_delta, max_delta)
            s.pop()
            min_delta = tmp_delta

        if max_delta - min_delta < 1.0e-6:  # 11.
            break

    print(s.check())
    model = s.model()

    print(delta, model[delta].as_decimal(15))
    print(a, model[a].as_decimal(15))
    print(b, model[b].as_decimal(15))

result

delta 0.604337347396090?
a 2.001667022705078?
b 2.963314133483435?

You can see that values close to a = 2.0 and b = 3.0 are obtained.
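Since minimizing the sum of squared residuals is exactly ordinary least squares, the result can be cross-checked with any least-squares routine. A quick sketch, assuming numpy is installed (it is not part of the article's scripts):

import numpy as np

# Fit a degree-1 polynomial to the same data by ordinary least squares.
xs, ys = np.loadtxt("data.csv", unpack=True)
a_ls, b_ls = np.polyfit(xs, ys, 1)
print("least squares:", a_ls, b_ls)  # should land close to the Z3 result above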

Global optimization by per-data-point error bounds

  1. Prepare the data.
  2. Define epsilon >= 0.
  3. For each data point, with explanatory variable x, objective variable y, and model y = f(x), add the constraint y - epsilon <= f(x) <= y + epsilon.
  4. Check the model.
  5. Let max_epsilon be the epsilon obtained in step 4.
  6. Set min_epsilon = 0.0.
  7. Set tmp_epsilon = (max_epsilon + min_epsilon) / 2.
  8. Add the constraint epsilon < tmp_epsilon.
  9. If the check returns sat, set max_epsilon = tmp_epsilon.
  10. If the check returns unsat, retract the constraint added in step 8 and set min_epsilon = tmp_epsilon.
  11. Stop when max_epsilon - min_epsilon is below the desired accuracy; otherwise go back to step 7.

from z3 import *

if __name__ == "__main__":
    data = [tuple(map(float, line.split())) for line in open("data.csv")]  # 1.

    s = Solver()
    a = Real("a")
    b = Real("b")
    epsilon = Real("epsilon")

    s.add(epsilon >= 0.0)  # 2.

    for i, (x, y) in enumerate(data):
        s.add(
            y - epsilon <= (a * x + b), (a * x + b) <= y + epsilon  # 3.
        )

    print(s.check())  # 4. first check, with no upper bound on epsilon yet
    print(s.model())

    # 5. as_decimal() appends "?" when the value is truncated, so strip it
    max_epsilon = float(s.model()[epsilon].as_decimal(15).rstrip("?"))
    min_epsilon = 0.0  # 6.

    while True:
        tmp_epsilon = (max_epsilon + min_epsilon) / 2.0  # 7.
        s.push()

        s.add(epsilon < tmp_epsilon)  # 8. try a tighter bound

        if s.check() == sat:          # 9. the bound is achievable: keep it
            print("sat:", tmp_epsilon, min_epsilon, max_epsilon)
            max_epsilon = tmp_epsilon
        else:                         # 10. too tight: retract it
            print("unsat:", tmp_epsilon, min_epsilon, max_epsilon)
            s.pop()
            min_epsilon = tmp_epsilon

        if max_epsilon - min_epsilon < 1.0e-6:  # 11.
            break

    print(s.check())
    model = s.model()

    print(epsilon, model[epsilon].as_decimal(15))
    print(a, model[a].as_decimal(15))
    print(b, model[b].as_decimal(15))

result

epsilon 0.223683885406060?
a 2.000628752115151?
b 3.006668013951515?

Here, too, you can see that values close to a = 2.0 and b = 3.0 are obtained.
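One way to sanity-check the result: every data point was constrained to satisfy |y - (a*x + b)| <= epsilon, so the largest absolute residual of the fitted line over the data should not exceed the reported epsilon (up to the rounding of the printed values). A small illustrative check:

data = [tuple(map(float, line.split())) for line in open("data.csv")]

a_fit, b_fit = 2.000628752115151, 3.006668013951515  # values reported above
worst = max(abs(y - (a_fit * x + b_fit)) for x, y in data)
print(worst)  # should be at most the reported epsilon, about 0.2237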

Which method should be used

I recommend the latter, global optimization by per-data-point error bounds, for two reasons. The first is speed: with the residual formulation a single check takes a long time. I do not know the details of Z3's internals, but presumably this is because delta depends on x and y in a strongly nonlinear way, whereas epsilon's relationship to x and y is only weakly nonlinear and therefore cheaper to solve. The second is that it makes it easier to judge whether a model is reasonable: if you add a constraint like "epsilon < required accuracy" up front, a model that cannot reach that accuracy comes back unsat immediately, so bad models are detected quickly. Putting the bound first has a further benefit: even when new data is added, the specified accuracy remains guaranteed, which makes the approach very convenient. As for global optimization by residuals, existing methods such as least squares already cover it, so it mainly serves to explain the correspondence with them. Personally, the only advantage of the former is that its formulas are easier for a human to work with on paper; for processing with Z3, I think the latter method is better.
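As a concrete illustration of the "fail fast" point above, one might bound epsilon before adding the data constraints. This is only a sketch; required_accuracy is an illustrative name, not something from the scripts above:

from z3 import *

data = [tuple(map(float, line.split())) for line in open("data.csv")]

s = Solver()
a, b, epsilon = Reals("a b epsilon")

required_accuracy = 0.5                      # illustrative threshold
s.add(epsilon >= 0.0, epsilon < required_accuracy)

for x, y in data:
    s.add(y - epsilon <= a * x + b, a * x + b <= y + epsilon)

# unsat here means "a*x + b cannot fit the data to within required_accuracy",
# so an inadequate model is rejected before any binary search is run.
print(s.check())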

Discussion: is this really global optimization?

You might object that this method is not really global optimization. My answer is half yes and half no; it depends on whether you take the computer's standpoint or the mathematician's. I expect two main objections.

The first concerns error. Take the number √2. It can be written down mathematically, but it cannot be represented exactly on an ordinary computer; computers can only handle rational numbers, so there is normally no way to treat √2 perfectly. In 64-bit floating point, √2 is approximated to within an error of roughly 1e-15, which means there is no mathematically exact √2 on the machine. Personally, though, this does not bother me. However carefully you simplify a formula by hand, you cannot use it in a program until it is reduced to numerical values, so the gap between irrational numbers and the values a computer can express has to be tolerated programmatically (and the problem should be set up so that it can be tolerated). We have always tolerated it, whether we realized it or not, so I see no practical problem here.

The second objection is that this is not a closed-form formula. The title may bring to mind something like the quadratic formula: within a limited class of expressions and a limited range of numbers, a formula that yields the solution. This method is nothing of the sort; it is an algorithm, a procedure rather than a formula. It also rests on the assumption that the Z3 Prover implementation is correct, so I cannot be sure the procedure always arrives at the correct value. Still, I think it is meaningful that it can actually present the optimal solution on concrete examples. Mathematically the method may have many rough edges, but from a computing standpoint it seems to pose no particular problem in practice. That is why I call it "numerical" rather than "mathematical" global optimization here.

Attractiveness of global optimization by Z3

As the code shows, you can optimize simply by writing down the formulas. To my mind, the hard parts of machine learning are "formulating the model" and "analyzing the mathematics". The former usually goes fairly smoothly once a problem is given, but the analysis always comes as part of the package: how to compute the gradient of the objective function, how to reshape the objective because the gradient cannot be computed, and so on. Having to grind through those formula manipulations has been a painful problem for years. Since coming up with this optimization method for Z3 Prover, however, I have started to wonder whether that era is over.

Differences from computer algebra systems

Well-known computer algebra systems include Mathematica, Maple, the open-source Maxima, and, among the less widely used ones, SymPy and Sage. I have used Maxima myself and was not personally satisfied: its handling of inequalities is weak, and when given overly complicated expressions it sometimes returned no answer even after three days of waiting. Even when it did return an answer, Maxima sometimes insisted on a symbolic treatment and produced something incomprehensible. As a programmer I am not interested in mathematical truth; I am usually satisfied if I can get the numerical solution of a given formula. In that sense Z3, with its Python bindings, is very welcome: it returns answers even for complicated formulas, and I never have to write code that parses Maxima's output and converts it into a program.

Challenges of global optimization with Z3

It is slow, plain and simple. I have tried it a number of times, and it can be rough going. I once used Z3 to optimize all the weights of a neural network, and sometimes it did not return even after three days. It also seems to hit the limits of its decimal representation at times, in which case it can die with an exception. So although nonlinear global optimization is possible in principle, push it too hard and you cannot even get the model checked. The approach may still be at that development stage, but as a method that takes mathematical formulas as they are and optimizes them globally on its own, I find it very attractive.
