[PYTHON] Let's calculate Godel's β function with SymPy

Let's calculate the β function (Gödel's β function) with SymPy. At that time, try to reduce the number of Godel numbers in the result as much as possible.

In addition, since this article also serves as a practice for the author's NumPy array, there are places where fashionable vector calculations are forcibly performed.

References

--Shin Hayashi, Mitsuko Yasugi Translated and explained "Godel's Incompleteness Theorem" Iwanami Bunko (Blue 944-1), 2006. -(Original) K. Gödel. On the formally undecidable propositions of Principia Mathematica and related systems I, Monatshefte für Mathematik und Physik (38), pp. 173-198, 1931. --Shoji Maehara "Introduction to Basic Mathematics" Basic Mathematics Series 26, Asakura Shoten, 2000.

What is Godel's β function?

A function used for decoding when encoding a sequence of natural numbers into a natural number. It is called because it appears in the proof of Godel's incompleteness theorem (1931).

Assume the following:

-** Natural numbers are ** including 0. --Sequence subscripts shall start with ** 0 **. Same as Godel's paper.

Function $ \ beta $ for natural numbers $ l, m, i $ ($ m> 0,0 \ le i \ le n-1 $)

\beta(l,m,i) = \text{rem}(l,(i+1)m + 1)

Defined in (rem is a normal modulo operation, typically mod or% is used) [^ 1].

[^ 1]: $ im + 1 $ is positive, so rem can be calculated normally.

At this time, for a concrete sequence of natural numbers $ a = \ langle a_0, \ dots, a_ {n-1} \ rangle $ of length $ n \ ge 1 $, a certain natural number $ l, m You can calculate $ so that for $ 0 \ le i \ le n-1 $, $ \ beta (l, m, i) = a_i $ [^ 2]. That is, you can think of $ (l, m) $ as the natural number representation of the sequence (that is, ** Godel number **). There is also a way to encode a pair of natural numbers with a single natural number (see supplement), which results in the natural number sequence being encoded as a natural number. In the following discussion, we will proceed without specifying the encoding / decoding of the pair.

[^ 2]: I'm not saying anything about $ i $ larger than this.

Since the Chinese Remainder Theorem (CRT) is required to calculate $ l $, we use SymPy, a Python library with the function crt.

Since we need a concrete example, consider an example of encoding an array of length n = 5 ʻa = [31, 5, 51, 0, 2]`.

>>> import numpy as np
>>> from sympy.ntheory.modular import crt

>>> a = [31, 5, 51, 0, 2]
>>> a
[31, 5, 51, 0, 2]
>>> n = len(a)
>>> n
5

It's convenient, so let's convert it to the NumPy array ʻa_. Also, make an array of ʻa_ subscripts.

>>> a_ = np.array(a)
>>> a_
array([31,  5, 51,  0,  2])
>>> a_idx = np.arange(n)
>>> a_idx
array([0, 1, 2, 3, 4])

Encoding calculation

After giving a policy on the calculation for coding, let's actually calculate with Python. Given the sequence as above, the coding policy is as follows.

--Calculate $ m $ in some way so that each $ m_i $ is relatively prime (+ satisfying certain conditions). --Calculate $ l $ with the Chinese Remainder Theorem, using the fact that each $ m_i $ is relatively prime.

In general, factorial is used to calculate $ m $. However, this method makes $ m $ very large. Then $ l $ tends to be large. In this article, I will try to calculate by making $ m $ and eventually $ l $ as small as possible. However, I'm not sure if this is the minimum.

Hereinafter, it is expressed as $ m_i = (i + 1) m + 1 $. In other words, the definition of the β function is $ \ beta (l, m, i) = \ text {rem} (l, m_i) $.

Ask for m

The Wikipedia article "Gödel numbering for sequences" is a bit confusing, but it explains the weakest possible conditions that $ m $ should meet and is helpful (not sure if it's the weakest or not). According to it, if $ m $ meets the following conditions, it can be adopted for correct coding.

i\mid m \quad (1\le i\le n-1) \\
a_i < m_i \quad (0\le i \le n-1)

The first equation means that $ m $ is a common multiple ** of ** $ 1, \ dots, n-1 $ **. Therefore, it is recommended to list ** least common multiples (LCM) of $ m_0 $ of $ 1, \ dots, n-1 $ as candidates for $ m $, and find the smallest one that satisfies the second equation. Probably. Let's try it.

Find m0

NumPy has a two-argument function lcm. Since this is a so-called ufunc, you can calculate the LCM for an array by using the reduce method. I'll try. Since we created the subscript array $ [0,1, \ dots, n-1] $ earlier, we will use $ [1, \ dots, n-1] $ as the slice.

>>> m0 = np.lcm.reduce(a_idx[1:])
>>> m0
12

A manual calculation of the $ 1,2,3,4 $ LCM is certainly $ 12 $. This multiple $ m $ of $ m_0 $ always satisfies the first equation.

Generate $ m_i $

In the next section, we will calculate $ \ {m_i \} $ for various $ m $. Here's how to do it first.

For each $ m $, you need to find $ \ {m_i \} $ and compare it to $ \ {a_i \} $. The former is $ \ {m_i \ mid 0 \ le i \ le n-1 \} = \ {(i + 1) m +1 \ mid 0 \ le i \ le n-1 \} $ .. Since the range of $ i $ matches ʻa_idxdefined earlier, it can be calculated at once by array calculation. Define the function you want asm_i_seq`.

>>> def m_i_seq(m): return (a_idx + 1) * m + 1

If you try to calculate for m = 12, it will be like this.

>>> mis = m_i_seq(12)
>>> mis
array([13, 25, 37, 49, 61])

(mis is the image of $ m_i $ s)

Find a m that meets the conditions

Now let's find $ m $ (a multiple of $ m_0 $) that satisfies the second equation of the condition. First, $ \ {m_i \} $ for the case of $ m = m_0 = 12 $ is mis = [13 25 37 49 61] as calculated earlier. The comparison between this and $ \ {a_i \} $ can also be calculated at once with the vector calculation function of the NumPy array.

>>> a_ < mis
array([False,  True, False,  True,  True])

You can see that this does not hold for the two $ i $.

Next, let's double $ m_0 $ and set $ m = 24 $. At this time,

>>> mis = m_i_seq(24)
>>> mis
array([ 25,  49,  73,  97, 121])
>>> a_ < mis
array([False,  True,  True,  True,  True])

It hasn't been established yet. This is what happens when $ m = 36 $.

>>> mis
array([ 37,  73, 109, 145, 181])
>>> a_ < mis
array([ True,  True,  True,  True,  True])

It's all OK because it's all True. Adopt this $ m = 36 $.

(Optional) Check the coprimality of $ m_i $

By deciding $ m $ as described above, we can prove that the two different elements of $ \ {m_i \} $ are relatively prime. I will omit it here, but I will use the first equation of the above conditions.

In this article, we will check this by calculation. Since NumPy's gcd is ufunc, it has a ʻouter` method that applies a function to the Cartesian product. Now you can see if the greatest common divisor is equal to 1.

>>> np.gcd.outer(mis, mis)
array([[ 37,   1,   1,   1,   1],
       [  1,  73,   1,   1,   1],
       [  1,   1, 109,   1,   1],
       [  1,   1,   1, 145,   1],
       [  1,   1,   1,   1, 181]])

It was confirmed that it was 1 excluding the diagonal component (the diagonal component is gcd (x, x), so the result is x) [^ 3].

[^ 3]: If you want to make a mechanical judgment, you can use combinations of ʻitertools` and calculate each GCD for different combinations of $ m_i $ $ (m_i, m_j) $. ..

Calculate $ l $

After that, the Chinese Remainder Theorem can be applied normally. Since each $ m_i $ is relatively prime, a congruence system

l\equiv a_i \pmod {m_i}

Has a unique solution $ l $ modulo $ m_0 \ dots m_ {i-1} $. Solve using crt. It seems that you cannot specify a NumPy array for the argument of crt, so specify the one returned / converted to List.

>>> l, M = crt(mis.tolist(), a)
>>> l, M
(mpz(2496662055), 7726764205)

I got l. This is a multiple precision integer by gmpy2.

Note that M is the value of $ m_0 \ dots m_ {i-1} $. I will confirm it.

>>> np.multiply.reduce(mis)
7726764205

Finally, the calculated $ l $ satisfies $ l \ equiv a_i \ pmod {m_i} $, but since it is set to $ a_i <m_i $, $ a_i = \ text {rem} ( l, m_i) $ holds.

We now have $ (l, m) = (2496662055, 36) $ that meets the subject. (Note that $ \ {m_i \} $ is $ [37, 73, 109, 145, 181] $.)

(Optional) Confirm that it can be decrypted

Make sure that the remainder of actually dividing $ l $ by each $ m_i $ is $ a_i $.

np.mod is ufunc, so you can just specify an array or scalar and it will do the calculation for you.

>>> decoded = np.mod(l, mis)
>>> decoded 
array([mpz(31), mpz(5), mpz(51), mpz(0), mpz(2)], dtype=object)
>>> a_
array([31,  5, 51,  0,  2])

You can now confirm that the decryption result matches the original ʻa`.

You can see that the decryption result is wrapped in the class mpz of gmpy2. You shouldn't worry about this and can use it as a normal ʻint` value in subsequent calculations. I think you can also remove the trumpet class (unconfirmed).


>>> decoded[1] * 6
mpz(30)

Packaging

You can package it as if it were an array by implementing a special Python method. (Code is this time)

Summary

That's why I introduced how to express a sequence by β function. With Python's support for multiple-precision integers and number-theoretic functions, you should be able to actually compute the encoding even for realistic (larger) length sequences. In addition, NumPy vector operations enable efficient calculations. The actual calculations should help in a more practical understanding of abstract theory. see you.

Supplement

Supplement 1: A little more about β functions

There is a simpler way to encode. For example, as it appears in the first half of Godel's paper

\langle a_0, \dots, a_{n-1}\rangle \mapsto 2^{a_0}\cdot 3^{a_1}\cdot\dots \cdot p_{n-1}^{a_{n-1}}

There is also a method of. The feature of the β function method is that ** decoding can be done only by modulo operation **. Gödel showed that the undecidable formula presented in the first half of the paper can also be expressed as ** arithmetic formula ** (first-order predicate formula with only $ +, \ cdot, = $). It was.

Supplement 2: Length of sequence

When using the β function, the length of the original sequence cannot be determined by looking at the encoded value. Godel's proof didn't need to be able to decrypt the length of the sequence, but if length is needed

--When encoding: The sequence $ a'= \ langle n, a_0, \ dots, a_ {n-1} \ rangle $ with the length $ n $ of $ a $ inserted at the beginning of the sequence $ a $ is coded. To become. --When decrypting: Decrypt the first element of the sequence to get $ n $. Decrypt $ a'[i] $ for $ 1 \ le i \ le n $ and make it $ a [i-1] $.

It can be processed correctly by.

Supplement 3: Pair coding

I think there are various methods.

--How to use bijection (arranging numbers in steps), which often appears in high school mathematics --How to use two versions of the congruence formula in the Chinese Remainder Theorem

There are some, but here I will introduce the following method as a method that gives up bijection but is easy to decrypt (There is a way to put $ -1 $ on the right side due to technical requirements). Reference: https://slideplayer.com/slide/16991262/

(x,y) \mapsto 2^x(2y+1)

$ 2 ^ x $ is even and $ 2y + 1 $ is odd. Therefore, it can be decrypted by dividing the encoded value by 2 as much as possible.

Recommended Posts

Let's calculate Godel's β function with SymPy
Let's solve simultaneous linear equations with Python sympy!
[Piyopiyokai # 1] Let's play with Lambda: Creating a Lambda function
Overlay graphs with sympy
With Sympy, don't worry
Calculate tf-idf with scikit-learn
Let's prove the addition theorem of trigonometric functions by replacing the function with a function in SymPy (≠ substitution)