0. Introduction

AtCoder Official Algorithm Collection AtCoder Library (** ACL **) released on September 7, 2020 it was done. I thought it was a good opportunity because most of the algorithms included in ACL were new to me, so I did everything from studying algorithms to implementing them in Python.

In this article we will look at ** internal_math **.

internal_math is an assortment of number-theoretic algorithms used inside ACLs and has the following contents.

name	Overview
safe_mod	integerx の正integermResidue by(x \% m).. However$0 \leq x % m < m $Meet.
barrett	High-speed modulo operation.
pow_mod	x^n \pmod{m}Calculation.
is_prime	High-speed primality test.
inv_gcd	integera と正integerbGreatest common divisorgandxa \equiv g \pmod{b}BecomesxCalculation. However$0 \leq x < \frac{b}{g} $Meet.
primitive_root	mPrimitive root modulo.

Of these, in this article

safe_mod
pow_mod
inv_gcd
barrett

Handles. I will not touch on the constexpr (constant expression) itself.

Target readers

If you look at the ACL code but ** don't know what you're doing **.
Those who do not understand C ++ and want to read it in Python.

Untargeted readers

Those who want Python-optimized code for ACLs. → Since it was implemented so that it would be the same as ACL as much as possible, the execution speed in Python is not considered at all. There seems to be a movement to make it available directly from Cython, so it may be good to follow that.

Referenced

C ++ documentation.

[C ++ reference] cppreference

This is an article by @drken that summarizes how to ask too much in competitive programming. It's very easy to understand.

[Overall special feature on how to find too much] qiita_mod

I used it as a reference to describe the Euclidean algorithm in mathematical formulas.

[Wikipedia-Euclidean algorithm] wiki_euclid

This is a reference article for Barrett reduction.

Wikipedia-Barrett reduction

safe_mod Consider the remainder $ x % m $ of the positive integer $ m $ of the integer $ x $.

1.1. Modulo operation in C ++

In C ++ it looks like this:

#include <bits/stdc++.h>
using namespace std;
int main(){
    cout << " 7 % 3 = " << 7 % 3 << endl;  // 7 % 3 = 1
    cout << "-7 % 3 = " << -7 % 3 << endl;  // -7 % 3 = -1
    
    return 0;
}

The second thing to look at is that when $ x $ is negative, the remainder is also negative. this is

The quotient is rounded towards 0
$ (x / m) * m + x % m == x $ is satisfied

It is due to. ([Reference] cppreference)

1.2. Implementation

safe_mod is a function that puts the value of the remainder $ x % m $ in $ 0 \ leq x % m <m $. This can be achieved by adding $ m $ if the result of the modulo operation is negative. The implementation in Python is as follows.

def safe_mod(x, m):
    x %= m
    if x < 0: x += m
    return x

Note that Python does not require this implementation because the remainder of the positive integer $ m $ is guaranteed to be $ 0 \ leq x % m <m $. (This is because the integer division "//" in Python is rounded towards negative infinity.)

# safe_Modulo arithmetic
print(f'safe_mod(7, 3) = {safe_mod(7, 3)}')  # safe_mod(7, 3) = 1
print(f'safe_mod(-7, 3) = {safe_mod(-7, 3)}')  # safe_mod(-7, 3) = 2

#Arithmetic operator%Modulo operation by
print(f' 7 % 3 =  {7 % 3}')  # 7 % 3 = 1
print(f'-7 % 3 =  {-7 % 3}')  # -7 % 3 = 2

pow_mod Consider the calculation of $ x ^ n \ pmod {m} $ for the integer $ x $ and the natural numbers $ n, m $.

2.1. Naive method

When the exponent $ n $ is a natural number, $ x ^ n $ is $ x $ multiplied by $ n $, so if it is implemented as it is, it will be as follows.

def naive_pow_mod(x, n, m):
    r = 1  #Variable to put the result
    for _ in range(n):
        r *= x
    r %= m  #Find too much divided by m
    return r


x = 3
n = 4
m = 5
print(naive_pow_mod(x, n, m))  # 1

But what if $ n $ is around $ 10 ^ 9 $? There are two possible problems with the above method.

Requires about $ 10 ^ 9 $ multiplication
The value becomes very large in the process of calculation

These require a very long calculation time. And by solving these problems, you can implement ** pow_mod ** which calculates $ x ^ n \ pmod {m} $ at high speed.

2.2. Solution to Problem 1: Iterative Squares

** Iterative squares ** (also known as binary) repeats squares, as the name implies. This allows you to calculate large indices with a small number of calculations.

Square $ x $ to get $ x ^ 2 $.
Square $ x ^ 2 $ to get ** $ x ^ 4 $ without calculating ** $ x ^ 3 $.
Square $ x ^ 4 $ to get ** $ x ^ 8 $ without calculating ** $ x ^ 5, x ^ 6, x ^ 7 $.

\cdots

It is like that. As a concrete example, let's look at the case of $ n = 50 . Now that we have the results when the exponent is a power of 2 ( x, x ^ 2, x ^ 4, \ cdots $), consider expressing $ x ^ {50} $ with these.

50=32 + 16 + 2

x^{50}=x^{32}\cdot x^{16}\cdot x^{2}

I will call. So $ x ^ {50} $ is

5 times before finding $ x ^ {32} $
Multiply them 3 times (when 1 is prepared as the result value)

** It is calculated by multiplying 8 times in total. ** ** It is effective to utilize binary numbers to perform these procedures mechanically. It is natural from the definition of binary numbers, but the power of 2 (32, 16, 2 in the above example) that constitutes the integer $ n $ is a bit that is '1' when $ n $ is expressed in binary notation. It corresponds. Therefore, look at $ n $ from the lower bits and multiply by $ x ^ {(power of 2)} ; ; $ corresponding to the case of '1' to obtain the desired value. The implementation in Python is as follows.

#Exponentiation by iterative squares
def binary_pow_mod(x, n, m):
    r = 1  #Variable to put the result
    y = x  # x^(Power of 2)Variable to put
    while n:
        if n & 1: r = r * y  #Multiply if the least significant bit is 1
        y = y * y
        n >>= 1  #Shift right to see the next bit
    r %= m
    return r

2.3. Solution to Problem 2: The nature of modulo operations

Computers can calculate very fast (from a human perspective), and Python can handle integers as large as memory allows, but it still takes time to calculate large digits. For example, if you repeat $ 3 ^ {1000000000} $ using the square method, you will end up with

3^{536870912} \times 3^{463129088}

Will be calculated. If what you want is too much $ x ^ n $ divided by the natural number $ m $, you can take advantage of the modulo operation property to avoid such large number operations. The properties used are:

Multiplication (as long as you take the mod at the end) ** You can take the mod any number of times **

The remainder of dividing $ x $ by $ m $ always falls within the range of $ 0 \ leq x % m <m $, so by taking a mod every time you multiply, you always calculate between values less than $ m $. can do.

2.4. Implementation

pow_mod uses the properties of mod operation for iterative squares. This can be implemented with a slight modification to the binary_pow_mod implemented earlier.

def pow_mod(x, n, m):
    if m == 1: return 0  #The remainder after dividing by 1 is always 0
    r = 1
    y = x % m  #Divide by m too much
    while n:
        if n & 1: r = r * y % m  #Divide by m too much
        y = y * y % m  #Divide by m too much
        n >>= 1  
    return r

In Python, the built-in function pow () is equivalent to pow_mod, so this implementation is unnecessary.

#Implementation in a naive way
print(naive_pow_mod(3, 4, 5))  # 1
#print(naive_pow_mod(13, 1000000, 1000000007))  #it will not finish


#Implementation using the iterative square method
print(binary_pow_mod(13, 1000000, 1000000007))  #735092405 I can calculate this much
#print(binary_pow_mod(13, 1000000000, 1000000007))  #it will not finish


#Implementation using the properties of iterative squares + mod (ACL pow_Equivalent to mod)
print(pow_mod(13, 1000000000, 1000000007))  # 94858115


#Calculation using Python's built-in function pow
print(pow(13, 1000000000, 1000000007))  # 94858115

inv_gcd For the integer $ a $ and the positive integer $ b $

Greatest common divisor $ \ gcd (a, b) $
$ xa \ equiv \ gcd (a, b) \ pmod {b} $ $ x $

To calculate. However, $ x $ satisfies $ 0 \ leq x <\ frac {b} {\ gcd (a, b)} $.

3.1. Euclidean algorithm

Let's start with the greatest common divisor of $ a $ and $ b $. The algorithm for calculating the greatest common divisor is ** Euclidean algorithm **. The implementation in Python is as follows.

def gcd(a, b):
    if b == 0: return a
    return gcd(b, a % b)

From now on, it will be $ a> b $. Even if $ a <b $, $ 0 \ leq a % b <b $ is recursively called $ \ gcd (b, a % b) $ because the first argument is always larger, so there is no problem. ..

The greatest common divisor can be calculated by Euclidean algorithm for the following two reasons.

\gcd(a, b) = \gcd(b, a \% b)
$ a \ ne 0 $ for $ \ gcd (a, 0) = a $

The proof of 1. can be found in [@ drken's article] qiita_mod, so please refer to that. 2. is clear from the fact that $ a $ is a divisor of 0.

The claim of 1. is strong. Now $ a> b $ and $ b> a % b $, so according to 1.

** The problem of finding the greatest common divisor of $ a $ and $ b $ can be turned into the problem of finding the greatest common divisor of a smaller number **

It will be. Also, since it is $ a % b \ geq 0 $, by repeating 1., it will always be in the form of $ \ gcd (g, 0) $. And from 2., this $ g $ is the greatest common divisor of $ a $ and $ b $.

3.2. Describe the Euclidean algorithm with a mathematical formula

The Euclidean algorithm is expressed by a mathematical formula. When $ a = r_0, b = r_1 $ and the quotient of $ r_k $ divided by $ r_ {k + 1} $ is written as $ q_k $

\begin{aligned}
r_0 &= q_0r_1 + r_2\\
r_1 &= q_1r_2 + r_3\\
\cdots\\
r_{n-1} &= q_{n-1}r_n + 0
\end{aligned}

And the $ r_n $ thus obtained was the greatest common divisor of $ a $ and $ b $. Expressing the above equation using a matrix

\begin{aligned}
\begin{pmatrix}
r_0\\
r_1
\end{pmatrix}
&=
\begin{pmatrix}
q_0 & 1\\
1 & 0\\
\end{pmatrix}
\begin{pmatrix}
r_1\\
r_2
\end{pmatrix}\\
\begin{pmatrix}
r_1\\
r_2
\end{pmatrix}
&=
\begin{pmatrix}
q_1 & 1\\
1 & 0\\
\end{pmatrix}
\begin{pmatrix}
r_2\\
r_3
\end{pmatrix}\\
\cdots\\
\begin{pmatrix}
r_{n-1}\\
r_n
\end{pmatrix}
&=
\begin{pmatrix}
q_{n-1} & 1\\
1 & 0\\
\end{pmatrix}
\begin{pmatrix}
r_n\\
0
\end{pmatrix}
\end{aligned}

It will be. If you write these together

\begin{pmatrix}
r_0\\
r_1
\end{pmatrix}
=
\begin{pmatrix}
q_0 & 1\\
1 & 0\\
\end{pmatrix}
\begin{pmatrix}
q_1 & 1\\
1 & 0\\
\end{pmatrix}
\cdots
\begin{pmatrix}
q_{n-1} & 1\\
1 & 0\\
\end{pmatrix}
\begin{pmatrix}
r_n\\
0
\end{pmatrix}
\;\;\;\;\cdots(*)

It can be obtained. Now for $ i = 0, 1, \ cdots, n-1 $

Q_i=
\begin{pmatrix}
q_i & 1\\
1 & 0
\end{pmatrix}

And the determinant is

\begin{aligned}
\det(Q_i) &= q_i \cdot 0 - 1\cdot1\\
&=-1
\end{aligned}

So there is an inverse matrix $ Q_i ^ {-1} $

\begin{aligned}
Q_i^{-1} &= -
\begin{pmatrix}
0 & -1\\
-1 & q_i
\end{pmatrix}
= 
\begin{pmatrix}
0 & 1\\
1 & -q_i
\end{pmatrix}
\end{aligned}

is. Therefore the expression ($ * $) is

\begin{pmatrix}
\gcd(a, b)\\
0
\end{pmatrix}
=
\begin{pmatrix}
0 & 1\\
1 & -q_{n - 1}
\end{pmatrix}
\begin{pmatrix}
0 & 1\\
1 & -q_{n - 2}
\end{pmatrix}
\cdots
\begin{pmatrix}
0 & 1\\
1 & -q_{0}
\end{pmatrix}
\begin{pmatrix}
a\\
b
\end{pmatrix}
\;\;\;\;\cdots(**)

It will be.

3.3. Extended Euclidean algorithm

Now, the second thing you get with inv_gcd

$ xa \ equiv \ gcd (a, b) \ pmod {b} $ $ x $

Let's take a look. This uses the integer $ y $

ax + by = \gcd(a, b)

You can also write. Therefore, we can find $ x $ that satisfies this formula. By the way, in the formula $ (**) $ obtained in the previous section

\begin{pmatrix}
x & y\\
u & v
\end{pmatrix}
=
\begin{pmatrix}
0 & 1\\
1 & -q_{n - 1}
\end{pmatrix}
\begin{pmatrix}
0 & 1\\
1 & -q_{n - 2}
\end{pmatrix}
\cdots
\begin{pmatrix}
0 & 1\\
1 & -q_{0}
\end{pmatrix}

If you leave

\begin{pmatrix}
\gcd(a, b)\\
0
\end{pmatrix}
=
\begin{pmatrix}
x & y\\
u & v
\end{pmatrix}
\begin{pmatrix}
a\\
b
\end{pmatrix}

It will be. That is, this $ x, y $ is

ax + by = \gcd(a, b)

Meet. Therefore, by calculating $ \ gcd (a, b) $ using the formula $ (**) $, we get $ xa \ equiv \ gcd (a, b) \ pmod {b} $ in the process. You can see that we get x $. The algorithm for finding the integer $ (x, y) $ in this way is called the ** extended Euclidean algorithm **.

3.4. Preparation for implementation of inv_gcd ①

Euclidean algorithm is $ r_0 = a, r_1 = b $, and the procedure

\begin{pmatrix}
r_{i+1}\\
r_{i+2}
\end{pmatrix}
=
\begin{pmatrix}
0 & 1\\
1 & -q_{i}\\
\end{pmatrix}
\begin{pmatrix}
r_{i}\\
r_{i+1}
\end{pmatrix}

Was to repeat until $ r_ {i + 2} $ became $ 0 $. And

\begin{pmatrix}
x_i & y_i\\
u_i & v_i
\end{pmatrix}
=
\begin{pmatrix}
0 & 1\\
1 & -q_{i}
\end{pmatrix}
\begin{pmatrix}
0 & 1\\
1 & -q_{i-1}
\end{pmatrix}
\cdots
\begin{pmatrix}
0 & 1\\
1 & -q_{0}
\end{pmatrix}

The extended Euclidean algorithm is to find $ (x, y) $ that satisfies $ ax + by = \ gcd (a, b) $ by calculating at the same time.

Let's see the actual calculation process. Given $ a, b $, first check if $ a % b $ is $ 0 $. If it's $ 0 $, you don't have to go through any further steps

\begin{aligned}
\gcd(a, b) = b\\
x=0
\end{aligned}

You can see that. The initial state of each variable when $ a and b $ are given is as follows.

\begin{aligned}
r_0 = a, \;\;r_1&=b\\\\
\begin{pmatrix}
x_{-1} & y_{-1}\\
u_{-1} & v_{-1}
\end{pmatrix}
&=
\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}
\end{aligned}

The initial state of $ (x, y, u, v) $ is the identity matrix when these variables are at $ i = 0 $

\begin{pmatrix}
x_0 & y_0\\
u_0 & v_0
\end{pmatrix}
=
\begin{pmatrix}
0 & 1\\
1 & -q_{0}
\end{pmatrix}

To meet. Next, let's look at the recurrence formulas that each variable follows. First, $ q_i $ can be obtained from $ r_ {i + 1}, r_i $.

q_i = \left\lfloor\frac{r_i}{r_{i+1}}\right\rfloor

You can then use this to see the transitions of other variables.

\begin{aligned}
r_{i+2} &= r_i - q_ir_{i+1}\\\\
\begin{pmatrix}
x_i & y_i\\
u_i & v_i
\end{pmatrix}
&=
\begin{pmatrix}
0 & 1\\
1 & -q_{i}
\end{pmatrix}
\begin{pmatrix}
x_{i-1} & y_{i-1}\\
u_{i-1} & v_{i-1}
\end{pmatrix}
\end{aligned}

The termination condition is $ r_ {i + 2} = 0 $. At this time

\begin{aligned}
r_{i+1} = \gcd(a, b)\\
ax_{i}+by_{i}=\gcd(a, b)
\end{aligned}

It will be.

3.5. Preparation for inv_gcd implementation ②

ax+by=\gcd(a, b)Since is an indefinite equation, there are innumerable solutions. Here is the solution obtained by the extended Euclidean algorithmxBut$ |x| < \frac{b}{\gcd(a, b)} $Indicates that the condition is met. To do this, we first show the following:

For $ i \ geq 0 $

r_{i+1}|u_i|+r_{i+2}|x_i|\leq b

Is established.

The recurrence formula of $ r_i, x_i, u_i $ is used.

\begin{aligned}
r_{i+2} &= r_i - q_ir_{i+1}\\
x_i &= u_{i-1}\\
u_i &= x_{i-1} - q_iu_{i-1}
\end{aligned}

And the nature of the absolute value

|x - y| \leq |x| + |y|

is. It is shown by mathematical induction. When $ i = 0 $

\begin{aligned}
r_{1}|x_0|+r_{2}|u_0|&=b|1|+r_2|0|\\
&=b\\
&\leq b
\end{aligned}

Fill with. When $ i = k $

r_{k+1}|u_k|+r_{k+2}|x_k|\leq b

Assuming that, when $ i = k + 1 $,

\begin{aligned}
&r_{k+2}|u_{k+1}|+r_{k+3}|x_{k+1}| \\
=\;& r_{k+2}|x_{k} - q_{k+1}u_{k}| + (r_{k+1} - q_{k+1}r_{k+2})|u_k|\\
\leq\;&r_{k+2}|x_k|+q_{k+1}r_{k+2}|u_k|+r_{k+1}|u_k|-q_{k+1}r_{k+2}|u_k|\\
=\;& r_{k+1}|u_k|+r_{k+2}|x_k|\\
\leq\;&b
\end{aligned}

I will meet next to you. From the above, for $ i \ geq 0 $,

r_{i+1}|u_i|+r_{i+2}|x_i|\leq b

Was shown to hold.

Using this result|x|Let's evaluate. Nowr_{n+2}=0Suppose it was. That is,

\begin{aligned}
r_{n+1}&=\gcd(a,b)\\
x_{n}a&\equiv\gcd(a, b)\pmod{b}
\end{aligned}

It will be. Consider the case of $ i = n-1 $ in the inequality shown earlier. Then

\begin{aligned}
(Left side) &= r_{n}|u_{n-1}|+r_{n+1}|x_{n-1}|\\
&\geq r_{n}|u_{n-1}|\\
&> r_{n+1}|u_{n-1}|\\
&= \gcd(a, b)|x_n|
\end{aligned}

And since $ \ gcd (a, b) \ ne b $ from $ a % b \ ne0 $, the $ x_n $ obtained by the extended Euclidean algorithm is

|x_n|< \frac{b}{\gcd(a, b)}

It was shown to meet. Also, if $ x_n <0 $

x = x_n + \frac{b}{\gcd(a, b)}

You can get $ x $ which is $ 0 \ leq x <\ frac {b} {\ gcd (a, b)} $. This $ x $ is

\begin{aligned}
xa &= (x_n + \frac{b}{\gcd(a, b)})a\\
&=x_na+\rm{lcm(a, b)}\\
&\equiv x_na \pmod{b}
\end{aligned}

It is a more reliable solution. Where $ \ rm {lcm} \ it {(a, b)} $ is the least common multiple of $ (a, b) $.

3.6. Preparation for implementation of inv_gcd ③

In [Preparation for implementation of inv_gcd ①](# 34-Preparation for implementation of inv_gcd), each variable is subscripted and each transition is followed like a sequence, but since it is not necessary to keep the old value in implementation, it is better. You can write compactly. From here, we will follow the variable names actually used in the ACL.

First, let's check the required variables. Since $ r_i $ is a trinomial recurrence formula, we need two variables. Let these be $ s and t $. We only need one $ q_i $, so let's call it $ u $. It is $ (x_i, y_i, u_i, v_i) $, but if you look at the recurrence formula, you can see that $ (x_i, u_i) $ and $ (y_i, v_i) $ are independent. Now that we want $ x $, we don't need to hold $ (y_i, v_i) $. So let $ (x_i, u_i) $ be $ (m_0, m_1) $.

As mentioned in preparation (1), when $ a and b $ are given, first check $ a % b $. If this is $ 0 $, you don't have to go through any further steps

\begin{aligned}
\gcd(a, b) = b\\
x=0
\end{aligned}

That's it. Therefore, we will look at the case of $ a % b \ ne 0 $ from now on. The initial state is

\begin{aligned}
s &= r_1 = b\\
t &= r_2 = r_0 - \left\lfloor\frac{r_0}{r_{1}}\right\rfloor r_1 = a \% b\\\\
m_0 &= x_0 = u_{-1} = 0\\
m_1 &= u_0=x_1 - \left\lfloor\frac{r_0}{r_{1}}\right\rfloor u_{-1} = 1
\end{aligned}

is. From here, repeat the transition shown below until $ t = 0 $.

3.7. Implementation

We will implement it as described in the previous section. Note that the part that uses safe_mod in ACL is replaced by the arithmetic operator%, which is the equivalent function in Python.

def inv_gcd(a, b):
    a %= b
    if a == 0: return b, 0
    #initial state
    s, t = b, a
    m0, m1 = 0, 1
    while t:
        #Preparing for the transition
        u = s // t

        #transition
        s -= t * u
        m0 -= m1 * u

        # swap
        s, t = t, s
        m0, m1 = m1, m0

    if m0 < 0: m0 += b // s
    return s, m0


print(inv_gcd(3, 5))  # (1, 2)
print(inv_gcd(20, 15))  # (5, 1)

barrett In the problem of "A certain natural number $ m $ is given, answer the remainder divided by $ m $" to prevent overflow and to avoid slowing down the calculation speed due to a huge number even if it is a multiple precision integer. , I often take the remainder of dividing by $ m $ every time I do something. It is necessary to divide in order to find too much, but division is one of the four arithmetic operations that computers are not good at, and it takes longer than other operations. ** Barrett reduction ** is an algorithm that speeds up "a certain ** fixed natural number (constant) ** operation that takes the remainder by dividing by $ m $". For the integers $ a, b $ where $ 0 \ leq a <m, 0 \ leq b <m $ in ACL

(a \cdot b) \;\% \;m

It is used for the purpose of speeding up. In the following, we will set $ z: = a \ cdot b ; (0 \ leq z <m ^ 2) $ and consider $ z % m $.

4.1. Ideas

No matter how bad you are at division, you cannot avoid it when you ask for too much. Because not much

z\%m = z - \left\lfloor z \div m\right\rfloor m

Because it is expressed as. Also, I am not good at division by general numbers, but I am good at division by power of ** 2. For a computer running in binary, "dividing by $ 2 ^ k $" only requires "k-shifting to the right."

Therefore, in order to speed up the operation that requires too much, we will consider replacing ** the operation that we are not good at (division by general natural numbers) with the operation that we are good at (addition, subtraction, multiplication, shift operation) **. "Dividing by $ m $" is equivalent to "multiplying by $ \ frac {1} {m} $". Now, with good enough precision using natural numbers $ k, n $

\frac{1}{m} \approx \frac{n}{2^k}

Suppose that you can approximate. Then

\begin{aligned}
z\%m &= z - \left\lfloor z \cdot \frac{1}{m}\right\rfloor m\\
&= z - \left\lfloor z \cdot \frac{n}{2^k}\right\rfloor m\\
&= z - \{(z \cdot n) >> k\}m
\end{aligned}

Therefore, I was able to express too much with only the operations I am good at (subtraction, multiplication, shift operation).

4.2. How to determine k, n

So how do you determine the natural numbers $ k and n $? If $ m $ was a power of 2,

\frac{1}{m} = \frac{n}{2^k}

There are natural numbers $ k, n $ that satisfy. Therefore, from now on, we will consider the case where $ m $ is not a power of 2. First, let's look at the conditions that $ k $ must meet. Now, if you choose $ k $, where $ 2 ^ k <m $, then $ n = 1 $ is the best approximation, but in most cases this is not a good approximation. So $ k $ is

2^k > m

You need to choose to be. If $ k $ is decided, $ n $ will be

n \simeq \frac{2^k}{m}

Just choose to be. Therefore

n=\left\lceil \frac{2^k}{m}\right\rceil\; or\; \left\lfloor \frac{2^k}{m}\right\rfloor

It will be. ACL uses the ceiling function, but in general, it seems that the floor function is often selected.

Now that we know the lower limit of $ k $, how do we determine the specific value? Now, as an approximation index

e = \frac{n}{2^k} - \frac{1}{m}

Introduce. As a concrete example, let's look at the case of $ m = 1000000007 ; ; (10 ^ 9 + 7) $, which is often asked in competitive programming. At this time, the lower limit of $ k $ is

\begin{aligned}
k_{min} &= \lceil\log_2{1000000007}\rceil\\
&= 30
\end{aligned}

is. So, if you try to calculate $ e $ for $ k $ of $ k \ geq 30 $, it will be as shown in the figure below.

Since $ e $ is monotonously non-increasing with respect to $ k $, we found that the larger $ k $ is, the better. However, the larger $ k $ is, the larger the number to be handled, and the longer it takes to calculate, so it seems that it is not enough to increase it anyway. The ACL says $ k = 64 $. This also causes $ n $

n = \left\lceil \frac{2^{64}}{m}\right\rceil

It is decided.

** Supplement 1 ** $ n $ is exactly

n = 
\begin{cases}
0 & (m = 1)\\
\left\lceil \frac{2^{64}}{m}\right\rceil & (m \geq 2)
\end{cases}

It has become. The ACL requires that the inputs $ a and b $ are already too much divided by $ m $. From $ a = b = 0 $ when $ m = 1 $

\begin{aligned}
z \% m &= z - \{(z \cdot n) >> k\}m\\
&= 0 - 0 \cdot 1\\
&= 0
\end{aligned}

There is no problem because it becomes. Therefore, from now on, it will be $ m \ geq 2 $.

** Supplement 2 ** You might think it contains division, but now that $ m $ is a pre-given constant, $ n $ is also a pre-computed constant.

4.3. Why k = 64?

So why is $ k = 64 $? One reason is that the maximum value that can be handled by an unsigned long long (unsigned 8-byte integer) is $ 2 ^ {64} -1 $. And another reason is that if ** $ k = 64 $, then $ (a \ cdot b) % m for the 4-byte integer input $ a, b $ and the 4-byte integer method $ m $. There is a ** that can calculate $ correctly. To show this, we should show the following.

$ 2 \ leq m <2 ^ {31} $ is an integer $ m $ and $ 0 \ leq a, b <m $ is an integer $ a, b $, and the product $ z: = a \ cdot b $ is an integer $ q, With r $

z = qm + r\;\;\;\;\;(0 \leq r < m)

When is expressed, the following relational expression holds.

q \leq \{(z \cdot n) >> 64\} < q + 2

here,

n=\left\lceil \frac{2^{64}}{m}\right\rceil

And >> is a right shift operation.

If this is indicated

\{(z \cdot n) >> 64\}=q \;\;\; \rm{or} \;\;\; q + 1

It will be. That is, either $ z % m $ could be calculated correctly, or $ m $ was calculated less than the exact value. So if the result you get is negative, you can get the correct result by adding $ m $.

Let's prove it.

Let's start with the lower limit. Now

\frac{n}{2^{64}} \geq \frac{1}{m}

Because it is

\begin{aligned}
\left\lfloor\frac{zn}{2^{64}}\right\rfloor &\geq \left\lfloor\frac{z}{m}\right\rfloor\\
&= \left\lfloor q + \frac{r}{m}\right\rfloor\\
&= q + \left\lfloor\frac{r}{m}\right\rfloor\\
&= q
\end{aligned}

Therefore,

\{(z \cdot n) >> 64\} \geq q

It will be.

Next, let's look at the upper limit. Using the integer $ l $ where $ nm $ is $ 0 \ leq l <m $

\begin{aligned}
nm &= \left\lceil \frac{2^{64}}{m}\right\rceil m\\
&= 2^{64} + l
\end{aligned}

If you use to call

\begin{aligned}
zn &= (qm + r)n\\
&= qnm + nr\\
&= 2^{64}q + (ql + nr)
\end{aligned}

It can be obtained. Here, because $ q <m $ from $ z <m ^ 2 $

\begin{aligned}
ql+nr &< m^2 + nm\\
&< m^2 + 2^{64} + m\\
&= 2^{64} + m(m+1)\\
&< 2 \cdot 2^{64}
\end{aligned}

Because it becomes

\begin{aligned}
zn &< 2^{64}q + 2 \cdot 2^{64}\\
&= 2^{64}(q + 2)
\end{aligned}

Therefore,

\{(z \cdot n) >> 64\} < q+2

Is established.

From the above

q \leq \{(z \cdot n) >> 64\} < q + 2

Was shown.

4.4. Implementation

Let's implement it. First, create the class Barrett and implement the constructor. We also have a method to check the law $ m $.

class Barrett:
    # @param 1 <= m < 2^31
    def __init__(self, m):
        self._m = m
        self.im = -(-(1<<64) // m) if m > 1 else 0
    
    #Method returning m
    def umod(self):
        return self._m

The variable ʻim` corresponds to the previous $ n $. In ACL, it is written by taking advantage of the nature of fixed-length integers, but since Python is a multiple-precision integer, it is processed by classifying it with an if statement. By pre-calculating the part that needs to be divided by $ m $ and holding it as a constant in this way, the speed is increased.

Now implement the method "mul". This is the method that returns $ (a \ cdot b) % m $ for $ a, b $.

class Barrett:
    # __init__()
    # umod()

    def mul(self, a, b):
        assert 0 <= a < self._m
        assert 0 <= b < self._m
        z = a * b
        v = z - ((z * self.im)>>64) * self._m
        if v < 0: v += self._m
        return v
    

m = 1000000007
bt = Barrett(m)
a = 12345678
b = 87654321
print(bt.mul(a, b))  # 14799574
print((a * b) % m)  # 14799574

4.5. Practicality is ...

As some of you may have guessed, ** in Python it's faster to use the built-in arithmetic operator% obediently. ** **

It may be effective if it becomes a huge number, but it seems that it is not necessary (in Python) for the number that is handled in competitive programming.

5. Conclusion

This time we have looked at the algorithms used inside ACLs. It was difficult to follow the formula, but I deepened my understanding.

Please let us know if you have any mistakes, bugs, or advice.

[GO] [Internal_math (1)] Read with Green Coder AtCoder Library ~ Implementation in Python ~

0. Introduction

Target readers

Untargeted readers

Referenced

1.1. Modulo operation in C ++

1.2. Implementation

2.1. Naive method

2.2. Solution to Problem 1: Iterative Squares

2.3. Solution to Problem 2: The nature of modulo operations

2.4. Implementation

3.1. Euclidean algorithm

3.2. Describe the Euclidean algorithm with a mathematical formula

3.3. Extended Euclidean algorithm

3.4. Preparation for implementation of inv_gcd ①

3.5. Preparation for inv_gcd implementation ②

3.6. Preparation for implementation of inv_gcd ③

3.7. Implementation

4.1. Ideas

4.2. How to determine k, n

4.3. Why k = 64?

4.4. Implementation

4.5. Practicality is ...

5. Conclusion