Exploring "If Programming Was An Anime" in Python and C

I will briefly explain how to solve the problem from the YouTube video that became famous, and experiment with its time and space complexity. The Floyd cycle detection method used by the main character in the third solution is explained in a little more detail. First we consider and experiment in Python, then run a similar test in C and compare.

This article is more fun to read after watching the YouTube video: If Programming Was An Anime

Conclusion first

- Dictionaries are fast but use more memory than you might think.
- The Floyd cycle detection method is $O(N)$, but with a larger constant factor than the other two approaches. However, its memory usage is $O(1)$.

Problem

- You are given a list $a$ of $N+1$ elements, each an integer from $1$ to $N$.
- Every integer from $1$ to $N$ appears at least once in $a$ (that is, exactly one integer appears twice).
- Output the duplicated integer (the one that appears twice).

Concrete examples

Example: the input $[3,1,3,4,2]$ gives the answer $3$.
Example: the input $[1,3,2,1]$ gives the answer $1$.
Example: the input $[1,1]$ gives the answer $1$.
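As a sanity check (mine, not one of the solutions from the video), a naive $O(N^2)$ pairwise comparison reproduces these answers:

```python
from typing import List

def find_duplicate_bruteforce(nums: List[int]) -> int:
    # Compare every pair of elements; O(N^2) time, O(1) extra space.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return nums[i]
    raise ValueError("no duplicate found")

print(find_duplicate_bruteforce([3, 1, 3, 4, 2]))  # -> 3
print(find_duplicate_bruteforce([1, 3, 2, 1]))     # -> 1
print(find_duplicate_bruteforce([1, 1]))           # -> 1
```

This is far too slow for large inputs, which is why the approaches below exist.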

Approach 1: Sorting and linear search

The first approach the hero takes is:

- (a) Sort in ascending order; (b) scan linearly from the beginning, comparing adjacent elements to find the duplicate.

The space complexity is $O(N)$, and the time complexity is $O(N \log N)$ for (a) plus $O(N)$ for (b), resulting in TLE (time limit exceeded).

from typing import List
def findDuplicate(nums: List[int]):
    nums.sort()
    for i in range(len(nums) -1):
        if nums[i] == nums[i+1]:
            return nums[i]
print(findDuplicate([1,1])) # -> 1
print(findDuplicate([3,1,3,4,2])) # -> 3

Approach 2: HashMap-Uh!

Pushed to his limit, the protagonist unleashes HashMap-Uh! (that is, a dictionary), a power beyond his control:

- (a) Initialize an empty dictionary. (b) Look at the numbers in order; if a number already exists in the dictionary, it is the duplicate. (c) Otherwise, record the number as seen.

As a result, MLE (memory limit exceeded) occurs.

It may sound absurd, but it is true that dictionaries use far more memory than the elements themselves. Let's check with the code below.

from typing import List
from sys import getsizeof
def findDuplicate(nums: List[int]):
    used = dict()
    for x in nums:
        if x in used:
            size = getsizeof(used) + sum(map(getsizeof, used.values())) + sum(map(getsizeof, used.keys()))
            print("Memory:", size)
            return x
        used[x] = True
print(findDuplicate([1,1])) # -> 1
print(findDuplicate([3,1,3,4,2])) # -> 3
print(findDuplicate(list(range(1,1000000+1)) + [1000000]))

Running this gives:

Memory: 156 (Byte)
1
Memory: 184 (Byte)
3
Memory: 55100346 (Approximately 55M Byte)
1000000

So a dictionary uses a large amount of memory relative to the number of elements. Since about $55\,\mathrm{MB}$ is used for an input of $N = 10^{6}$, under a $64\,\mathrm{MB}$ memory limit an input of $N = 2 \cdot 10^{6}$ would exceed the limit on the dictionary alone, ending in MLE.

[Original] Approach 2 modified: replacing the dictionary with a list of usage flags

It turns out that dictionaries use more memory than expected. An alternative is to allocate a $False$/$True$ list indexed by $1..N$ in advance and mark each value as it appears. This approach is used for comparison in the later experiments. The implementation is as follows (assuming `from typing import List` as in the earlier snippets).

def findDuplicate22(nums: List[int]):
    used = [False] * len(nums)
    for x in nums:
        if used[x]:
            return x
        used[x] = True

Approach 3: Floyd Cycle Detection

This is where Senpai comes in. Senpai coolly proposes Floyd's cycle-finding algorithm.

For the sake of explanation, the problem statement can be reread as follows. (This has exactly the same inputs and outputs as the original problem.)

- There are $n$ cities in this world, each with a teleporter.
- An array $a$ of $n+1$ integers is given. It contains every integer from 1 to $n$ at least once (that is, exactly one integer appears twice).
- You start in city 0.
- When you are in city $i$ ($0 \leq i \leq n$), you next teleport to city $a_{i}$.
- If you visit the same city twice, you stop teleporting in that city. Which city is it?

As an example, suppose you are given $n = 4$ and $a = [3,1,3,4,2]$. The walk is city 0 -> city 3 -> city 4 -> city 2 -> city 3, and it stops here (city 3 has now been visited twice). As the figure below shows, the path splits into a loop part $3 \to 4 \to 2 \to \ldots$ and a "tail" ($0$) leading into the loop. Let's call the node where the tail joins the loop (3 in the figure) the "entrance of the loop". In this teleport problem, the entrance of the loop is reached both from the tail and from the end of the loop (that is, it is the node you visit twice), so it is exactly the node we are looking for.
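This teleport walk can be simulated directly (a sketch of the reformulation only, not the final algorithm; it still needs a visited set, i.e. $O(N)$ memory, which Floyd's method will eliminate):

```python
from typing import List

def teleport_until_repeat(a: List[int]) -> int:
    # Start in city 0; from city i, teleport to city a[i].
    # Stop at the first city visited twice -- the entrance of the loop.
    visited = set()
    city = 0
    while city not in visited:
        visited.add(city)
        city = a[city]
    return city

print(teleport_until_repeat([3, 1, 3, 4, 2]))  # 0 -> 3 -> 4 -> 2 -> 3, prints 3
```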

Now we need to find the entrance of the loop coolly. Consider the Floyd cycle detection method Senpai proposes. I will leave the full explanation to other articles, but with Floyd's method it is possible to find the tail length $\lambda$ and the loop length $\mu$, and hence the entrance node. Remarkably, the space complexity is $O(1)$, and the time complexity is $O(\lambda + \mu)$, so at worst about $O(N)$.

For the Floyd cycle detection method itself, the article "Visualize Floyd Cycle Detection Method with R" is easy to understand.

The code that implements this is below.

from typing import List
def findDuplicate(nums: List[int]):
    hare = tortoise = 0
    # The tortoise moves at speed 1, the hare at speed 2
    while True: # loop until they meet
        tortoise = nums[tortoise]
        hare = nums[hare]
        hare = nums[hare]
        if tortoise == hare:
            break
    m = tortoise
    # Restart the hare from the initial position
    hare = 0
    while tortoise != hare: # loop until they meet
        tortoise = nums[tortoise]
        hare = nums[hare] # the hare now also moves at speed 1
    l = hare
    u = 0
    # From the meeting point inside the loop, move only the hare (at speed 2)
    while True: # loop until they meet again
        u += 1
        hare = nums[hare]
        hare = nums[hare]
        if tortoise == hare:
            break
    # m: node where tortoise and hare first meet
    # l: entrance of the loop (= the duplicate)
    # u: number of 2-step moves until the hare returns to the meeting point
    #    (equals the loop length when it is odd, half of it when it is even)
    print("debug: m=",m, "l=",l, "u=",u)
    return l

print(findDuplicate([1,3,2,4,1,5]))
print(findDuplicate([3,3,4,2,5,1]))
print(findDuplicate([2,2,3,4,5,1]))
print(findDuplicate([4,3,4,2,1]))
# Build a shuffled list of the numbers 1..99 plus an extra 30 (the duplicate)
import random
l = list(range(1,100)) + [30]
random.shuffle(l)
print(findDuplicate(l))

Execution result:

debug: m= 4 l= 1 u= 3
1
debug: m= 1 l= 3 u= 5
3
debug: m= 1 l= 2 u= 5
2
debug: m= 2 l= 4 u= 2
4
debug: m= 40 l= 30 u= 17
30

Looking back and experimenting

Now, let's look back on the time complexity and the spatial complexity.

| Algorithm | Time complexity | Space complexity |
|---|---|---|
| 1: sort & linear search | $O(N \log N)$ | $O(N)$ |
| 2: dictionary | $O(N)$ | $O(N)$, but varies greatly depending on the dictionary implementation |
| 2 modified: list | $O(N)$ | $O(N)$ |
| 3: Floyd cycle detection | $O(N)$, with a large constant factor | $O(1)$ |

Using the code given in the references below, I ran each algorithm on randomly generated data with $N = 10^{7}$ plus one duplicate, measuring time and memory.

The experiment was set up as follows.

- (Using gen.py) Generate the numbers $1..10^{7}$ plus one random duplicate among them, shuffle, and write them to a file.
- Each algorithm reads the file and runs (call these processes 1, 2, 2 modified, and 3).
- Also run a process that only reads the input (call this process 0).
- Subtract the execution time and memory of process 0 from those of each algorithm. This gives the time and memory of the processing itself.

The table below shows the execution time and memory usage of processes 1, 2, 2 modified, and 3, minus those of process 0.

| Run | 1 time (s) | 2 time (s) | 2 mod time (s) | 3 time (s) | 1 memory | 2 memory | 2 mod memory | 3 memory |
|---|---|---|---|---|---|---|---|---|
| 1 | 8.7 | 2.3 | 1.2 | 3.1 | 43MB | 491MB | 78MB | 0MB |
| 2 | 8.8 | 1.5 | 0.8 | 3.6 | 43MB | 245MB | 78MB | 0MB |
| 3 | 8.5 | 1.3 | 0.3 | 4.8 | 41MB | 245MB | 78MB | 0MB |
| 4 | 8.5 | 2.1 | 1.1 | 1.5 | 39MB | 491MB | 78MB | 0MB |
| 5 | 9.9 | 1.9 | 1.6 | 8.7 | 39MB | 245MB | 78MB | 0MB |
| 6 | 9.3 | 2.4 | 1.4 | 3.8 | 39MB | 491MB | 78MB | 0MB |
| 7 | 7.9 | 0.1 | 0.0 | 2.5 | 39MB | 30MB | 78MB | 0MB |
| 8 | 8.7 | 2.3 | 0.7 | 4.7 | 39MB | 491MB | 78MB | 0MB |
| 9 | 8.9 | 1.2 | 0.6 | 3.8 | 48MB | 245MB | 78MB | 0MB |
| 10 | 8.5 | 2.1 | 0.8 | 6.6 | 38MB | 491MB | 78MB | 0MB |

Several things can be observed.

- Process 1 (sort & linear search) takes the longest overall. Sorting presumably accounts for most of it (probably 5-6 of the seconds). Its memory usage is about the size of the original data (the sort should have generated an array of the same length).
- Process 2 (dictionary) is fast but uses a lot of memory for the dictionary.
- Process 2 modified (list) is even faster, and its memory usage is stable at about twice that of process 1.
- Process 3 is intermediate in speed, but uses almost no memory.

In addition, something interesting was observed in process 2.

- The memory usage only ever comes out as 30 MB, 245 MB, or 491 MB. This appears to be due to the dictionary implementation: since a dictionary cannot know in advance how much memory to reserve when it is created or when elements are added, it reserves a small amount and grows the reservation as the number of elements increases. This experiment is not conclusive, but presumably the reserved memory is roughly doubled each time the number of elements crosses a threshold.
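The growth in discrete jumps can be observed directly with `sys.getsizeof` (note it reports only the hash table itself, not the keys and values, and the exact numbers depend on the CPython version):

```python
from sys import getsizeof

d = {}
prev = getsizeof(d)
print("initial:", prev)
for i in range(100000):
    d[i] = True
    size = getsizeof(d)
    if size != prev:
        # The table grows in discrete jumps as thresholds are crossed.
        print(f"resized at {i + 1} entries: {prev} -> {size} bytes")
        prev = size
```

Only a handful of resize events are printed for 100,000 insertions, which is consistent with a geometric growth policy.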

Comparison with C

I implemented and tested similar algorithms in C (not C++).

- Process 1 sorts with qsort from stdlib.
- Process 2 is not run, because plain C has no hashmap equivalent.
- Process 2 modified manages the usage flags with an array of ints.
- Process 2 modified bit manages each usage flag as a single bit of an unsigned long instead of a whole int.
- Process 3 is run as before.

The code is in Reference 4.
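As a side note, the bit-flag idea can also be sketched in Python (my sketch, not part of the measurements): Python's arbitrary-precision int can serve as a bitset, so each value costs one bit instead of a whole list slot.

```python
from typing import List

def find_duplicate_bits(nums: List[int]) -> int:
    # 'used' is an arbitrary-precision int acting as a bitset:
    # bit x is set once the value x has been seen.
    used = 0
    for x in nums:
        if (used >> x) & 1:
            return x
        used |= 1 << x
    raise ValueError("no duplicate found")

print(find_duplicate_bits([3, 1, 3, 4, 2]))  # -> 3
```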

Note: unlike the earlier table, the table below has no process 2 column but adds process 2 modified bit. Be careful when comparing.

| Run | 1 time (s) | 2 mod time (s) | 2 mod bit time (s) | 3 time (s) | 1 memory | 2 mod memory | 2 mod bit memory | 3 memory |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.5 | 0.1 | 0.0 | 0.8 | 39MB | 39MB | 1MB | 0MB |
| 2 | 2.3 | 0.3 | 0.0 | 0.3 | 39MB | 39MB | 1MB | 0MB |
| 3 | 2.1 | 0.0 | 0.0 | 0.3 | 39MB | 39MB | 1MB | 0MB |
| 4 | 2.4 | 0.0 | 0.0 | 1.2 | 39MB | 39MB | 1MB | 0MB |
| 5 | 2.3 | 0.2 | 0.0 | 1.3 | 39MB | 39MB | 1MB | 0MB |
| 6 | 2.3 | 0.2 | 0.1 | 1.0 | 39MB | 39MB | 1MB | 0MB |
| 7 | 2.2 | 0.2 | 0.1 | 0.3 | 39MB | 39MB | 1MB | 0MB |
| 8 | 2.7 | 0.2 | 0.1 | 0.3 | 39MB | 39MB | 1MB | 0MB |
| 9 | 2.3 | 0.1 | 0.0 | 1.2 | 39MB | 39MB | 1MB | 0MB |
| 10 | 2.0 | 0.0 | 0.0 | 1.2 | 39MB | 39MB | 1MB | 0MB |

- Process 1 is the slowest; its memory usage is stable.
- Process 2 modified could use a plain int array, so unlike in Python it uses about the same memory as process 1. It is also extremely fast compared with Python: where it was 4-5x faster than process 1 in Python, here it is 10-20x faster.
- Process 2 modified bit stores each flag in a single bit instead of a whole 32-bit int, so memory usage drops to about 1/32. It is also faster, presumably because the whole bitset fits in cache, making memory access quicker.
- Process 3 still uses essentially no memory. Its speedup over the Python version appears somewhat larger than that of process 1.

Comparison of C and Python

Unit of time: sec

| Language | 1 time | 2 time | 2 mod time | 2 mod bit time | 3 time | 1 memory | 2 memory | 2 mod memory | 2 mod bit memory | 3 memory |
|---|---|---|---|---|---|---|---|---|---|---|
| Python | 8.8 | 1.7 | 0.8 | - | 4.2 | 40MB | 347MB | 78MB | - | 0MB |
| C | 2.3 | - | 0.1 | 0.0 | 0.8 | 39MB | - | 39MB | 1MB | 0MB |

Fast and memory-saving solution: Use XOR

Let $x$ be the XOR of all the integers from 1 to $N$. XORing $x$ with every element of the array $a$ then yields the duplicate. For example, let $N = 4$ and suppose $a = [3,1,3,4,2]$ is given. With $x = 1 \oplus 2 \oplus 3 \oplus 4$,

$$x \oplus 3 \oplus 1 \oplus 3 \oplus 4 \oplus 2 = 1 \oplus 2 \oplus 3 \oplus 4 \oplus 3 \oplus 1 \oplus 3 \oplus 4 \oplus 2 = 3$$

Every number that appears an even number of times cancels out, and only $3$, which appears an odd number of times (once in $x$ and twice in $a$), remains. This relies on the fact that:

  • $x \oplus x = 0$ for any integer $x$ (together with the commutativity and associativity of XOR)

Also, the XOR of the integers from 1 to $N$ can be computed in $O(\log N)$ time as follows (each bit of the result can be determined independently):

// Code to find the XOR of 1..N
// Bit 0 is handled separately first, so the loop starts at i = 1
if( ((N+1) / 2) % 2 == 1 ) { a = 1; } else { a = 0; }
for(int i = 1 ; i < 31 ; i++)
    if( ((N / (1<<i)) % 2 == 1) && ((N % (1<<i)) % 2 == 0) ) a += (1 << i);
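As a cross-check (my sketch, not from the original), here is the same XOR solution in Python; instead of the per-bit loop it uses the well-known period-4 closed form for the XOR of $1..n$:

```python
from typing import List

def xor_1_to_n(n: int) -> int:
    # XOR of 1..n follows a period-4 pattern: n, 1, n+1, 0 for n % 4 == 0,1,2,3.
    return [n, 1, n + 1, 0][n % 4]

def find_duplicate_xor(nums: List[int]) -> int:
    n = len(nums) - 1  # values are 1..n with exactly one duplicate
    x = xor_1_to_n(n)
    for v in nums:     # the input could just as well be consumed as a stream
        x ^= v
    return x

print(find_duplicate_xor([3, 1, 3, 4, 2]))  # -> 3
```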

Floyd cycle detection requires the array $a$ to be held in memory so it can teleport around, but this solution does not: the input can be processed as a stream. As a result, it is faster than process 2 modified bit and uses less memory than process 3. The execution results are shown below.

Due to my environment at the time of writing, these runs are on a different machine than the ones above; they are only a relative comparison of the algorithms.

■ Process 3: Floyd cycle detection
$ ~/time-1.9/time -f "%M KB,%E sec" ./a.out 3 < in7
39704 KB,0:02.07 sec
■ Process 2 modified bit: input processed as a stream, without storing it in an array
$ ~/time-1.9/time -f "%M KB,%E sec" ./a.out< in7
1844 KB,0:01.06 sec
■ XOR
$ ~/time-1.9/time -f "%M KB,%E sec" ./dup4< in7
624 KB,0:00.91 sec

The full text of this code looks like this:

#include <stdio.h>
int main(int argc, char **argv){
    unsigned int N, a, x, i;
    scanf("%u", &N);
    N -= 1;
    /* XOR of 1..N: bit 0 is handled first, then each higher bit
       independently, so the loop starts at i = 1 */
    if( ((N+1) / 2) % 2 == 1 ) { a = 1; } else { a = 0; }
    for(i = 1 ; i < 31 ; i++)
        if( ((N / (1u << i)) % 2 == 1) &&
            ((N % (1u << i)) % 2 == 0) ) a += (1u << i);
    /* XOR in the input as a stream; the array is never stored */
    for(i = 0 ; i < N+1 ; i++){
        scanf("%u", &x);
        a ^= x;
    }
    printf("%u\n", a);
    return 0;
}

Reference 1: Random input data creation

import random
import sys
n=int(sys.argv[1])
x = random.randrange(n)+1
l = list(range(1,n+1)) + [x]
random.shuffle(l)
print(n+1)
for i in range(n+1):
    print(l[i])

Reference 2: Input data loading part in each algorithm

import sys
fd = open(sys.argv[1], "r")
q = int(fd.readline())
dat = [0] * q
for i in range(q):
    dat[i] = int(fd.readline())
findDuplicate(dat)

Reference 3: Contents executed in the verification

rm out*
for i in {0..9}; do
python3 gen.py 1000000 > in6
echo $i
~/time-1.9/time -f "%M,%E" python3 1.py in6 2>>out6.1.csv
~/time-1.9/time -f "%M,%E" python3 2.py in6 2>>out6.2.csv
~/time-1.9/time -f "%M,%E" python3 3.py in6 2>>out6.3.csv
~/time-1.9/time -f "%M,%E" python3 0.py in6 2>>out6.0.csv
done

for i in {0..9}; do
python3 gen.py 10000000 > in7
echo $i
~/time-1.9/time -f "%M,%E" python3 1.py in7 2>>out7.1.csv
~/time-1.9/time -f "%M,%E" python3 2.py in7 2>>out7.2.csv
~/time-1.9/time -f "%M,%E" python3 3.py in7 2>>out7.3.csv
~/time-1.9/time -f "%M,%E" python3 0.py in7 2>>out7.0.csv
done


for fn in `ls out*csv*`; do
 sed -i -e 's/,.:\([0-9][0-9]\)\.\([0-9][0-9]\)/,\1.\2/' $fn
done

Reference 4: Implementation of C

#include <stdio.h>
#include <stdlib.h>

int fcmp(const void * n1, const void * n2)
{
    if (*(int *)n1 > *(int *)n2)
    {
        return 1;
    }
    else if (*(int *)n1 < *(int *)n2)
    {
        return -1;
    }
    else
    {
        return 0;
    }
}

int solve1(int N, int* a){
    qsort(a, N + 1, sizeof(int), fcmp);
    for(int i = 0; i < N; i++){
        if(a[i] == a[i+1]) return a[i];
    }
    return -1;
}

int solve2(int N, int* a){
    int *dict = (int *)calloc(sizeof(int), N + 1);
    for(int i = 0; i < N+1; i++){
        if (dict[a[i]] == 1){
            free(dict);
            return a[i];
        }
        dict[a[i]] = 1;
    }
    free(dict);
    return -1;
}

int solve22(int N, int* a){
    int cnt = ((N+1) / 64) + 2;
    unsigned long* dict = (unsigned long*)calloc(sizeof(unsigned long ), cnt);
    for(int i = 0; i < N+1; i++){
        if ( ( (dict[a[i]/64] >> (int)(a[i] % 64) ) & (unsigned long)1 )  == 1){
            free(dict);
            return a[i];
        }
        dict[a[i]/64] |= ((unsigned long)1 << (int)(a[i] % 64));
    }
    free(dict);
    return -1;
}


int solve3(int N, int* a){
    int tortoise;
    int hare;
    tortoise = a[0];
    hare = a[a[0]];
    while (tortoise != hare)
    {
        tortoise = a[tortoise];
        hare = a[a[hare]];
    }
    hare = 0;
    while (tortoise != hare)
    {
        tortoise = a[tortoise];
        hare = a[hare];
    }
    return hare;
}


int main(int argc, char **argv){
    int type=1;
    if (argc > 1){
        type = atoi(argv[1]);
    }

    int N;
    scanf("%d",&N);

    int *a;
    a = malloc(sizeof(int) * (N + 1)); /* space for N + 1 ints */

    for(int i = 0 ; i < N+1 ; i++){
        scanf("%d",&a[i]);
    }
    switch (type) {
        case 1:
            printf("algo1\n");
            printf("%d\n", solve1(N, a));
            break;
        case 2:
            printf("algo2\n");
            printf("%d\n", solve2(N, a));
            break;
        case 22:
            printf("algo22\n");
            printf("%d\n", solve22(N, a));
            break;
        case 3:
            printf("algo3\n");
            printf("%d\n", solve3(N, a));
            break;
        case 0:
            break;

        default:
            break;
    }
    return 0;
}
