[PYTHON] Is it possible to create an infinitely long "non-repeating string" with only three kinds of characters? [BFS] [DFS]

Overview

I thought about the question, "Can I make an infinitely long non-repeating string with only three kinds of characters?" ("Repeating" has a definition specific to this article, given in the next section.) I haven't proved it, but I confirmed with a program that a non-repeating string can be made at least up to a certain length (10000).

String with/without repetition

When the same substring (of length 1 or more) appears twice in a row, that is, in adjacent positions, we call the string a repeating string. Here is an example of a string with repetitions (the repeated part is in bold):

Conversely, a string that is not a repeating string is called a non-repeating string. Here is an example of a non-repeating string:

Note that a string is still non-repeating if the same substring appears more than once, as long as the occurrences are never adjacent.
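To make the definition concrete, here is a minimal sketch that reports where the first adjacent repetition occurs, or None for a non-repeating string. The function name find_adjacent_repeat and the sample strings are my own illustrative choices, not the article's original examples.

```python
def find_adjacent_repeat(text):
    """Return (start, length) of the first adjacent repeated block, or None.

    A block text[start:start+length] "repeats" when it is immediately
    followed by an identical block text[start+length:start+2*length].
    """
    n = len(text)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            if text[start:start + length] == text[start + length:start + 2 * length]:
                return start, length
    return None


# Hypothetical example strings chosen for illustration
for s in ["aa", "abab", "abcbc", "abcab", "abacaba"]:
    print(s, find_adjacent_repeat(s))
```

Here "abcab" contains "ab" twice, but the two occurrences are not adjacent, so it counts as non-repeating, while "abcbc" repeats because "bc" is immediately followed by "bc".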

When the types of characters that can be used are limited, how long can a non-repeating character string be made?

When only one type of character can be used

When only one kind of character can be used (for example, "a"), how long can a non-repeating string be? For length 1, "a" itself is a non-repeating string.

For length 2, no non-repeating string exists: the only string of length 2 made from one kind of character is "aa", and it contains a repetition.

Therefore, when there is only one type of character that can be used, the maximum length that can create a non-repeating character string is 1.

When there are two types of characters that can be used

When two kinds of characters can be used (for example, "a" and "b"), how long can a non-repeating string be? For length 1, "a" and "b" are non-repeating strings.

For length 2, "ab" and "ba" are non-repeating strings.

For length 3, "aba" and "bab" are non-repeating strings.

For length 4, no non-repeating string exists. The strings of length 4 over two characters are "aaaa", "aaab", "aaba", "aabb", "abaa", "abab", "abba", "abbb", plus the same eight strings with "a" and "b" swapped, and every one of them contains a repetition.

Therefore, when there are two types of characters that can be used, the maximum length that can create a non-repeating character string is 3.
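The case analysis for one and two kinds of characters can be double-checked by brute force. The sketch below (max_non_repeating_length and its limit parameter are hypothetical names of mine) tries every string of each length and stops at the first length where none is non-repeating. It is only conclusive for alphabets whose answer is below limit; with three characters it simply returns limit.

```python
from itertools import product


def is_repeat(text):
    """True if text contains two identical adjacent substrings."""
    for i in range(1, len(text) // 2 + 1):
        for j in range(len(text) - 2 * i + 1):
            if text[j:j + i] == text[j + i:j + 2 * i]:
                return True
    return False


def max_non_repeating_length(alphabet, limit=10):
    """Largest L <= limit such that some non-repeating string of length L
    over `alphabet` exists, found by trying every string of each length."""
    best = 0
    for length in range(1, limit + 1):
        if any(not is_repeat("".join(p)) for p in product(alphabet, repeat=length)):
            best = length
        else:
            # Every string of length L+1 has a length-L prefix, so if no
            # length-L string works, no longer string can work either.
            break
    return best


print(max_non_repeating_length("a"))
print(max_non_repeating_length("ab"))
```

The early break relies on the fact that a prefix of a non-repeating string is itself non-repeating, which is also why the article can build strings one character at a time.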

When there are 3 types of characters that can be used

When three kinds of characters can be used (for example, "a", "b", "c"), how long can a non-repeating string be? Working it out by hand, you can see that non-repeating strings like the following exist, at least up to length 10.

Length  Example of a non-repeating string
1       a
2       ab
3       aba
4       abac
5       abacb
6       abacba
7       abacbab
8       abacbabc
9       abacbabca
10      abacbabcab
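The rows of the table can be checked mechanically. This sketch reuses the article's repetition test (restated here so the snippet runs on its own) and also confirms that each row extends the previous one by a single character.

```python
def is_repeat(text):
    """True if text contains two identical adjacent substrings."""
    for i in range(1, len(text) // 2 + 1):
        for j in range(len(text) - 2 * i + 1):
            if text[j:j + i] == text[j + i:j + 2 * i]:
                return True
    return False


# Rows of the table: each row extends the previous one by one character
TABLE_ROWS = ["a", "ab", "aba", "abac", "abacb", "abacba",
              "abacbab", "abacbabc", "abacbabca", "abacbabcab"]

for row in TABLE_ROWS:
    print(len(row), row, "repeating" if is_repeat(row) else "non-repeating")
```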

It seems that under this condition a non-repeating string can be made as long as we like. Is that true? Below, let's look a little more closely at the case where three kinds of characters can be used.

How many non-repeating character strings can be created when the character string length is fixed

Can an infinitely long non-repeating string be created when three kinds of characters can be used? I found it difficult to prove the answer directly. (Is this a well-known problem? If anyone knows, please let me know.) So, for the time being, I decided to count how many non-repeating strings of length L exist and plot the count against L.

Judgment of repeating character strings

As preparation, we write a function that determines whether a string contains a repetition. In Python, two adjacent substrings of the same length can be written with integers $ i, j $ as text[j:j+i] and text[j+i:j+2*i], where $ i $ is the length of each substring and $ j $ is the index at which the left substring starts (the leftmost character has index 0). We then loop over all possible values of $ i $ and $ j $ and check whether the two substrings are equal. For a string of length $ L $, the possible ranges are $ 1 \leq i \leq \lfloor L/2 \rfloor $ and $ 0 \leq j \leq L-2i $. The string contains a repetition if and only if text[j:j+i] equals text[j+i:j+2*i] for some $ i, j $ in these ranges.

#Returns True if text contains repetitions, False if it does not
def is_repeat(text):
  #i: length of each of the two adjacent substrings
  for i in range(1,len(text)//2+1):
    #j: start index of the left substring
    for j in range(len(text)-2*i+1):
      if text[j:j+i]==text[j+i:j+2*i]:
        return True
  return False

However, the above function performs roughly $ O(L^2) $ substring comparisons, and each comparison of two slices of length $ i $ itself takes time proportional to $ i $, so it is somewhat expensive.
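To put numbers on that cost, this sketch counts how many $ (i, j) $ pairs is_repeat visits on a non-repeating string of length L (where it never exits early), and a second total that weights each pair by the slice length i as a rough proxy for character-level work. Both helper names are mine. The pair count is exactly floor(L^2/4), while the weighted total grows like L^3/24.

```python
def comparisons(L):
    """Number of (i, j) pairs is_repeat visits on a non-repeating string of length L."""
    return sum(L - 2 * i + 1 for i in range(1, L // 2 + 1))


def characters_compared(L):
    """Same loop, weighting each pair by the substring length i, as a rough
    proxy for the cost of slicing and comparing two length-i substrings."""
    return sum((L - 2 * i + 1) * i for i in range(1, L // 2 + 1))


for L in (10, 100, 1000):
    print(L, comparisons(L), L * L // 4, characters_compared(L), L ** 3 / 24)
```

Writing m = floor(L/2), the pair count is m(L+1) - m(m+1) = m(L - m), which equals floor(L^2/4) for both even and odd L.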

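If a string A is non-repeating, then whether the string A' obtained by adding one character to one end of A contains a repetition can be checked much more cheaply. Any repetition in A' must involve the added character, so only substrings containing it need to be checked, which takes only $ O(L) $ comparisons. In the function below the new character is added to the front of the string, so for each length $ i $ it suffices to compare the first $ i $ characters with the next $ i $.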

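def is_added_text_repeat(text,char):
  #Assumes text itself is non-repeating; char is added to the front
  text = char + text
  #Any new repetition must start at index 0 (the added character)
  for i in range(1,len(text)//2+1):
    if text[:i]==text[i:2*i]:
      return True
  return False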

As explained in the next section, non-repeating strings are generated by adding one character at a time to a string already known to be non-repeating, so this check can be used there.
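The claim that only substrings containing the new character matter can be tested exhaustively for short strings. This sketch (check_incremental is a hypothetical helper) compares the incremental check with the full check for every non-repeating string up to length 7 and every added character.

```python
from itertools import product


def is_repeat(text):
    """Full check: True if text contains two identical adjacent substrings."""
    for i in range(1, len(text) // 2 + 1):
        for j in range(len(text) - 2 * i + 1):
            if text[j:j + i] == text[j + i:j + 2 * i]:
                return True
    return False


def is_added_text_repeat(text, char):
    """Incremental check from the article: assumes `text` is non-repeating."""
    text = char + text
    for i in range(1, len(text) // 2 + 1):
        if text[:i] == text[i:2 * i]:
            return True
    return False


def check_incremental(alphabet="abc", max_len=7):
    """Compare the incremental and full checks on every non-repeating
    string up to max_len with every possible added character."""
    checked = 0
    for length in range(1, max_len + 1):
        for p in product(alphabet, repeat=length):
            s = "".join(p)
            if is_repeat(s):
                continue  # the incremental check is only valid for non-repeating s
            for c in alphabet:
                assert is_added_text_repeat(s, c) == is_repeat(c + s), (s, c)
                checked += 1
    return checked


print(check_incremental())
```

The precondition matters: is_added_text_repeat("bb", "a") returns False even though "abb" contains the repetition "bb", because that repetition was already present before the character was added.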

Judge whether it is a non-repeating character string while adding one character at a time

To investigate the relationship between the string length L and the number of non-repeating strings, we generate non-repeating strings by brute force. Starting from a known non-repeating string, we add one character to one end (the front, in the program) and check whether the result contains a repetition. (Once a repetition occurs, it remains no matter how many characters are added afterwards, so there is no need to extend strings that already contain one.)

Since we want to generate strings in order of increasing L, we implement this as a breadth-first search. Also, the initial string is fixed to "ab" so that strings differing only by a permutation of the three characters are counted once (the actual number of non-repeating strings is therefore 6 times the count reported here).
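The factor of 6 can be checked against a direct count for small L. Because the search starts from "ab" and adds characters at the front, the strings it counts are those ending in "ab". Every non-repeating string of length 2 or more has two different last characters, and exactly one of the 3! = 6 permutations of "a", "b", "c" turns them into "ab". The sketch below (count_non_repeating is a name of mine) enumerates all 3^L strings, so it is only practical for small L.

```python
from itertools import product


def is_repeat(text):
    """True if text contains two identical adjacent substrings."""
    for i in range(1, len(text) // 2 + 1):
        for j in range(len(text) - 2 * i + 1):
            if text[j:j + i] == text[j + i:j + 2 * i]:
                return True
    return False


def count_non_repeating(length, alphabet="abc", suffix=""):
    """Count non-repeating strings of the given length, optionally only
    those ending with `suffix`, by checking all len(alphabet)**length strings."""
    total = 0
    for p in product(alphabet, repeat=length):
        s = "".join(p)
        if s.endswith(suffix) and not is_repeat(s):
            total += 1
    return total


for L in range(2, 9):
    print(L, count_non_repeating(L, suffix="ab"), count_non_repeating(L))
```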

from collections import deque

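#Prints the number of non-repeating strings of each length from 2 up to length
#(only strings ending in "ab" are counted; the true total is 6 times this)
def without_repeat_text_bfs(length):
  unit = ["a","b","c"]
  q = deque(["ab"])  #queue of non-repeating strings, shortest first
  cnt = {2:1}        #cnt[l]: number of non-repeating strings of length l
  prev = "ab"
  while q:
    tmp = q.popleft()
    #First string of a new length: the count for the previous length is final
    if len(tmp) > len(prev):
      print(len(prev),cnt.get(len(prev),0))
    prev = tmp
    if len(prev) >= length: continue
    for u in unit:
      if prev[0] == u: continue  #"uu" at the front would be a repetition
      if is_added_text_repeat(prev,u): continue
      l = len(prev)+1
      cnt[l] = cnt.get(l,0)+1
      q.append(u+prev)
  print(length, cnt.get(length,0))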

The output looks like this:

>>>without_repeat_text_bfs(55)
2 1
3 2
4 3
5 5
6 7
7 10
8 13
9 18
10 24
...

Below is a graph of the number of non-repeating character strings up to L = 55.

[Figure: number of non-repeating strings versus string length L]

The curve looks exponential, so I plotted the vertical axis on a logarithmic scale.

[Figure: log of the number of non-repeating strings versus string length L]

Since the result is almost a straight line, the count appears to grow exponentially. It seems unlikely that the count suddenly drops to 0 at some point, so it seems possible to create a non-repeating string no matter how large L is (not proven).
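To put a number on "almost a straight line", here is a sketch that fits log(count) = a + b·L by least squares and reports exp(b), the estimated growth factor per added character. counts_by_length is my level-by-level variant of the BFS above (it keeps one length at a time in a list instead of a queue), and max_len=30 is an arbitrary choice. The fitted factor is an estimate from finite data, not a proof.

```python
import math


def is_added_text_repeat(text, char):
    """Incremental repetition check from the article (text must be non-repeating)."""
    text = char + text
    for i in range(1, len(text) // 2 + 1):
        if text[:i] == text[i:2 * i]:
            return True
    return False


def counts_by_length(max_len):
    """Level-by-level variant of the BFS: counts[L] is the number of
    non-repeating strings of length L that end in "ab"."""
    level = ["ab"]
    counts = {2: 1}
    for length in range(3, max_len + 1):
        level = [u + s for s in level for u in "abc"
                 if not is_added_text_repeat(s, u)]
        counts[length] = len(level)
    return counts


def growth_factor(counts):
    """Fit log(count) = a + b * L by least squares and return exp(b)."""
    xs = sorted(counts)
    ys = [math.log(counts[x]) for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return math.exp(cov / var)


counts = counts_by_length(30)
print(counts[10], growth_factor(counts))
```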

Up to what length can we confirm that a non-repeating string exists?

It seems such strings can be made indefinitely, but L = 55 feels small, so I will confirm that at least one non-repeating string exists for somewhat larger L. Counting all of them with the function from the previous section takes too much time, but if we only need to find one, a depth-first search should be somewhat more efficient.

def without_repeat_text_dfs(length):
  unit = ["a","b","c"]
  q = deque(["ab"])  #used as a stack (pop from the right), which makes this a DFS
  prev = ""
  while q and len(prev) < length:
    prev = q.pop()
    for u in unit:
      if prev[0] == u: continue
      if is_added_text_repeat(prev,u): continue
      #Here u+prev is a non-repeating string
      #Return it as soon as it reaches the length specified by the argument
      if len(u+prev) >= length:
        return u+prev
      #Otherwise push it onto the stack to be extended later
      q.append(u+prev)
  #The stack ran out: no string of the requested length was found
  return None

As a result of checking using the above function, it was confirmed that a character string that does not repeat up to L = 10000 can be found.
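The DFS relies on the incremental check, which trusts that each parent string is already non-repeating. As an independent sanity check, this sketch re-runs the full is_repeat on a DFS result. It uses length 200 (an arbitrary choice) because the full check gets slow quickly as L grows; the DFS itself is restated from the article with added comments.

```python
from collections import deque


def is_repeat(text):
    """Full check: True if text contains two identical adjacent substrings."""
    for i in range(1, len(text) // 2 + 1):
        for j in range(len(text) - 2 * i + 1):
            if text[j:j + i] == text[j + i:j + 2 * i]:
                return True
    return False


def is_added_text_repeat(text, char):
    """Incremental check: assumes `text` is non-repeating, char goes in front."""
    text = char + text
    for i in range(1, len(text) // 2 + 1):
        if text[:i] == text[i:2 * i]:
            return True
    return False


def without_repeat_text_dfs(length):
    """The article's DFS: extend non-repeating strings at the front, using a stack."""
    unit = ["a", "b", "c"]
    q = deque(["ab"])
    prev = ""
    while q and len(prev) < length:
        prev = q.pop()
        for u in unit:
            if prev[0] == u:
                continue
            if is_added_text_repeat(prev, u):
                continue
            if len(u + prev) >= length:
                return u + prev
            q.append(u + prev)
    return None


result = without_repeat_text_dfs(200)
# Re-check the whole string from scratch with the full test
print(len(result), is_repeat(result))
```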

In conclusion

We defined strings with and without repetition and examined how long a non-repeating string can be when the kinds of usable characters are limited. With one or two kinds of characters, only strings of finite length could be made (at most 1 and 3 characters, respectively). With three kinds, I could not prove it, but the shape of the graph suggests that strings of infinite length can be made, and I confirmed that a non-repeating string exists at least for lengths up to 10,000. It is a little interesting that one and two kinds give a finite maximum while three kinds (if the guess is correct) suddenly give an infinite one. The four color theorem says that any map can be colored with four colors so that adjacent regions never share a color; is this related? (It is not simply a one-dimensional version of the four color theorem, though.) Or has this problem already been proved? If anyone knows, I would be grateful for any information.
