[GO] ARC104 B --Solve DNA Sequence with Swift!

Introduction

Hello! This is TARDIGRADE from the Yashiro High School PC Club Formation Association. In this article, I would like to solve ARC104 B --DNA Sequence with Swift! This is a commentary for those who participated in Swift Zoomin'# 4 held on October 24, 2020. I hope that the process of thinking from the time you read the problem to the time you code it will be conveyed so that even those who have never experienced competitive programming can easily understand it. Please note that the explanation may be complicated because I wrote it in a hurry at midnight. If you have any questions, please leave a comment or send them to Twitter.

Step1: Understand the problem

Of course, in order to solve a problem, you must have a solid understanding of the content of the problem. Competitive programming problems are often written in the form of math tests, and the first time you see them may be "Uh ...", but let's do our best to read them.

Tips for understanding the problem

** ・ Try to visualize what is being asked by drawing a figure or a formula by yourself ** By visualizing, we often come up with an efficient solution.

** ・ Look closely at test cases ** The test case may describe the flow of the program. Read it carefully as it will be very helpful in understanding the problem. Also, by looking at multiple test cases, it may be possible to proceed with consideration such as "It looks complicated, but in reality there are only two choices, A or B!".

B-What is being asked in the DNA Sequence

The word "complementary" is quite a power word and is prone to rejection, but if you read it carefully, the meaning of "complementary" is written in the question sentence.

|T1|=lWhen, which1≤i≤lAlso aboutT1,T2ofi文字目of組み合わせが (AWhenT),Or(CWhenG) of組み合わせofいずれかであるこWhenを指します。

|T1|=For any 1 ≤ i ≤ l, where lThe part of is like a fixed phrase, so you can ignore it. If you don't define it properly, you will be complained, so the problem statement is written very finely and carefully, but there are many parts that say "I'm not saying a big deal". This area is very similar to mathematics.

By the way, It means that the combination of the i-th letter of T1 and T2 is either (A and T) or (C and G) </ code>. It won't be difficult. As you read through the problem statement, you will find that when you rearrange a substring $ T $, you must ** determine if there is a string complementary to ** $ T $ **. .. Complementary strings mean that ** originally $ "A" $ is replaced by $ "T" $, and $ "T" $ is replaced by $ "A" $ * *about it. The same is true for ** $ (C $ and $ G) $.

If you go through the discussion so far, you will notice that there is a method of judgment. In fact, noticing it is the key to this problem, so if you haven't come to the point, think about it for a while.

………That's right! When the substring $ T $ is rearranged, the $ "A" $ and $ "T" $ contained in ** $ T $ must be present in order to have a string complementary to $ T $. Must be equal in number and $ "C" $ and $ "G" $ must be in equal numbers **.

Now the problem is ** "$ S $ is a contiguous non-empty substring $ T $, and the number of " A " and " T " contained in $ T $ is equal. , And find the number of things with the same number of $ "C" $ and $ "G" $ "**" can be replaced with the problem.

In this way, it is also important for competition professionals to "replace problem sentences with easy-to-understand forms."

Step2: Think of a straightforward solution

Now that the problem has been summarized briefly, let's consider a solution. At this time, the constraint of the value given by the standard input is very helpful. In this case, $ N <= 5000 $, so the $ O (N ^ 2) $ solution will be in time (for those who do not understand the concept of computational complexity order, [this site](https: // bmf-tech) .com / posts / O (order) notation and how to calculate the computational complexity of the algorithm #: ~: text = Computational complexity (order) is an index expressed as a percentage.)).

Now that we know the upper limit of the amount of calculation, let's think about the solution in earnest. Step 1 shows that if you know the number of $ "A", "T", "C", "G" $ in $ T $, you can determine if $ T $ has a complementary string. So, for the time being, I will write it honestly.

let input = readLine()!.split(separator: " ")
let n = Int(input[0])!
let s = Array(input[1]).map{String($0)}
var answer = 0
for i in 0..<n-1{
    for j in i+1..<n{
        var ac = 0
        var tc = 0
        var cc = 0
        var gc = 0
        for k in i...j{
            if s[k] == "A"{ac += 1}
            if s[k] == "T"{tc += 1}
            if s[k] == "C"{cc += 1}
            if s[k] == "G"{gc += 1} 
        }
        if ac == tc && cc == gc{answer += 1}
    }
}
print(answer)

If you actually put in a test case, you can see that you are looking for the correct answer. However, the amount of calculation of this program is $ O (N ^ 3) $, which will cause you to get stuck in the execution time limit (When you actually submit it, it looks like this / contests / arc104 / submissions / 17643029))). In this way, if you estimate the upper limit of the amount of calculation first, you can prevent you from taking a penalty by submitting a code that becomes TLE.

Step3: Speed up the program

From here, let's think about how to speed up. In the first place, it costs $ O (N ^ 2) $ to specify a substring, so ** "Determine whether a complementary string exists with $ O (1) $" You have to think of ** or ** "how to solve without specifying individual substrings" **. ~~ This problem seems to work well if you think about the former. If you manage the amount, you will gradually understand "which point should be focused on to speed up", so don't worry if you don't know now. ~~

Click here for the former solution. Click to expand.
Now, I have to make a judgment with $ O (1) $, but what should I do? Now, let's think a little more about "making a decision with $ O (1) $". The judgment here requires the number of $ "A", "T", "C", "G" $. Specifically, it is the number of each character contained in ** $ S [i] $ to $ S [j] $ **. Right now I'm counting from $ i $ to $ j $ in a loop, but can I get this with $ O (1) $ ...? If possible, the judgment itself can be done with $ O (1) $, and the total amount of calculation will be improved to $ O (N ^ 2) $.

To make it easier to skip the idea, let's transform the formula to find the number of each character contained in $ S [i] $ to $ S [j] $.

(What you want to find) = (the number of each character contained in $ S [i] $ to $ S [j] $)

= $ (Number of each character contained in S [i] $ $) $ + $ (Number of each character contained in S [i + 1] $ $) $ + ...... + $ (Number of each character contained in S [j-1] $ $) $ + $ (Number of each character contained in S [j] $ $) $

= ** $ (number of each character contained in S [1] $ to $ S [j] $ $) - (included in S [1] $ to $ S [i-1] $ Number of each character in $) $ **

With this transformation, the formula has become much simpler. And take a closer look at the last formula. This can be calculated by $ O (1) $ by using ** cumulative sum **. Regarding cumulative sum, there is an article that is much easier to understand than I explain, so I will introduce that.

Be able to write cumulative sums without thinking!

With this, you can make a ** judgment with $ O (1) $ and the whole process with $ O (N ^ 2) $! ** ** When implemented, it will be as follows. (I'm sorry that the explanation of this area is rough, but it is the most important part, so let's refer to the linked article etc. so that you can understand what you are doing)

let input = readLine()!.split(separator: " ")
let n = Int(input[0])!
let s = input[1]
var aa:[Int] = [0]
var tt:[Int] = [0]
var cc:[Int] = [0]
var gg:[Int] = [0]
var a = 0
var t = 0
var c = 0
var g = 0
//Preprocessing to use cumulative sum
for i in s{
    if i == "A"{a += 1}
    if i == "T"{t += 1}
    if i == "C"{c += 1}
    if i == "G"{g += 1}
    aa.append(a)
    tt.append(t)
    cc.append(c)
    gg.append(g)
}
//From here on, it's the same as the straightforward method I created earlier
var ans = 0
for i in 0...n-1{
    for j in i+1...n{
        if aa[j]-aa[i] == tt[j]-tt[i] && cc[j]-cc[i] == gg[j]-gg[i]{
            ans += 1
        }
    }
}
print(ans)
** Added on October 26, 2020 After rereading the problem, I found that I could do either. Rather, the latter is easier in terms of concept and implementation amount. ** **

Specifically, I would like to explain how to solve the latter case. First, let's see how the program written in Step 2 works. Then you will notice that the calculation is wasteful. Specifically, it is the ↓ part.

for j in i+1..<n{
    //Abbreviation
    for k in i...j{
        //Abbreviation
    }

If you follow the flow of the program moving carefully one by one,

Find the value of $ i ... j $ in a loop Find the value of $ → i ... j + 1 $ in a loop Find the value of $ → i ... j + 2 $ in a loop Repeat below $ → $

You can see that it works like this. However, if you think about it, the value of $ i ... j + 1 $ can be obtained by simply adding the value of $ j + 1 $ to the value of $ i ... j $. It is a wasteful calculation to recalculate the value of $ i ... j $. Similarly, when finding the value of $ i ... j + 2 $, simply add the value of $ j + 2 $ to the value of $ i ... j + 1 $.

So what I mean is that ** a value once calculated can be used to find the next value **. If this is improved, the amount of calculation can be reduced to $ O (N ^ 2) $ without using the cumulative sum. When implemented, it will be as follows.

let input = readLine()!.split(separator: " ")
let n = Int(input[0])!
let s = Array(input[1]).map{String($0)}
var answer = 0
for i in 0..<n-1{
    var ac = 0
    var tc = 0
    var cc = 0
    var gc = 0
    for j in i..<n{
        if s[j] == "A"{ac += 1}
        if s[j] == "T"{tc += 1}
        if s[j] == "C"{cc += 1}
        if s[j] == "G"{gc += 1} 
        if ac == tc && cc == gc{answer += 1}
    }
}
print(answer)

If you try to extract the part that has been used for unnecessary calculations in this way, it will be a story that "That's right", but it is hard to notice unless you are accustomed to it. At the beginning, it is recommended to solve it with the feeling of "thinking about a straightforward solution" → "looking at the flow of the program and looking for unnecessary calculations".

Finally

This is the end of the explanation. The explanation may have been difficult to understand in many places (I'm not strong enough, I'm sorry), but I hope it conveys the general atmosphere when solving the problem. As I wrote at the beginning, if you have any questions, please feel free to comment or Twitter.

Recommended Posts

ARC104 B --Solve DNA Sequence with Swift!
Getting Started with Swift
Starting with Swift Swift UI