[PYTHON] Get the longest character length contained in an interval in a segment tree

Since it was written that the segment tree can do various things depending on what you put on it, I implemented what I came up with. It is written in various places that you have to devise something to put on, but it was a good example so I posted it.

What can be achieved

--Initialization: $ O (N) $ --Get maximum length of characters with interval: $ O (logN) $ --One character replacement: $ O (logN) $

Assumed problem

I want to answer the following problems at high speed.

--A string of n characters s is given. s is expressed as $ s_ {1}, s_ {2} \ dots s_ {n} $ from the first character. --Answer the following 5 types of q queries.

  1. Rewrite set (i, x) $ s_ {i} $ to the letter x.
  2. Answer the number of ʻacontained in the interval ofgetCount (s, t)` $ s_ {s}, s_ {s + 1}, \ dots, s_ {t} $.
  3. Answer the maximum value of the number of consecutive a contained in the interval$ s_ {s}, s_ {s + 1}, \ dots, s_ {t} $getLongestLen (s, t) `.
  4. Answer the getLeftLen (s, t) $ s_ {s}, s_ {s + 1}, \ dots, s_ {t} $ interval the number of consecutive a from the left end
  5. Answer the getRightLen (s, t) $ s_ {s}, s_ {s + 1}, \ dots, s_ {t} $ interval the number of consecutive a from the right end

--The constraint is assumed to be about $ 1 \ leq n \ leq 100,000 $, $ 1 \ leq q \ leq 100,000 $ (★ I haven't checked if it passes below) --In this article, ʻais targeted for each count etc. for the sake of simplicity of explanation (actually, it seems to be a problem to operate the above for the character`).

policy

Implement a segment tree that carries the following information on each node. (char, count, leftLen, rightLen, longestLen, noneCount)

An example is shown above. char: Characters contained (not present in the figure) count: Number of a (5) leftLen: Number of consecutive a from the left end of the string (1) rightLen: Number of consecutive a from the right end of the string (2) longestLen: Maximum length of a in the interval (may be leftLen or rightLen) (3) noneCount: Sum of noneCount of L and R (not present in the figure) Will be. The need for noneCount will be explained later.

Value setting 1. About set (i, x)

Because it is a value of at most one character --When x = a ("a ", 1, 1, 1, 1, 0) --x = otherwise (<char>, 0, 0, 0, 0, 0) And the right end to be filled when making a seg tree is (None, 0, 0, 0, 0, 1).

Parent node update basics

First, the basic operation will be described. Hereinafter, L and R are the left child nodes of the node, and R is the right child node. There are some important exceptions to (*), which will be described later.

count: Sum of L and R counts leftLen: Take over the leftCount of the L node () rightLen: Take over the rightCount of the R node () longestLen: Maximum value (own leftLen, own rightLen, L rightLen + r leftLen, L longest, R Longest) noneCount: Sum of noneCount of L and R

Here, the calculation of longestLen is the maximum of some values. First, it is self-evident that the longest Lens for L and R are candidates. For L rightLen + r leftLen, in the parent node, the right end of L and the left end of R are concatenated, so these are continuous character strings and are the longest candidates.

After this, the following two patterns will be described for the continuous length processing of leftLen and rightLen, which stated that there is a special case *.

Basic pattern: Counting strings from the edge and processing the right edge of a string that is not a power of 2

The count of strings from one end is special, if all the nodes under it are consecutive a, that end has a continuous string combined with the other end. Two examples are shown, and Example 2 corresponds to this. --Example 1: "aabc" + "aaxz" = "aabcaaxz": The continuous character string from the right end is the same as L 2, and the left end is also 0. --Example 2: "aaaa" + "aaxz" = "aaaaaaxz": The left end is also 0, but since the right end is all a, it is concatenated with the left end of R and L string length + left end of R It becomes. This is the basis of string counting from the edge.

Next, due to the characteristics of the segment tree, the problems that clearly occur during initialization are shown in the figure.

In the segment tree, the length of the target list is basically aligned to the power of 2 and stored. If you store the 7 characters "axbaaaa" as above, the right edge will be empty. This time, it is assumed that None is stored. At this time, when node 6 receives a value from nodes 7 and 8, 8 has no value, so the continuous character string on the right side is treated as 0. Therefore, noneCount is used.

--When node 6 receives a value from node 8, this interval length 1 --noneCount value 1 = 0 is included, so set rightLen as 0 + 2 (rightLen of L). --Consider updating node 2. Similarly, R contains length 2 --noneCount1 = 1, so set leftLen as 0 + 2 (rightLen of L).

The implementation of this is as follows. For the subsequent "considered cases", noneCount (padding) processing is added to both the right and left ends.

    def funcSegmentValue(self, lNode, rNode, parentNodeId):
        lchar, lcnt, lconsLeft, lconsRight, lconsLen, lNoneNum= lNode
        rchar, rcnt, rconsLeft, rconsRight, rconsLen, rNoneNum = rNode

        # l,Total amount of characters in r
        ncnt = lcnt + rcnt

        #The right length after concatenation basically matches the left
        nconsLeft = lconsLeft
        nconsRight = rconsRight

        #Even if the number of consecutive character strings at the left end of the right node is insufficient when adding up the right nodes
        #If you meet the padding, add it to the number of consecutive strings at the right end of the left node
        #print("magic = {0}".format(2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length()))))

        if lconsLeft == (2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length())) - lNoneNum):
            nconsLeft += rconsLeft
        if rconsRight == (2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length())) - rNoneNum):
            nconsRight += lconsRight


        nconsLen = max(nconsLeft, nconsRight, lconsLen, rconsLen, lconsRight + rconsLeft)

        nNoneNum = lNoneNum + rNoneNum
        res = [nchar, ncnt, nconsLeft, nconsRight, nconsLen, nNoneNum]
        return res

Cases to consider: Left-hand processing when querying

Now consider the query $ [1,4) $ in the four-letter "aaab".

Since the parent node of "aa" is in charge of the interval [0,1], it queries the child node of the interval 0,1 (in this case, the leaf node). At this time, since the node of the leaf of interval 0 is not included in the range, it is answered that the range (1 in this case) is the response load (= None).

    def querySub(self, a, b, nodeId, l, r):
        """
        [a,b)To node node for the parent query of the interval[l, r)Delegate the search for
For intervals, the subscript of data is 0,1,2,3,When it is 4,
        [0,3)Then 0,1,Returns the result of 2
Condition to be None: r <= a or b <= l
        """
        if (r <= a or b <= l):
            cannotAnswer = (r - l)
            return [None, 0, 0, 0, 0, cannotAnswer]

        if a <= l and r <= b:
            res =  self.dat[nodeId]
            return res

        resLeft = self.querySub(a, b, 2 * nodeId + 1, l, (l + r) // 2)
        resRight = self.querySub(a, b, 2 * nodeId + 2, (l + r) // 2, r)
        res = self.funcSegmentValue(resLeft, resRight, nodeId)
        return res

Whole code

#Of the target character
#Just a number,Number of consecutive to the right,Number of consecutive left,Consecutive number, (Assumed from the end)Number of consecutive None
#Count
# cons: consecutive
#data is
# char=Data included, cnt, consLeft, consRight, consLen,Number of noneNum
class segmentTreeCharCount():
    dat = []
    lenTreeList = -1
    targetChar = None
    depthTreeList = 0
    lenPaddingEntry = 0
    unitDefault = [None, 0, 0, 0, 0, 1]

    def __init__(self):
        pass

    def load(self, l, tc):
        self.targetChar = tc
        # len(l)Get the square of 2 greater than the number
        self.lenTreeList = 2 ** (len(l) - 1).bit_length()  #2 for len 5^3 = 8
        self.depthTreeList = (len(l) - 1).bit_length() #Number of steps of wood(0 origin)
        #lenPaddingEntry is[1,2,3,4,5]If you give[1,2,3,4,5,None,None,None]Returned 3 because it was treated as
        self.lenPaddingEntry = 2 ** (len(l) - 1).bit_length() - len(l) #How many entries were completed

        self.dat = [self.unitDefault] * (self.lenTreeList * 2)
        #Load value
        for i in range(len(l)):
            if l[i] == self.targetChar:
                self.dat[self.lenTreeList - 1 + i] = [l[i], 1, 1, 1, 1, 0]
            else:
                self.dat[self.lenTreeList - 1 + i] = [l[i], 0, 0, 0, 0, 0]
        self.build()

    def funcSegmentValueById(self, nodeId):
        l = self.dat[nodeId * 2 + 1]
        r = self.dat[nodeId * 2 + 2]
        return self.funcSegmentValue(l, r, nodeId)

    #Do the calculation to write.
    #In this case, the boundary is used in this calculation.,r position a,By inserting b, padding at the right end and the left end is performed.
    def funcSegmentValue(self, lNode, rNode, parentNodeId):
        #print("funcSegmentValue parentNode={0}".format(parentNodeId))
        #print("L:")
        lchar, lcnt, lconsLeft, lconsRight, lconsLen, lNoneNum= lNode
        #print(lNode)
        #print("R:")
        rchar, rcnt, rconsLeft, rconsRight, rconsLen, rNoneNum = rNode
        #print(rNode)

        #Renamed here for convenience(It doesn't mean much)
        if lchar is None or rchar is None:
            nchar = None
        elif rchar is not None:
            nchar = rchar
        elif lchar is not None:
            nchar = lchar

        # l,Total amount of characters in r
        ncnt = lcnt + rcnt

        #The right length after concatenation basically matches the left
        nconsLeft = lconsLeft
        nconsRight = rconsRight

        """
        #print("searchdepth = {0}".format(self.depthTreeList - ((parentNodeId + 1).bit_length() - 1)))
        if lcnt == 2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length())):
            #print("child!L!")
            nconsLeft += rconsLeft
        if rcnt == 2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length())):
            #print("child!R!")
            nconsRight += lconsRight
        """

        #In the case of root, even if the number of consecutive character strings at the left end of the right node is insufficient when adding up the right nodes
        #If you meet the padding, add it to the number of consecutive strings at the right end of the left node
        #print("magic = {0}".format(2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length()))))

        if lconsLeft == (2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length())) - lNoneNum):
            #print(" parentnodeid {2} l root special cur={0} add={1}".format(nconsRight, rconsLeft, parentNodeId))
            nconsLeft += rconsLeft
        if rconsRight == (2 ** (self.depthTreeList - ((parentNodeId + 1).bit_length())) - rNoneNum):
            #print(" parentnodeid {2} r root special cur={0} add={1}".format(nconsRight, rconsLeft, parentNodeId))
            #print(" nconsRight{0} += lconsLeft{1}".format(nconsRight, lconsLeft))
            nconsRight += lconsRight

        #print("update n={0}, max({1},{2},{3},{4},{5}".format(parentNodeId, nconsLeft, nconsRight, lconsLen, rconsLen, lconsRight + rconsLeft))

        nconsLen = max(nconsLeft, nconsRight, lconsLen, rconsLen, lconsRight + rconsLeft)

        nNoneNum = lNoneNum + rNoneNum
        res = [nchar, ncnt, nconsLeft, nconsRight, nconsLen, nNoneNum]
        #print("Return{0}".format(res))
        return res

    # char=Data included, cnt, consLeft, consRight, consLen
    def build(self):
        for nodeId in range(self.lenTreeList - 2, -1, -1):
            #Important: This code will regenerate the list, so replace it with an assignment!
            self.dat[nodeId] = self.funcSegmentValueById(nodeId)

    def setValue(self, i, a):
        """
        set a to list[i]
        """
        #print("setValue: {0}, {1}".format(i, a))
        nodeId = (self.lenTreeList - 1) + i
        #print(" first nodeId: {0}".format(nodeId))
        """
        self.dat[nodeId] = a
        if a == self.targetChar:
            self.dat[self.lenTreeList - 1 + i] = [a, 1, 1, 1, 1, 0]
        else:
            self.dat[self.lenTreeList - 1 + i] = [a, 0, 0, 0, 0, 0]
        """
        #print("before")
        #print(self.dat[nodeId])
        self.dat[nodeId] = a
        if a == self.targetChar:
            self.dat[nodeId] = [a, 1, 1, 1, 1, 0]
        else:
            self.dat[nodeId] = [a, 0, 0, 0, 0, 0]
        #print("after")
        #print(self.dat[nodeId])


        while nodeId != 0:
            nodeId = (nodeId - 1) // 2
            #print(" next nodeId: {0}".format(nodeId))
            # sum : self.dat[nodeId] = self.dat[nodeId * 2 + 1] + self.dat[nodeId * 2 + 2]
            self.dat[nodeId] = self.funcSegmentValueById(nodeId)

    def querySub(self, a, b, nodeId, l, r):
        """
        [a,b)To node node for the parent query of the interval[l, r)Delegate the search for
For intervals, the subscript of data is 0,1,2,3,When it is 4,
        [0,3)Then 0,1,Returns the result of 2
Condition to be None: r <= a or b <= l
        """
        #print("querySub: a={0}, b={1}, nodeId={2}, l={3}, r={4}".format(a, b, nodeId, l, r))

        if (r <= a or b <= l):
            cannotAnswer = (r - l)
            #print(" > None") #This should return an unanswerable number
            return [None, 0, 0, 0, 0, cannotAnswer]
        if a <= l and r <= b:
            #print(" > have: {0} [node = {1}]".format(self.dat[nodeId], nodeId))
            #print(" >     : a={0} <= l={1} and r{2} <= b{3}".format(a,l,r,b))
            res =  self.dat[nodeId]
            return res

        #print("querySubcalc: a={0}, b={1}, nodeId={2}, l={3}, r={4}".format(a, b, nodeId, l, r))
        resLeft = self.querySub(a, b, 2 * nodeId + 1, l, (l + r) // 2)
        resRight = self.querySub(a, b, 2 * nodeId + 2, (l + r) // 2, r)
        #print("querySubend: a={0}, b={1}, nodeId={2}, l={3}, r={4}".format(a, b, nodeId, l, r))
        #print(" > L")
        #print("  node{0}: {1}".format(2 * nodeId + 1, resLeft))
        #print(" > R")
        #print("  node{0}: {1}".format(2 * nodeId + 2, resRight))
        #print(resRight)
        res = self.funcSegmentValue(resLeft, resRight, nodeId)
        #print(" > res")
        #print(res)
        return res

    def query(self, a, b):
        return self.querySub(a, b, 0, 0, self.lenTreeList)

    def debugGetSliceStr(self, a, b):
        """
Of the original string list[a:b]return it: str
        """
        return "".join(list(map(lambda x: x[0], self.dat[self.lenTreeList - 1 + a:self.lenTreeList - 1 + b])))


from pprint import pprint



def test1(a,b):
    pprint(st.query(a, b))
    pprint(st.debugGetSliceStr(a, b))


l = list("xaazaaa")
l = list("xaaaaa0A")
l = list("abaaabcaaaaxasaaa")
st = segmentTreeCharCount()
st.load(l, "a")
st.build()
#st.setValue(2, "x")
#st.setValue(4, "x")
#print("-----")
#pprint(st.dat)

print("----------------------------")
test1(0,9)
st.setValue(1, "a")
test1(0,9)
st.setValue(0, "x")
st.setValue(1, "x")
st.setValue(2, "x")
st.setValue(3, "x")
st.setValue(8, "x")
test1(0,9)

Recommended Posts

Get the longest character length contained in an interval in a segment tree
Get the formula in an excel file as a string in Python
[Python] Get the files in a folder with Python
In the Chainer tutorial, I get an error when importing a package. (mock)
Get the caller of a function in Python
Get only the subclass elements in a list
Create a filter to get an Access Token in the Graph API (Flask)
Get the nth smallest number from the array with O (logN) using a segment tree
Get the variable name of the variable as a character string.
Get the file name in a folder using glob
Get the number of specific elements in a python list
How to get the last (last) value in a list in Python
How to get all the possible values in a regular expression
Delete a particular character in Python if it is the last
How to get the vertex coordinates of a feature in ArcPy
Create a function to get the contents of the database in Go
Take a look at the built-in exception tree structure in Python 3.8.2
[Golang] Check if a specific character string is included in the character string
Get the number of readers of a treatise on Mendeley in Python
I get an error when I put a Python plugin in Visual Studio Code under the pyenv environment