Python's lru_cache was slow (it turned out I had misunderstood it)

TL;DR: lru_cache itself is not slow. Note that if you don't set maxsize for lru_cache, it defaults to 128 (see the documentation).

When I used functools.lru_cache for dynamic programming, I got TLE

Implementation

Problem D of AtCoder Beginner Contest 184 was a DP problem.

from functools import lru_cache

def solve(n1: int, n2: int, n3: int) -> float:
    @lru_cache
    def expect(a: int, b: int, c: int) -> float:
        if 100 in (a, b, c):
            return 0

        return (a/(a+b+c)) * (expect(a+1, b, c) + 1) \
                + (b/(a+b+c)) * (expect(a, b+1, c) + 1) \
                + (c/(a+b+c)) * (expect(a, b, c+1) + 1)

    return expect(n1, n2, n3)

if __name__ == '__main__':
    a, b, c = map(int, input().split())
    print(solve(a, b, c))

Implementing it along these lines gives TLE...

def solve(n1: int, n2: int, n3: int) -> float:
    memo = [[[-1]*101 for _ in range(101)] for _ in range(101)]  # nested comprehensions, so the inner lists are not aliased
    def expect(a: int, b: int, c: int) -> float:
        if memo[a][b][c] >= 0:
            return memo[a][b][c]
        elif a == 100 or b == 100 or c == 100:
            memo[a][b][c] = 0
        else:
            memo[a][b][c] = (a/(a+b+c)) * (expect(a+1, b, c) + 1) \
                + (b/(a+b+c)) * (expect(a, b+1, c) + 1) \
                + (c/(a+b+c)) * (expect(a, b, c+1) + 1)
        return memo[a][b][c]

    return expect(n1, n2, n3)

if __name__ == '__main__':
    a, b, c = map(int, input().split())
    print(solve(a, b, c))

This version passed. Why?

Take a look at functools.lru_cache

The implementation is here: https://github.com/python/cpython/blob/master/Lib/functools.py#L479

The implementation also tries to support multi-threaded use (cache updates are guarded by a lock), but the essential processing is as follows.

Information held

The wrapper created by lru_cache stores the following state (see the CPython source):

- cache: a dict mapping the hashed arguments to the computed result
- hits / misses: counters, exposed through cache_info()
- root: a circular doubly linked list recording the order in which entries were used
- full: a flag that becomes True once the cache holds maxsize entries
- a lock, so the bookkeeping stays consistent under multi-threading

root example

For example, suppose a Fibonacci function is called as follows.

from functools import lru_cache

@lru_cache
def fib(n):
    if n == 1 or n == 2:
        return 1

    return fib(n-1) + fib(n-2)

print(fib(3))

At this time, the results are returned (and cached) in the order fib(2) -> fib(1) -> fib(3), and root ends up looking like this:

{
  "PREV": {
    "PREV": {
      "PREV": {
        "PREV": self,
        "NEXT": self,
        "KEY": 2,
        "RESULT": 1
      },
      "NEXT": self,
      "KEY": 1,
      "RESULT": 1
    },
    "NEXT": self,
    "KEY": 3,
    "RESULT": 2
  },
  "NEXT": {
    "PREV": self,
    "NEXT": {
      "PREV": self,
      "NEXT": {
        "PREV": self,
        "NEXT": self,
        "KEY": 3,
        "RESULT": 2
      },
      "KEY": 1,
      "RESULT": 1
    },
    "KEY": 2,
    "RESULT": 1
  },
  "KEY": null,
  "RESULT": null
}

I wrote it out like JSON, but each node is actually a Python list. Writing self here means the node stores a pointer back to root itself (I did not know a better way to notate it). Following PREV from the outside gives 3 -> 1 -> 2, and following NEXT gives 2 -> 1 -> 3. This order changes whenever the function is called with arguments that are already cached; see the source code for the detailed implementation. In short, root records the order in which the cached entries were used.
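
To make this concrete, here is a minimal sketch of the same structure. This is my own toy reconstruction, not the actual CPython code; only the PREV/NEXT/KEY/RESULT field layout and the sentinel root node are taken from functools.py, and the insert helper is hypothetical. It rebuilds the fib(3) state above:

PREV, NEXT, KEY, RESULT = 0, 1, 2, 3    # field layout used in functools.py

root = []                               # sentinel node; its KEY and RESULT stay None
root[:] = [root, root, None, None]      # initially the circular list contains only root

def insert(key, result):
    # On a cache miss, the new link is placed just before root (most recent).
    last = root[PREV]
    link = [last, root, key, result]
    last[NEXT] = root[PREV] = link

# Rebuild the fib(3) example: entries were cached as 2 -> 1 -> 3
for key, result in [(2, 1), (1, 1), (3, 2)]:
    insert(key, result)

# Walking NEXT from root yields oldest-first order; PREV yields newest-first.
node, oldest_first = root[NEXT], []
while node is not root:
    oldest_first.append(node[KEY])
    node = node[NEXT]
print(oldest_first)  # [2, 1, 3]

On a cache hit, the real implementation additionally unlinks the entry and re-inserts it just before root, which is exactly how the usage order changes.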

Behavior of maxsize

First, maxsize is the maximum number of entries lru_cache keeps. When it is set, every cache miss increments misses and compares len(cache) against maxsize; once the cache is full, the oldest entry, found via root, is evicted to make room for the new one.
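
A toy example (mine, not from the original article) makes the eviction visible: with maxsize=2, caching a third result evicts the oldest entry, so repeating the first call is a miss again.

from functools import lru_cache

@lru_cache(maxsize=2)
def square(n):
    return n * n

square(1); square(2); square(3)  # caching 3 evicts the oldest entry, 1
print(square.cache_info())       # CacheInfo(hits=0, misses=3, maxsize=2, currsize=2)
square(1)                        # 1 was evicted, so this is a miss, not a hit
print(square.cache_info())       # CacheInfo(hits=0, misses=4, maxsize=2, currsize=2)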

Setting maxsize

Everything above is the behavior when maxsize is set, and without any special settings it defaults to maxsize=128. If you instead pass maxsize=None, the behavior changes completely: since the usage order no longer has to be maintained, the implementation reduces to a plain dictionary lookup. hits and misses are still counted, so you can inspect them with fib.cache_info(), but there is no root or full at all.
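
For example (again my own check, not from the article), cache_info() still works with maxsize=None, and currsize simply grows without bound:

from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded: internally just a dict lookup
def fib(n):
    if n == 1 or n == 2:
        return 1
    return fib(n-1) + fib(n-2)

fib(30)
print(fib.cache_info())  # CacheInfo(hits=27, misses=30, maxsize=None, currsize=30)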

What went wrong this time

Because I had not set maxsize, it defaulted to 128, and the cache was far too small: this problem has on the order of 100^3 states. After some digging, specifying maxsize=None eliminated the TLE. It was even slightly faster (by several tens of ms) than the hand-rolled memo[a][b][c] version.
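
So the fix is a one-line change to the first version: pass maxsize=None to the decorator.

from functools import lru_cache

def solve(n1: int, n2: int, n3: int) -> float:
    @lru_cache(maxsize=None)  # unbounded cache: no entries are ever evicted
    def expect(a: int, b: int, c: int) -> float:
        if 100 in (a, b, c):
            return 0

        return (a/(a+b+c)) * (expect(a+1, b, c) + 1) \
                + (b/(a+b+c)) * (expect(a, b+1, c) + 1) \
                + (c/(a+b+c)) * (expect(a, b, c+1) + 1)

    return expect(n1, n2, n3)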

So specify maxsize=None when you don't need a size limit. (On Python 3.9+, functools.cache is shorthand for exactly that.) I should have read the documentation a little more carefully...
