Performance tuning by sprinkling print statements that output execution times is painful, and this article is about putting a stop to it.
Improvement is easy once you can identify the slow-running logic, and a profiler lets you pinpoint the cause quickly, so I will show how to use one. The first half covers identifying slow lines with line_profiler; the second half covers speed-up techniques in Python.
Use a profiler in your local environment to identify which lines are heavy. There are various profilers for Python, but I personally use line_profiler because it has the necessary and sufficient features. What it tells you is, for each line, how many times it was executed (Hits) and what share of the total execution time it consumed (% Time).
I wrote sample code that takes about 10 seconds to execute. Read each time.sleep() call as a DB access. The program returns, as JSON, the data of a user who owns 1000 cards with 3 skills per card.
■ The profiler will tell you everything, so you can skip the code
sample1.py
# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals
import random
import time
import simplejson


class UserCardSkill(object):
    def __init__(self, user_id, card_id):
        self.id = random.randint(1, 1000)  # assuming skill IDs in the range 1-999
        self.user_id = user_id
        self.card_id = card_id

    @property
    def name(self):
        return "skill:{}".format(str(self.id))

    @classmethod
    def get_by_card(cls, user_id, card_id):
        time.sleep(0.01)
        return [cls(user_id, card_id) for x in xrange(3)]  # each card has 3 skills

    def to_dict(self):
        return {
            "name": self.name,
            "skill_id": self.id,
            "card_id": self.card_id,
        }


class UserCard(object):
    def __init__(self, user_id):
        self.id = random.randint(1, 300)  # assuming card IDs in the range 1-299
        self.user_id = user_id

    @property
    def name(self):
        return "CARD:{}".format(str(self.id))

    @property
    def skills(self):
        return UserCardSkill.get_by_card(self.user_id, self.id)

    @classmethod
    def get_by_user(cls, user_id):
        time.sleep(0.03)
        return [cls(user_id) for x in range(1000)]  # assuming the user owns 1000 cards

    def to_dict(self):
        """
        Convert the card information to a dict and return it
        """
        return {
            "name": self.name,
            "skills": [skill.to_dict() for skill in self.skills],
        }


def main(user_id):
    """
    Respond with the card information the user possesses, as JSON
    """
    cards = UserCard.get_by_user(user_id)
    result = {
        "cards": [card.to_dict() for card in cards]
    }
    json = simplejson.dumps(result)
    return json


user_id = "A0001"
main(user_id)
Now let's install the profiler tool and identify the heavy spots.
install
pip install line_profiler
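As an aside, line_profiler also ships with the kernprof runner: mark the functions you want measured with a @profile decorator (kernprof injects `profile` into builtins at run time, so no import is needed) and launch the script through it. For the sample above that would look like the command below; -l enables line-by-line profiling and -v prints the stats when the script exits. The rest of this article drives LineProfiler by hand instead.
kernprof -l -v ./sample1.py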
sample1_profiler.py
~~abridgement~~

# instantiate the profiler and register the targets
from line_profiler import LineProfiler
profiler = LineProfiler()
profiler.add_module(UserCard)
profiler.add_module(UserCardSkill)
profiler.add_function(main)

# run the registered main function under the profiler
user_id = "A0001"
profiler.runcall(main, user_id)

# display the results
profiler.print_stats()
Execution result
>>>python ./sample1_profiler.py
Timer unit: 1e-06 s
Total time: 0.102145 s
File: ./sample1_profiler.py
Function: __init__ at line 9
Line # Hits Time Per Hit % Time Line Contents
==============================================================
9 def __init__(self, user_id, card_id):
10 3000 92247 30.7 90.3 self.id = random.randint(1, 1000)  # assuming skill IDs in the range 1-999
11 3000 5806 1.9 5.7 self.user_id = user_id
12 3000 4092 1.4 4.0 self.card_id = card_id
Total time: 0.085992 s
File: ./sample1_profiler.py
Function: to_dict at line 23
Line # Hits Time Per Hit % Time Line Contents
==============================================================
23 def to_dict(self):
24 3000 10026 3.3 11.7 return {
25 3000 66067 22.0 76.8 "name": self.name,
26 3000 6091 2.0 7.1 "skill_id": self.id,
27 3000 3808 1.3 4.4 "card_id": self.card_id,
28 }
Total time: 0.007384 s
File: ./sample1_profiler.py
Function: __init__ at line 32
Line # Hits Time Per Hit % Time Line Contents
==============================================================
32 def __init__(self, user_id):
33 1000 6719 6.7 91.0 self.id = random.randint(1, 300)  # assuming card IDs in the range 1-299
34 1000 665 0.7 9.0 self.user_id = user_id
Total time: 11.0361 s
File: ./sample1_profiler.py
Function: to_dict at line 49
Line # Hits Time Per Hit % Time Line Contents
==============================================================
49 def to_dict(self):
50 """
51 Convert card information to dict and return
52 """
53 1000 1367 1.4 0.0 return {
54 1000 10362 10.4 0.1 "name": self.name,
55 4000 11024403 2756.1 99.9 "skills": [skill.to_dict() for skill in self.skills],
56 }
Total time: 11.1061 s
File: ./sample1_profiler.py
Function: main at line 59
Line # Hits Time Per Hit % Time Line Contents
==============================================================
59 def main(user_id):
60 """
61 Respond with json the card information possessed by user
62 """
63 1 41318 41318.0 0.4 cards = UserCard.get_by_user(user_id)
64 1 1 1.0 0.0 result = {
65 1001 11049561 11038.5 99.5 "cards": [card.to_dict() for card in cards]
66 }
67 1 15258 15258.0 0.1 json = simplejson.dumps(result)
68 1 2 2.0 0.0 return json
■ The profiler made it possible to pinpoint the heavy lines. The line_profiler output shows that lines 65 and 55 were the heavy ones: the user owns 1000 cards, and querying UserCardSkill once per card, 1000 times in total, is what pushed the execution time past 10 seconds.
Now for concrete techniques for improving execution speed. We will tune the program investigated with the profiler, using memoization (a cache) and hash lookup, while changing the structure of the code as little as possible. Since this article is about Python, I will not go into SQL-level acceleration.
First, reduce the number of queries to UserCardSkill without changing the structure of the code. The code below fetches all the UserCardSkill records associated with the user in one batch, stores them in memory, and answers from the in-memory data from the second query onward.
sample1_memoize.py
class UserCardSkill(object):
    _USER_CACHE = {}

    @classmethod
    def get_by_card(cls, user_id, card_id):
        # before the improvement: hits the DB on every call
        time.sleep(0.01)
        return [cls(user_id, card_id) for x in xrange(3)]

    @classmethod
    def get_by_card_from_cache(cls, user_id, card_id):
        # after the improvement: hits the DB only on the first call
        if user_id not in cls._USER_CACHE:
            # cache miss: fetch all the user's skills from the DB in one batch
            cls._USER_CACHE[user_id] = cls.get_all_by_user(user_id)
        r = []
        for skill in cls._USER_CACHE[user_id]:
            if skill.card_id == card_id:
                r.append(skill)
        return r

    @classmethod
    def get_all_by_user(cls, user_id):
        # fetch all the user's skills from the DB at once
        return list(cls.objects.filter(user_id=user_id))
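Note that get_all_by_user above delegates to cls.objects.filter(...), which presumes an ORM such as Django's. To run the memoized version against the standalone sample1.py, a hypothetical stand-in that simulates a single batched DB round trip could look like this (it reuses sample1.py's imports; the card-ID range mirrors the sample's assumption):

    @classmethod
    def get_all_by_user(cls, user_id):
        # hypothetical stand-in for cls.objects.filter(user_id=user_id):
        # one simulated DB access that returns every skill the user owns
        time.sleep(0.03)
        return [cls(user_id, card_id)
                for card_id in xrange(1, 301)  # card IDs as assumed in sample1.py
                for _ in xrange(3)]            # 3 skills per card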
@timeit  # prints the execution time; this custom decorator is defined in full in tls.py below
def main(user_id):
Execution result
>>>python ./sample1_memoize.py
func:'main' args:[(u'A0001',), {}] took: 0.6718 sec
Execution time improved from 11.1061 sec before the change to 0.6718 sec, more than 15 times faster. The speedup comes from reducing the number of queries to UserCardSkill from 1000 to 1.
In the memoized code, however, get_by_card_from_cache linearly searches (full-scans) the 3 * 1000-element list cls._USER_CACHE[user_id] on every call in order to collect the skills for each card. Scanning the whole list every time is inefficient, so generate a dict keyed by card_id in advance and rewrite the lookup as a hash search. In this code the linear search costs O(n) per lookup, while the hash search costs O(1).
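To see the difference in isolation, here is a small self-contained benchmark of the two lookup strategies (the element counts mirror the sample; the key value 150 is arbitrary):

    from collections import defaultdict
    import timeit

    # 3000 cached skills as (card_id, payload) pairs, card IDs in 0-299
    skills = [(i % 300, "skill") for i in xrange(3000)]

    # build the card_id -> skills index once
    index = defaultdict(list)
    for card_id, skill in skills:
        index[card_id].append(skill)

    # 1000 lookups each: full scan of the list vs. one dict access
    linear = timeit.timeit(lambda: [s for c, s in skills if c == 150], number=1000)
    hashed = timeit.timeit(lambda: index[150], number=1000)
    print "linear: %.4f sec / hash: %.4f sec" % (linear, hashed)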
sample1_hash.py
~~abridgement~~
from collections import defaultdict


class UserCardSkill(object):
    _USER_CACHE = {}

    @classmethod
    def get_by_card_from_cache(cls, user_id, card_id):
        if user_id not in cls._USER_CACHE:
            # cache miss: fetch all the user's skills from the DB in one batch
            users_skill = cls.get_all_by_user(user_id)
            # convert to a dict keyed by card_id
            cardskill_dict = defaultdict(list)
            for skill in users_skill:
                cardskill_dict[skill.card_id].append(skill)
            # save to the cache
            cls._USER_CACHE[user_id] = cardskill_dict
        # rewritten from linear search to hash search
        return cls._USER_CACHE[user_id].get(card_id, [])

    @classmethod
    def get_all_by_user(cls, user_id):
        # fetch all the user's skills from the DB at once
        return list(cls.objects.filter(user_id=user_id))
Execution result
>>>python ./sample1_hash.py
func:'main' args:[(u'A0001',), {}] took: 0.3840 sec
Before this improvement, the 3000-element list was fully scanned for each of the 1000 cards, so `if skill.card_id == card_id:` was evaluated 3 million times. Replacing that with a hash search eliminates those comparisons entirely, so even after deducting the cost of building the dict, execution speed improves.
When it comes to easy memoization, cached_property comes to mind. It saves the return value in the instance cache, using self.func.__name__ ("skills" in the sample implementation) as the key. From the second query onward the value is returned from the cache, which improves execution speed. The implementation is only a few lines, so reading the code may be the quickest way to understand it: cached_property.py#L12
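For reference, a minimal sketch of that descriptor pattern (not the package's exact code) is shown below: on first access, __get__ computes the value and stores it in the instance __dict__ under the function's name, and because instance attributes shadow non-data descriptors, later accesses never reach the descriptor at all.

    class cached_property(object):
        # minimal sketch; see cached_property.py#L12 for the real implementation
        def __init__(self, func):
            self.func = func

        def __get__(self, obj, cls):
            if obj is None:
                return self
            # cache under self.func.__name__ ("skills" in the sample);
            # the instance attribute now shadows this descriptor
            value = obj.__dict__[self.func.__name__] = self.func(obj)
            return value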
cached_property.py
from cached_property import cached_property


class Card(object):
    @cached_property
    def skills(self):
        return UserCardSkill.get_by_card(self.user_id, self.id)


@timeit
def main(user_id):
    cards = Card.get_by_user(user_id)
    for x in xrange(10):
        cards[0].skills
Execution result
# before applying cached_property
>>>python ./cached_property.py
func:'main' args:[(u'A0001',), {}] took: 0.1443 sec
# after applying cached_property
>>> python ./sample1_cp.py
func:'main' args:[(u'A0001',), {}] took: 0.0451 sec
This section assumes a web server running on Apache with WSGI.
Thread-local storage (TLS) is a mechanism for giving each thread in a multithreaded process its own place to store data. If you run a web server on Apache with WSGI and set MaxRequestsPerChild to 1 or more in the config, each child process terminates after serving MaxRequestsPerChild requests. If your program uses thread-local storage, it can keep a cache per child process for that lifetime. Storing data common to all users, such as master data, in TLS can therefore yield a significant speedup.
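For example, in the Apache config (the value here is illustrative; tune it to your workload):

    # httpd.conf: recycle each child process after it has served 10000 requests,
    # which also discards any per-process cache it has accumulated
    MaxRequestsPerChild 10000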
I wrote a program that computes the primes below N for a random N between 500000 and 500010. By recording the prime-calculation results in TLS, the second and subsequent calculations for the same N are skipped.
tls.py
# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals
import random
import threading
import time

threadLocal = threading.local()


def timeit(f):
    def timed(*args, **kw):
        # http://stackoverflow.com/questions/1622943/timeit-versus-timing-decorator
        ts = time.time()
        result = f(*args, **kw)
        te = time.time()
        print 'func:%r args:[%r, %r] took: %2.4f sec' % (f.__name__, args, kw, te - ts)
        return result
    return timed


@timeit
def worker():
    initialized = getattr(threadLocal, 'initialized', None)
    if initialized is None:
        print "init start"
        # initialize the TLS
        threadLocal.initialized = True
        threadLocal.count = 0
        threadLocal.prime = {}
        return []
    else:
        print "loop:{}".format(threadLocal.count)
        threadLocal.count += 1
        return get_prime(random.randint(500000, 500010))


def get_prime(N):
    """
    Return a list of the prime numbers below N
    :param N: int
    :rtype : list of int
    """
    # if the result is already in TLS, return it from the cache
    if N in threadLocal.prime:
        return threadLocal.prime[N]
    # compute the primes with a sieve
    table = list(range(N))
    for i in range(2, int(N ** 0.5) + 1):
        if table[i]:
            for mult in range(i ** 2, N, i):
                table[mult] = False
    result = [p for p in table if p][1:]
    # record the result in TLS
    threadLocal.prime[N] = result
    return result


for x in xrange(100):
    worker()
Execution result
>>>python tls.py
init start
func:'worker' args:[(), {}] took: 0.0000 sec
loop:0
func:'worker' args:[(), {}] took: 0.1715 sec
loop:1
func:'worker' args:[(), {}] took: 0.1862 sec
loop:2
func:'worker' args:[(), {}] took: 0.0000 sec
loop:3
func:'worker' args:[(), {}] took: 0.2403 sec
loop:4
func:'worker' args:[(), {}] took: 0.2669 sec
loop:5
func:'worker' args:[(), {}] took: 0.0001 sec
loop:6
func:'worker' args:[(), {}] took: 0.3130 sec
loop:7
func:'worker' args:[(), {}] took: 0.3456 sec
loop:8
func:'worker' args:[(), {}] took: 0.3224 sec
loop:9
func:'worker' args:[(), {}] took: 0.3208 sec
loop:10
func:'worker' args:[(), {}] took: 0.3196 sec
loop:11
func:'worker' args:[(), {}] took: 0.3282 sec
loop:12
func:'worker' args:[(), {}] took: 0.3257 sec
loop:13
func:'worker' args:[(), {}] took: 0.0000 sec
loop:14
func:'worker' args:[(), {}] took: 0.0000 sec
loop:15
func:'worker' args:[(), {}] took: 0.0000 sec
...
The cache stored in thread-local storage (TLS) is kept per Apache child process and remains until that child process terminates.
Used properly, a cache improves a program's execution speed. However, be aware that caches bring their own class of bugs, their side effects. Here are some I have seen, or caused, in the past.
■ Display bug: updated values are never shown. This occurs when the cache is used without thinking through its life-cycle design.
■ Data-loss bug: this one is deadly. In a program that 1. reads a value and then 2. adds to it and writes it back, step 1 keeps returning the stale cached value. For example, with updates of 1234 + 100, then 1234 + 200, then 1234 + 50, each write starts over from the cached 1234, so the earlier additions disappear.
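A hypothetical sketch of that pattern; load_points and save_points are stubs standing in for real DB access:

    _cache = {}

    def load_points(user_id):
        return 1234  # stub for a DB read

    def save_points(user_id, value):
        print "saving %s for %s" % (value, user_id)  # stub for a DB write

    def get_points(user_id):
        # after the first call this always returns the stale cached value
        if user_id not in _cache:
            _cache[user_id] = load_points(user_id)
        return _cache[user_id]

    def add_points(user_id, delta):
        # BUG: every update starts from the cached 1234 instead of the latest value,
        # so 1234+100 is overwritten by 1234+200 and then by 1234+50
        save_points(user_id, get_points(user_id) + delta)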
■ How to prevent side effects
Package the cache handling, as the cached_property decorator does, and work with the cache only through a well-tested package; that way anyone can use it safely. This lets you handle a cache without knowing its inner workings, but if possible it is better to understand how the cache life cycle behaves.
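For example, the cached_property package lets you invalidate the memoized value explicitly by deleting the instance attribute, which keeps the life cycle visible in the code (continuing the Card example above, inside main):

    card = cards[0]
    card.skills      # first access: queries the DB and caches the result
    card.skills      # served from the instance cache
    del card.skills  # invalidate: the value is removed from card.__dict__
    card.skills      # queries the DB again and re-caches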
memo: the first release of line_profiler dates back to 2008.