[PYTHON] Story of a speed comparison between sorting numbers and sorting strings (unfinished)

The Qiita API does not provide the number of contributions tied to a user, so I wrote a utility myself and tried to implement a ranking output.

The utility returns a dictionary that maps each user name to a number of contributions, but the numbers are stored as strings, so the built-in function sorted orders them as text and the result is not the numeric order I expected.

sort.py


import GetQiitaContribution

d = GetQiitaContribution.getContributionByFile("account.txt")

for v in sorted(d.values()):
    print(v)

python3 ./sort.py
0
142
23
235
309
4
42
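
The order above is what string comparison produces: "142" comes before "23" because "1" is smaller than "2". As a minimal sketch of an alternative to changing the utility, sorted can be given key=int so that the values are compared as numbers while remaining strings. The dictionary below is a hypothetical stand-in for what the utility returns: the user names are invented, and only the counts are taken from the output above.

```python
# Hypothetical stand-in for the dictionary returned by the utility:
# the user names are invented, the counts are the values printed above,
# stored as strings as in the original version.
contributions = {
    "user_a": "142",
    "user_b": "0",
    "user_c": "23",
    "user_d": "309",
    "user_e": "4",
    "user_f": "235",
    "user_g": "42",
}

# Strings are compared character by character, so "142" sorts before "23".
lexicographic = sorted(contributions.values())

# key=int compares by numeric value while leaving the stored strings untouched.
numeric = sorted(contributions.values(), key=int)

# Ranking: (user, count) pairs ordered by count, largest first.
ranking = sorted(contributions.items(), key=lambda item: int(item[1]), reverse=True)

print(lexicographic)
print(numeric)
for rank, (user, count) in enumerate(ranking, start=1):
    print(rank, user, count)
```

Sorting d.items() instead of d.values() keeps the user name next to each count, which is what a ranking output needs.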

I'm still learning Python, so I took the opportunity to try a few things.

First, fix the utility

I modified the utility so that the dictionary stores the number of contributions as an integer, and the sort then came out in the expected order.

python


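            # Before: the count was stored as returned, i.e. as a string
            # dict[username] = getContributionByName(username)
            # After: convert to int so that sorted() orders the values numerically
            dict[username] = int(getContributionByName(username))

time python3 ./sort.py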
0
4
23
42
142
235
309

real	0m4.177s
user	0m0.101s
sys	0m0.026s

real	0m3.636s
user	0m0.111s
sys	0m0.028s

real	0m3.377s
user	0m0.108s
sys	0m0.029s

For comparison, I also timed the original version that stores the values as strings.

time python3 ./sort.py
0
142
23
235
309
4
42

real	0m4.298s
user	0m0.115s
sys	0m0.045s

real	0m3.535s
user	0m0.100s
sys	0m0.025s

real	0m3.606s
user	0m0.100s
sys	0m0.025s

I'm curious about the speed difference between sorting numbers and sorting strings in Python

I wanted to know how large the speed difference is between sorting strings and sorting numbers, but the measurements above do not isolate the sort. I have not profiled the script, so I cannot say for certain, but most of the run time appears to be spent on the scraping done by the utility. The amount of data being sorted is also very small, so these numbers are not informative.
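
In the timings above, real is around 4 seconds while user is around 0.1 seconds, which suggests the script mostly waits rather than computes. The following is a minimal sketch of how the two stages could be timed separately with time.perf_counter (wall clock) and time.process_time (CPU time). The function simulated_fetch is a hypothetical stand-in for the scraping step and simply sleeps; it is not the actual utility.

```python
import time


def measure(func):
    """Run func once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


def simulated_fetch():
    """Hypothetical stand-in for the scraping step: it only waits."""
    time.sleep(0.2)  # pretend to wait for the network
    return ["0", "142", "23", "235", "309", "4", "42"]


values, fetch_wall, fetch_cpu = measure(simulated_fetch)
_, sort_wall, sort_cpu = measure(lambda: sorted(values))

print(f"fetch: wall={fetch_wall:.3f}s cpu={fetch_cpu:.3f}s")
print(f"sort:  wall={sort_wall:.6f}s cpu={sort_cpu:.6f}s")
```

Time spent sleeping or waiting on the network counts toward wall-clock time but not toward CPU time, which is the same distinction that real and user make in the output of the time command.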

So I decided to make a full-scale comparison. I made a list of numbers and a list of strings covering 1 to 1048576 (the maximum number of rows in Excel for Mac) and sorted each one.
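
The script used for this comparison is not shown, so the following is only a sketch of how such a test could look. It makes assumptions that are not in the article: the input is shuffled with a fixed seed before sorting, only the sorted call itself is timed, and the function names build_inputs and time_sort are invented for illustration.

```python
import random
import time

N = 1_048_576  # upper bound used in the article


def build_inputs(n, seed=0):
    """Return a shuffled list of ints 1..n and the same values as strings."""
    numbers = list(range(1, n + 1))
    random.Random(seed).shuffle(numbers)  # assumption: the input is unsorted
    strings = [str(x) for x in numbers]
    return numbers, strings


def time_sort(values):
    """Return (elapsed_seconds, sorted_copy) for a single sorted() call."""
    start = time.perf_counter()
    result = sorted(values)
    elapsed = time.perf_counter() - start
    return elapsed, result


if __name__ == "__main__":
    numbers, strings = build_inputs(N)
    t_num, _ = time_sort(numbers)
    t_str, _ = time_sort(strings)
    print(f"int sort: {t_num:.3f}s")
    print(f"str sort: {t_str:.3f}s")
```

Timing only the sorted call keeps list construction and printing out of the measurement, unlike running the whole script under the time command.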

Numerical value

real	0m12.400s
user	0m9.455s
sys	0m2.412s

real	0m12.522s
user	0m9.444s
sys	0m2.493s

real	0m12.500s
user	0m9.718s
sys	0m2.288s

String

real	0m14.583s
user	0m9.772s
sys	0m3.828s

real	0m12.000s
user	0m9.259s
sys	0m2.200s

real	0m12.183s
user	0m9.430s
sys	0m2.249s

Judging by these whole-script timings, there does not seem to be much of a speed difference between sorting numbers and sorting strings when the number of elements is about 1 million.

Conclusion

With element counts on the order of seven digits, no noticeable speed difference seems to appear between strings and numbers. Lightweight languages (LL) have long since become widespread, and as a programmer who is turning into an old-timer I am a little surprised by this result, whether it comes from the evolution of hardware or from the evolution of the languages themselves. I'd like to end this loose article here.

Thank you to those who have read this far.
