[PYTHON] A programming beginner tried to find out the execution time of sorting etc.

Overview

Read Math Girl Randomized Algorithm Last time (http://qiita.com/RyuSA/items/0be781f38fa7fc38f69d) Timed some sorts → I want a more solid distribution → Do you want to measure it?

Execution environment

DELL venue 11pro 5300 OS : Windows10 CPU : Atom Z3770 1.46GHz Memory : 2GB Language : Python 2.7

Search algorithm

Measurement method

Repeat the following procedure 10000 times and compare the times.

  1. Randomly generate 100,000 numbers from 1 to 100000 with numpy.random.randint (1,100000,100000)
  2. Generate one search target with random.randint (1,100000)
  3. Run each search algorithm (start measurement from here)
  4. When True / False returns, the measurement ends and the measured time is saved in a file.
  5. Return to 1.

Linear search

The simplest search algorithm. Just check in order from the front. The source code is below

import numpy as np

def LINER_SERCH(A,n,v):
    k = 1
    while  k < n:
        if A[k] == v:
            return True
        k += 1
    return False

Measurement result

Below is the frequency distribution of the measurement results image002.png There are many around 0.09sec to 0.10sec. It seems that False is returned here.

Linear search with sentinel

Just add the search target to the end of the given sequence. It's faster than just a linear search because it reduces the number of decisions in the While Loop. The source code is below

import numpy as np

def SENTINEL_LINER_SERCH(A,n,v):
    A = np.append(A,v)
    k = 0
    while  A[k] != v:
        k += 1
    if k < n:
        return True
    return False

Measurement result

Below is the frequency distribution of the measurement results image004.png There are many around 0.08sec to 0.085sec. It seems that False is returned here. In the previous linear search and linear search with sentinel, you can see that the latter is a little faster.

Binary search method

How to compare the element in the middle of a given sequence (sorted) with the search target and reduce the search range according to the size. Only in this case, the operation of sorting the given sequence A is performed in advance. The source code is below

import numpy as np
import math
def BINARY_SEARCH(A,n,v):
    a = 0
    b = n-1
    A.sorted()
    while  a <= b :
        k = int(math.floor(b + (a-b)/2))
        if A[k] == v:
            return True
        elif A[k] < v:
            a = k+1
        else:
            b = k-1
    return False

Measurement result

Below is the frequency distribution of the measurement results image002.png There are many around 0.015sec to 0.017sec. Compared to the previous two searches, you can see that the algorithm is overwhelmingly faster even though it includes sorting first.

Sort algorithm

Measurement method

Repeat the following procedure 10000 times and compare the times.

  1. Randomly generate 1000 numbers from 1 to 1000 with numpy.random.randint (1,1000,1000)
  2. Run each sort algorithm (measurement starts from here)
  3. When the sequence is returned, the measurement is completed and the measured time is saved in a file.
  4. Return to 1.

Bubble sort

A method of repeating comparison and exchange with the neighbor in order from the first one of the given sequence. The source code is below

import numpy as np

def BUBBLE_SORT(A):
    n = np.size(A)
    m = n-1
    while  m > 1:
        k = 0
        while k < m :
            if A[k] > A[k+1]:
                A[k],A[k+1] = A[k+1],A[k]
            k += 1
        m -= 1
    return A

Measurement result

Below is the frequency distribution of the measurement results image006.png 0.7sec is the mode. Very slow ...

Quick sort

The first sequence of a given sequence is taken as the pivot, and it is divided into two sequences that are larger / smaller than the pivot. All you have to do is repeat the same operation. However, if you pass a sorted sequence or a similar sequence, it will be very slow. The source code is below

import numpy as np
def QUICK_SORT(A,L,R):
    if L < R:
        p = L
        k = L+1
        while k <= R:
            if A[k] < A[L]:
                A[p+1], A[k] = A[k], A[p+1]
                p += 1
            k += 1
        A[L], A[p] = A[p], A[L]
        A = QUICK_SORT(A,L,p-1)
        A = QUICK_SORT(A,p+1,R)
    return A

Measurement result

Below is the frequency distribution of the measurement results image008.png 0.17sec is the mode. Compared to bubble sort, the speed was about 75% faster. Especially this time, since the sequence to be passed is generated by pseudo-random numbers, I wondered if the execution time was relatively stable thanks to the moderate randomness in the sequence. .. ..

Randomized quicksort

Randomly get the pivoting method instead of fixing it to the left side of the sequence. This way you can (should) perform quicksort without depending on the randomness of the sequence (probability distribution of each element). The source code is below

import numpy as np
def RANDOMIZED_QUICK_SORT(A,L,R):
    if L < R:
        r = np.random.randint(L,R)
        A[L], A[r] = A[r], A[L]
        p = L
        k = L+1
        while k <= R:
            if A[k] < A[L]:
                A[p+1], A[k] = A[k], A[p+1]
                p += 1
            k += 1
        A[L], A[p] = A[p], A[L]
        A = RANDOMIZED_QUICK_SORT(A,L,p-1)
        A = RANDOMIZED_QUICK_SORT(A,p+1,R)
    return A

Measurement result

Below is the frequency distribution of the measurement results image004.png …… (゚ Д ゚) Poker Well, for some reason it became very slow when I made it random. .. .. And so I put np.random.randint (L, R) into the original Quick Sort in vain and remeasured it. image002.png This is also late! (゚ Д ゚) Apparently, np.random.randint is a very heavy process compared to other processes. .. .. It seems that it can be used as a quicksort that does not depend on a given sequence while maintaining the same execution speed (except for random number generation) even if it is randomized.

in conclusion

When the execution time of the search / sort algorithm was actually measured and aggregated, the results were almost as theoretical. On the other hand, if you make random numbers, it will take time to generate random numbers, and you can also know interesting facts such as the original algorithm is faster (laugh)

References

Mathematics Girl Random Algorithm Author: Hiroshi Yuki Programming beginners compared sort times

Recommended Posts

A programming beginner tried to find out the execution time of sorting etc.
The first time a programming beginner tried simple data analysis by programming
[Python3] Define a decorator to measure the execution time of a function
A beginner who has been programming for 2 months tried to analyze the real GDP of Japan in time series with the SARIMA model.
I tried to find the entropy of the image with python
I tried to find the average of the sequence with TensorFlow
How to find the scaling factor of a biorthogonal wavelet
[Python] Programming to find the number of a in a character string that repeats a specified number of times.
python beginners tried to find out
I want to record the execution time and keep a log.
I tried to make a regular expression of "time" using Python
How to find the memory address of a Pandas dataframe value
I tried to cut out a still image from the video
[Python] A simple function to find the center coordinates of a circle
I tried to find out the difference between A + = B and A = A + B in Python, so make a note
I tried to sort out the objects from the image of the steak set meal-② Overlap number sorting
About the order of learning programming languages (from beginner to intermediate) Part 2
I tried to verify the best way to find a good marriage partner
How to find out the number of CPUs without using the sar command
A super beginner who does not know the basics of Python tried to graph the stock price of GAFA
I tried to find the optimal path of the dreamland by (quantum) annealing
A beginner of machine learning tried to predict Arima Kinen with python
I tried to display the altitude value of DTM in a graph
I tried to verify the result of A / B test by chi-square test
Python: I want to measure the processing time of a function neatly
I wanted to use the find module of Ansible2, but it took some time, so make a note
How to pass the execution result of a shell command in a list in Python
[Circuit x Python] How to find the transfer function of a circuit using Lcapy
Find out how to divide a file with a certain number of lines evenly
A record of the time it took to deploy mysql on Cloud9 + Rails
I want to store the result of% time, %% time, etc. in an object (variable)
I tried to refactor the code of Python beginner (junior high school student)
I tried to create a model with the sample of Amazon SageMaker Autopilot
How to calculate the volatility of a brand
How to find the area of the Voronoi diagram
Combinatorial optimization to find the hand of "Millijan"
Setting to output the log of cron execution
I tried to find 100 million digits of pi
I tried to touch the API of ebay
I tried python programming for the first time.
Find the number of days in a month
I tried to correct the keystone of the image
Find out the day of the week with datetime
From the introduction of pyethapp to the execution of contract
I tried to predict the price of ETF
I tried to vectorize the lyrics of Hinatazaka46!
I tried to find out how to streamline the work flow with Excel x Python ②
I tried to make something like a chatbot with the Seq2Seq model of TensorFlow
How to execute a schedule by specifying the Python time zone and execution frequency
The story of Airflow's webserver and DAG, which takes a long time to load
I tried to find out how to streamline the work flow with Excel x Python ④
I tried to notify the update of "Become a novelist" using "IFTTT" and "Become a novelist API"
Set the output destination of the execution result to Vim started as a modeless window
I tried to find out how to streamline the work flow with Excel x Python ⑤
A python beginner tried to intern at an IT company [Day 3: Going to the clouds ...]
I tried to put out the frequent word ranking of LINE talk with Python
I can't find the clocksource tsc! ?? The story of trying to write a kernel patch
I tried to sort out the objects from the image of the steak set meal-④ Clustering
I tried to find out how to streamline the work flow with Excel x Python ①
A memorandum of commands, packages, terms, etc. used in linux (updated from time to time)
Find out the maximum number of characters in multi-line text stored in a data frame