[PYTHON] You will be an engineer in 100 days ――Day 60 ――Programming ――About data structure and sorting algorithm

Click here until yesterday

data structure

Data structures are important for learning algorithms. Since the algorithm is a procedure, I said what kind of data and how to manipulate it. The structure of the data is also an important item in the algorithm.

list

Python list type is an array type data type using an index You can access them in order.

List type can store various data as elements You can add, change, replace, and delete elements.

#List definition
l = [1,2,3,4,5]
print(l)

#Element reference
print(l[0])

#Change elements
l[1] = 8
print(l)

#Add element
l.append(6)
print(l)

#Delete element
l.pop(3)
print(l)

#Insert element
l.insert(4,9)
print(l)

#Swap elements
l[0], l[3] = l[3], l[0]
print(l)

[1, 2, 3, 4, 5] 1 [1, 8, 3, 4, 5] [1, 8, 3, 4, 5, 6] [1, 8, 3, 5, 6] [1, 8, 3, 5, 9, 6] [5, 8, 3, 1, 9, 6]

Stacks and queues

Stack and queue are also one of the ways to handle data.

stack -Data can be added from the beginning of the column, and data can be retrieved from the beginning of the column. ・ Last-in-first-out LIFO (Last-In-First-Out) -Putting data on the stack is called push, and taking it out is called pop.

queue -Add data from the end of the column, retrieve data from the beginning of the column ・ First-in first-out (Tokoroten method) FIFO (First-In-First-Out) -Putting data in a queue is called enqueue, and retrieving it is called dequeue.

Reference: wikipedia

In Python, it can be realized by list type or ç.

stack

stack = []

#push
stack.append(1)
stack.append(2)
stack.append(3)
print(stack)

#pop
stack.pop()
print(stack)
stack.pop()
print(stack)

[1, 2, 3] [1, 2] [1]

queue

from collections import deque

#Queue definition
queue = deque(["A","B","C"])
print(queue)

#Enqueue
queue.append("D")
print(queue)

#Decue
queue.popleft()
print(queue)

deque(['A', 'B', 'C']) deque(['A', 'B', 'C', 'D']) deque(['A', 'B', 'C'])

Heap

What is a heap? A child element is always larger or equal (or always smaller or equal) than a parent element. It is a tree-structured data with the constraint.

When we simply say heap, we often refer to the binary heap using a binary tree.

Reference: wikipedia

The heap can be implemented in Python with the heapq library. You can quickly retrieve the minimum (or maximum) value in the heap.

import heapq

#Create a list
heap = [1, 6, 8, 0, -1]
print(heap)

#Heap an existing list(minimum value)
heapq.heapify(heap)
print(heap)

#Add element
heapq.heappush(heap, 10)
print(heap)

#Remove minimum value
print(heapq.heappop(heap))
print(heap)

max_heap = [3,7,5,9,-2]
print(max_heap)
#Create heap with maximum value
heapq._heapify_max(max_heap) 
print(max_heap)

#Remove maximum value
print(heapq._heappop_max(max_heap))
print(max_heap)

[1, 6, 8, 0, -1] [-1, 0, 8, 1, 6] [-1, 0, 8, 1, 6, 10] -1 [0, 1, 8, 10, 6] [3, 7, 5, 9, -2] [9, 7, 5, 3, -2] 9 [7, 3, 5, -2]

About sorting algorithm

The sort algorithm is an algorithm that sorts. There are many variations to simply sort.

Stable stable sorting is when there are multiple records with the same key in the data After sorting, the order of the records before sorting is maintained.

Sorting so that the positional relationship before sorting is broken by sorting It is called an unstable ʻunstable` sort.

It is desirable to adopt a sorting algorithm that finishes sorting quickly and is stable.

In the sorting algorithm, the amount of calculation is also a guide. The amount of calculation of the sort algorithm that is generally said is as follows.

algorithm	Stability	Average time	Worst time	Best time	Worst space
Bubble sort	Stable	\mathcal{O}(n^2)	\mathcal{O}(n^2)	\mathcal{O}(n)	\mathcal{O}(1)
Selection sort	unstable	\mathcal{O}(n^2)	\mathcal{O}(n^2)	\mathcal{O}(n)	\mathcal{O}(n)
Insertion sort	Stable	\mathcal{O}(n^2)	\mathcal{O}(n^2)	\mathcal{O}(n)	\mathcal{O}(n)
Heapsort	unstable	\mathcal{O}(n\log n)	\mathcal{O}(n\log n)	\mathcal{O}(n\log n)	\mathcal{O}(1)
Merge sort	Stable	\mathcal{O}(n\log n)	\mathcal{O}(n\log n)	\mathcal{O}(n\log n)	\mathcal{O}(n)
Quick sort	unstable	\mathcal{O}(n\log n)	\mathcal{O}(n^2)	\mathcal{O}(n)	\mathcal{O}(n)
python sort(Tim sort)	Stable	\mathcal{O}(n\log n)	\mathcal{O}(n\log n)	\mathcal{O}(n)	\mathcal{O}(n)

Now let's look at various sorting algorithms. I have posted an implementation example, but I think there is a better way to write it.

Bubble sort

Bubble sort compares the values of two adjacent elements in a list It is an alignment algorithm that performs exchange according to conditions.

Sort the list in descending order (descending order) or ascending order (ascending order).

#Bubble sort
def bubble_sort(data):
    for i in range(len(data)):
        for j in range(len(data) - i - 1):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data

Select sort

Selection sort looks for the element with the minimum (maximum) value of the array It is an algorithm that performs alignment by exchanging it with the first element of the array.

#Selection sort
def select_sort(data):
    for i in range(len(data)):
        min = i
        for j in range(i + 1, len(data)):
            if data[min] > data[j]:
                min = j
        data[i], data[min] = data[min], data[i]
    return data

Insert sort

Insert sort is by inserting a new element in the appropriate position for the aligned part of the list. An algorithm for alignment.

#Insertion sort
def insert_sort(data):
    for i in range(1, len(data)):
        temp = data[i]
        j = i - 1
        while (j >= 0) and (data[j] > temp):
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = temp
    return data

Heap sort

Heapsort is a sorting algorithm that sorts the list using the binary heap tree.

Take elements from the unsorted list and add them to the heap in order. Repeat the following until you add all the elements (take the route and add it to the sorted list)

#Heapsort
def heap_sort(data):
    for i in range(len(data)):
        j = i
        while (j > 0) and (data[(j - 1) // 2] < data[j]):
            data[(j - 1) // 2], data[j] = data[j], data[(j - 1) // 2]
            j = (j - 1) // 2

    for i in range(len(data), 0, -1):
        data[i - 1], data[0] = data[0], data[i - 1]
        j = 0
        while ((2 * j + 1 < i - 1) and (data[j] < data[2 * j + 1]))\
            or ((2 * j + 2 < i - 1) and (data[j] < data[2 * j + 2])):
            if (2 * j + 2 == i - 1) or (data[2 * j + 1] > data[2 * j + 2]):
                data[j], data[2 * j + 1] = data[2 * j + 1], data[j]
                j = 2 * j + 1
            else:
                data[j], data[2 * j + 2] = data[2 * j + 2], data[j]
                j = 2 * j + 2
    return data

In python, using the library heapq that realizes heap queue You can also easily write a heapsort.

import heapq

#Heapsort using heapq
def heap_sort2(data):
    h = data.copy()
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

Merge sort

Merge sort recursively divides the array you want to sort and merges again` It is a sorting algorithm that tries to realize sorting.

#Merge sort
def merge_sort(data):
    if len(data) <= 1:
        return data
    m = len(data) // 2
    l = merge_sort(data[:m])
    r = merge_sort(data[m:])
    return merge(l , r)

def merge(left, right):
    res , i , j = [], 0, 0
    while (i < len(left)) and (j < len(right)):
        if left[i] <= right[j]:
            res.append(left[i])
            i += 1
        else:
            res.append(right[j])
            j += 1
    if i < len(left):
        res.extend(left[i:])
    if j < len(right):
        res.extend(right[j:])
    return res

Quick sort

The feature of quicksort is that the number of data comparisons and exchanges is very small. You can efficiently sort data that is scattered randomly, It is an algorithm that is not stable.

def quick_sort(data):
    def quick(list, left, right):
        pl = left
        pr = right
        y = list[(pl + pr) // 2]
        while True:
            while list[pl] < y: pl += 1
            while list[pr] > y: pr -= 1
            if pl <= pr:
                list[pl], list[pr] = list[pr], list[pl]
                pl += 1
                pr -= 1
            if pl > pr:
                break
        if left < pr: quick(list, left, pr)
        if pl < right: quick(list, pl, right)
        return data
    quick(data, 0, len(data) - 1)
    return data

Python standard sort

Python's standard sort is (probably) Timsort.

Timsort is a stable hybrid derived from merge sort and insertion sort The sorting algorithm is designed to work well with different types of data.

def python_sort(data):
    data.sort()
    return data

Compare the execution time of the sort algorithm

Let's measure the execution time of the above sorting algorithm. You can easily measure with the following code.

Sorts 10000 numbers. With jupyter notebook, you can measure the time by using% time.

import numpy as np
import sys
sys.setrecursionlimit(20000)

data = np.arange(10000)
np.random.shuffle(data)
data = list(data)

print('bubble_sort')
%time l = bubble_sort(data)
print('select_sort')
%time l = select_sort(data)
print('insert_sort')
%time l = insert_sort(data)
print('heap_sort')
%time l = heap_sort(data)
print('heap_sort2')
%time l = heap_sort2(data)
print('merge_sort')
%time l = merge_sort(data)
print('quick_sort')
%time l = quick_sort(data)
print('python_sort')
%time l = python_sort(data)

bubble_sort CPU times: user 18 µs, sys: 0 ns, total: 18 µs Wall time: 20 µs select_sort CPU times: user 18 µs, sys: 1 µs, total: 19 µs Wall time: 21 µs insert_sort CPU times: user 7 µs, sys: 0 ns, total: 7 µs Wall time: 10 µs heap_sort CPU times: user 38 µs, sys: 1 µs, total: 39 µs Wall time: 40.1 µs heap_sort2 CPU times: user 11 µs, sys: 0 ns, total: 11 µs Wall time: 12.9 µs merge_sort CPU times: user 31 µs, sys: 1e+03 ns, total: 32 µs Wall time: 33.1 µs quick_sort CPU times: user 13 µs, sys: 1 µs, total: 14 µs Wall time: 14.8 µs python_sort CPU times: user 4 µs, sys: 0 ns, total: 4 µs Wall time: 6.2 µs

Well, the shortest time is Python's standard sort.

I think that this will change considerably depending on how you implement it. I think it's quite difficult to achieve speed that exceeds standard sorting. Normally, you don't need to write your own program for sorting.

Summary

Python sorting is so powerful that you may not usually care about it. There are few problems with sorting.

How is sorting done? If you compare how each method is different You will understand the depth of the algorithm.

40 days until you become an engineer

Author information

Otsu py's HP: http://www.otupy.net/

Youtube： https://www.youtube.com/channel/UCaT7xpeq8n1G_HcJKKSOXMw

Twitter： https://twitter.com/otupython