[PYTHON] Database search (verification of processing speed with or without index)

Purpose

- I want to measure how much difference the presence or absence of an index makes to search speed.
- Along the way, I want to understand a little about B-trees.

Target audience

- Those who are learning about databases
- Students who have just started learning computer science

Preparation and environment to get started

- The environment is Colaboratory, provided by Google (https://colab.research.google.com/notebooks/welcome.ipynb?hl=ja)
- The language is Python
- The data is prepared in advance in a file such as CSV
- The data set has 1,147,620 rows

About the code

- My GitHub
- The code is written for the data I used, so adapt it to your own data each time ...
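Because the original code is tied to the author's data, here is a minimal sketch of how such a measurement could be set up. Everything in it is an assumption made for illustration: the post does not say which database was used, so the sketch uses SQLite through Python's standard `sqlite3` module, and the table `items`, the column `value`, the index name `idx_items_value`, the random data, and the row count of 200,000 are all hypothetical.

```python
import random
import sqlite3
import time


def build_table(n_rows, seed=0):
    """Create an in-memory table items(id, value) filled with random integers."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO items (value) VALUES (?)",
        ((rng.randrange(n_rows),) for _ in range(n_rows)),
    )
    conn.commit()
    return conn


def time_range_query(conn, low, high, repeats=20):
    """Return (total seconds, matching row count) for `repeats` runs of a range search."""
    sql = "SELECT COUNT(*) FROM items WHERE value BETWEEN ? AND ?"
    start = time.perf_counter()
    for _ in range(repeats):
        (count,) = conn.execute(sql, (low, high)).fetchone()
    return time.perf_counter() - start, count


# Hypothetical data set; the original post used its own CSV data instead.
conn = build_table(200_000)

# Measure the same range search before and after creating an index on `value`.
no_index_time, no_index_count = time_range_query(conn, 10000, 10100)
conn.execute("CREATE INDEX idx_items_value ON items (value)")
indexed_time, indexed_count = time_range_query(conn, 10000, 10100)

print(f"No index: {no_index_time:.4f} s")
print(f"Indexed:  {indexed_time:.4f} s")
```

The absolute numbers depend on the machine, the data, and the number of repetitions, so they are not expected to match the figures in this post. The row counts returned by the two runs should be identical, which is a quick check that both searches answer the same question.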

Verification results

Search range from 10000 to 10100

- No index: 0.290917145000094
- Indexed: 4.710936333000063

Search range from 10000 to 10010

- No index: 10.85402692900015
- Indexed: 0.285733380000237

Search range from 10000 to 10001

- No index: 68.63662464900017
- Indexed: 0.263980986000206

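From the verification results

The measurements show that the presence or absence of an index can make a large difference in search time. For the ranges 10000 to 10010 and 10000 to 10001, the indexed search finished in under 0.3, against roughly 10.9 and 68.6 without an index. For the range 10000 to 10100, however, the measurement without an index was the faster of the two.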
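When timings look surprising, it can help to check whether the database actually uses the index for the query. The following is a small sketch under the assumption that the database is SQLite (the post does not name one); the table `items`, the column `value`, and the index name `idx_items_value` are hypothetical. It uses SQLite's `EXPLAIN QUERY PLAN` statement and reads the last column of each result row, which holds the plan description.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the description column of each EXPLAIN QUERY PLAN row as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)


# Hypothetical schema used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER)")
sql = "SELECT * FROM items WHERE value BETWEEN ? AND ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (10000, 10100))

# After creating the index the planner can search it for the range.
conn.execute("CREATE INDEX idx_items_value ON items (value)")
plan_after = query_plan(conn, sql, (10000, 10100))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, but the first plan should mention a scan of `items` and the second should name `idx_items_value`.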

Indexes appear to be implemented with structures such as B-trees and bitmaps. For details, I recommend the article "Understanding the "index" that improves database performance", which explains the topic very carefully.
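To get a feel for why an ordered structure speeds up a range search, the sketch below compares a full scan with a binary search over a sorted list. This is not a B-tree: a sorted Python list with the standard `bisect` module is used here as a much simpler stand-in that shares the key idea, namely that keys kept in order let the search jump to the start of the range instead of checking every row. The function names and the generated data are hypothetical.

```python
import bisect


def range_scan(values, low, high):
    """Check every element, as a search without an index has to do."""
    return [v for v in values if low <= v <= high]


def range_sorted(sorted_values, low, high):
    """Binary-search both ends of the range in an already sorted list."""
    start = bisect.bisect_left(sorted_values, low)
    stop = bisect.bisect_right(sorted_values, high)
    return sorted_values[start:stop]


# Arbitrary unordered data, generated only for illustration.
values = [(i * 7919) % 100_003 for i in range(100_000)]

# Sorting once plays the role of building the index.
sorted_values = sorted(values)

scan_result = sorted(range_scan(values, 10000, 10100))
sorted_result = range_sorted(sorted_values, 10000, 10100)
print(scan_result == sorted_result)
```

Both functions return the same values, but `range_scan` touches all 100,000 elements while `range_sorted` needs only two binary searches plus a slice. A B-tree applies the same ordering idea with a tree layout that also keeps inserts and deletes cheap.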

That's all.
