Implementation of TRIE tree with Python and LOUDS

Overview

Implemented TRIE tree in Python LOUDS is used for the data structure to create the TRIE tree

TRIE tree

--A TRIE tree is a set of character strings represented by an ordered tree structure. --Composed of nodes and edges, and the position and key (search string) of each node correspond. --The empty string corresponds to the root node

TRIE木

Features of TRIE tree

--Fast dictionary search: You can search in a constant time regardless of the size of the tree. --Memory saving: Since the key is not stored in the node, the node can be shared by multiple keys, and memory can be saved. --Efficient prefix search: Since a node is shared by multiple keys, it is possible to efficiently search for child nodes associated with the parent node (prefix).

Due to its characteristics, the TRIE tree is used for kana-kanji conversion and auto-completion.

LOUDS --LOUDS (level order unary degree sequence) is one of the order tree expressions, which can express a tree structure with an extremely small size. --Specifically, when the number of nodes is N, the tree structure is represented by a label array of length N and a bit array of 2N + 1. ――However, in order to search efficiently from the created Tree, auxiliary data (complete dictionary) is required separately.

Complete dictionary

--A complete dictionary is the most basic data structure in a succinct data structure. --By using the complete dictionary as auxiliary data, it is possible to search in a constant time. --Specifically, search the TRIE tree using the operations rank and select. --rank (): How many bits of 1 are from the beginning of the bit string to the position k --select (): Where is the position next to the nth 1-bit when viewed from the beginning of the bit string?
The succinct data structure is explained in detail on this page

Implementation

Implemented the following in Python (GitHub)

--trie.py: TRIE tree --constructor.py: bit array --words.py: Create / read dictionary data --measure.py: Measure memory and search time --search_word.py: Word search --test.py: Measure search time

Creating dictionary data

When executed as follows, dictionary data in which the node numbers and words of the TRIE tree are written separated by commas is created. Data used nltk's wordnet corpus Later, test data will be created from this dictionary data.

from words import CreateWords
CreateWords("./data/origin/wordnet_words.csv")

Word search

 python search_word.py dictionary data PATH

You can do a single word search by running the above file When you enter an arbitrary word, the node number, word definition, and prefix obtained from the search are output as shown below. search_result.PNG

Search time measurement

Creating test data

You can create test data for any number of words from dictionary data by executing the following

 python words.py Dictionary data PATH Number of samples 1, Number of samples 2, Number of samples 3,…

If you want to create multiple test data, specify the data sample size separated by commas. If "Test data is created." Is output, it is ok. Test data is created in ./data/test

Measurement

When executed, the test can be executed any number of times with the test data as shown below.

 python test.py dictionary data PATH test data PATH test count

If "Test is done." Is output, it is ok. Output results are created in ./results

In the measurement, the exact match search time and the prefix search time are measured.

--Exact match search: Total execution time of search function of trie class --Prefix search: Total execution time of get_blow_nodes function of trie class

Internally, the search function of the trie class is executed for the input word, and the node number of the TRIE tree is output. The output node number is collated with the node number of the dictionary data, and if they match, the number of searches is incremented by 1, and it is confirmed whether the search is accurate. The same applies to prefix search

Output result

test_result_detail.PNG

--Exact match search: Total time taken to search for a word --Search result: Number of items that match the node number of the dictionary data --Prefix search: Total time taken to search all prefixes associated with the search word --Prefix search (per item): Search time per prefix associated with the search word

References

-Explanation of succinct data structure LOUDS (12 times in total, with exercises) -Explanation of complete dictionary (concise bit vector) -I wrote a tri-tree that can be used for high-speed dictionary implementation in D language and Python

Recommended Posts

Implementation of TRIE tree with Python and LOUDS
Python implementation of non-recursive Segment Tree
Implementation of Dijkstra's algorithm with python
Coexistence of Python2 and 3 with CircleCI (1.0)
Continuation of multi-platform development with Electron and Python
Explanation of edit distance and implementation in Python
Example of reading and writing CSV with Python
Deep Learning from scratch The theory and implementation of deep learning learned with Python Chapter 3
Easy partial download of mp4 with python and youtube-dl!
[# 2] Make Minecraft with Python. ~ Model drawing and player implementation ~
Visualize the range of interpolation and extrapolation with python
Overview of generalized linear models and implementation in Python
Comparison of CoffeeScript with JavaScript, Python and Ruby grammar
Version control of Node, Ruby and Python with anyenv
Programming with Python and Tkinter
Encryption and decryption with Python
Python and hardware-Using RS232C with Python-
Explanation and implementation of SocialFoceModel
Python implementation of particle filters
python with pyenv and venv
Maxout description and implementation (Python)
Implementation of quicksort in Python
Source installation and installation of Python
Works with Python and R
Build API server for checking the operation of front implementation with python3 and Flask
Python implementation of CSS3 blend mode and talk of color space
Perform isocurrent analysis of open channels with Python and matplotlib
Get rid of dirty data with Python and regular expressions
Detect objects of a specific color and size with Python
[With simple explanation] Scratch implementation of deep Boltzmann machine with Python ②
[With simple explanation] Scratch implementation of deep Boltzmann machine with Python ①
Sample of HTTP GET and JSON parsing with python of pepper
Play with the password mechanism of GitHub Webhook and Python
Communicate with FX-5204PS with Python and PyUSB
Environment construction of python and opencv
The story of Python and the story of NaN
Explanation and implementation of PRML Chapter 4
Robot running with Arduino and python
Introduction and Implementation of JoCoR-Loss (CVPR2020)
Install Python 2.7.9 and Python 3.4.x with pip.
Explanation and implementation of ESIM algorithm
Neural network with OpenCV 3 and Python 3
AM modulation and demodulation with python
Installation of SciPy and matplotlib (Python)
[Python] font family and font with matplotlib
Scraping with Node, Ruby and Python
Introduction and implementation of activation function
Sorting algorithm and implementation in Python
Scraping with Python, Selenium and Chromedriver
Implementation of life game in Python
Getting Started with Python Basics of Python
JSON encoding and decoding with python
Hadoop introduction and MapReduce with Python
[GUI with Python] PyQt5-Drag and drop-
This and that of python properties
Life game with Python! (Conway's Game of Life)
Reading and writing NetCDF with Python
10 functions of "language with battery" python
I played with PyQt5 and Python3
Implementation of desktop notifications using Python
Implementation of Light CNN (Python Keras)