Find the Levenshtein Distance with Python

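For background on edit distance, the article by Naoya Ito is a little old but still helpful. Simply put, edit distance expresses how close two strings are as a number: the minimum count of single-character insertions, deletions, and substitutions needed to turn one string into the other.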

Reference: Levenshtein Distance - Naoya's Hatena Diary
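
To make the definition concrete, here is a minimal pure-Python sketch of the usual dynamic-programming approach. It is only an illustration of the idea, not the implementation used by the package installed below, and the function name levenshtein is chosen here for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the first i-1 characters of a
    # and the first j characters of b (the previous row of the DP table).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # prints 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.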

Preparation

There is a package called python-Levenshtein, so let's install it.

$ sudo pip install python-Levenshtein

Try

Let's write some code like this.

#!/usr/bin/env python3

import Levenshtein

# Two spellings of the name "Yasuji Inoue" that differ in one kanji
# (illustrative values). In Python 3, str is already Unicode, so no
# decode('utf-8') step is needed.
string1 = "井上泰治"
string2 = "井上泰次"

print(Levenshtein.distance(string1, string2))

$ python levenshtein.py
1

Japanese text works too. The two strings differ by one character, so a single substitution makes them identical and the edit distance is 1.

I'm not good at typing the word python, and before I notice it has turned into pyhton. The edit distance between pyhton and python is 2, because Levenshtein distance has no swap operation: fixing the two transposed letters takes two substitutions.
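
For contrast, the sketch below implements a different metric, the optimal string alignment variant of Damerau-Levenshtein distance, which does count a swap of two adjacent characters as a single edit. This is not what Levenshtein.distance computes; the function name osa_distance is hypothetical and the code is only meant to show why the pyhton typo costs 2 under plain Levenshtein distance.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance in which swapping two adjacent characters costs 1
    (optimal string alignment, a restricted Damerau-Levenshtein distance)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Extra case: the last two characters are swapped.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("pyhton", "python"))  # prints 1
```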

Bonus

According to the documentation, the package can also compute other measures, such as the Jaro-Winkler distance.

Bonus 2

If you register a MySQL stored function like the one below, you can write queries such as ORDER BY LEVENSHTEIN(title, "Hogehoge"). This is convenient because the results come back ordered by how close each title is to the given string. However, indexes cannot be used, so the query has to evaluate every record, and it becomes quite heavy when the number of records is large.

https://github.com/fza/mysql-doctrine-levenshtein-function
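
The same ordering can be sketched in Python to show what such a query does and why it is expensive. The titles list and the query string are hypothetical sample data, and the small levenshtein helper stands in for the stored function; it is an assumption for this example, not the code from the repository above.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        new_row = [i]
        for j, cb in enumerate(b, start=1):
            new_row.append(min(row[j] + 1,                # deletion
                               new_row[j - 1] + 1,        # insertion
                               row[j - 1] + (ca != cb)))  # substitution
        row = new_row
    return row[-1]


# Hypothetical table contents and search string.
titles = ["Fugafuga", "Hogehoga", "Hogefuga", "Hogehoge"]
query = "Hogehoge"

# Equivalent in spirit to ORDER BY LEVENSHTEIN(title, "Hogehoge"):
# the distance is computed once for every row before sorting.
ranked = sorted(titles, key=lambda title: levenshtein(title, query))
print(ranked)
```

Every title has to be compared against the query before anything can be sorted, which mirrors the full scan described above: the cost grows with the number of records and no index can shortcut it.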

Related

PHP: [Multibyte support] Find the Levenshtein distance - Qiita
