Write a Python program to find the edit distance [Python] [Levenshtein distance]

What is edit distance?

--A measure for comparing two strings
--The minimum number of operations needed to convert s1 into s2, using the following three operations:
  --Replace
  --Insert
  --Delete

Rough explanation

--Consider converting a string from s1 to s2

s1 = "aaa"
s2 = "aab"

--For the strings above, the edit distance is 1.
  --One replacement

s1 = "aba"
s2 = "cc"

--For the strings above, the edit distance is 3.
  --Two replacements
  --One deletion (s1 is one character longer than s2)
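
The counting above can be sketched in pure Python with dynamic programming, without any extra module. This is only a minimal illustration, not how python-Levenshtein is implemented; the function name edit_distance is made up for this example.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Return the Levenshtein distance between s1 and s2 (illustrative helper)."""
    # prev[j] is the distance between the part of s1 processed so far
    # and the first j characters of s2. For an empty s1 that is j insertions.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # replace c1 with c2 (free if they match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("aaa", "aab"))  # 1
print(edit_distance("aba", "cc"))   # 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of s2 rather than with the product of both lengths.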

Implementation

There is a very useful Python module for this: python-Levenshtein. See its official documentation for details.

Installation

Install with pip

$ pip install python-Levenshtein

Program

leven.py


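import sys

import Levenshtein

# Paths of the two text files to compare come from the command line
args = sys.argv

# Read both files completely (line endings included)
with open(args[1], "r") as f_ans, open(args[2], "r") as f_ref:
    s_ans = f_ans.read()
    s_ref = f_ref.read()

# Print the edit distance between the two file contents
print(Levenshtein.distance(s_ans, s_ref))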

It's a simple program that takes two text files as command-line arguments and prints the edit distance between their contents.

Result

Prepare two text files.

tmp1.txt


Helllo worb!!

tmp2.txt


Hello world!

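To convert tmp1.txt into tmp2.txt:

--Delete one "l" from "Helllo"
--For "worb!!"
  --Replace "b" with "l"
  --Replace the first "!" with "d"

So the edit distance should be 3. (Inserting "l", replacing "b" with "d", and deleting one "!" also converts "worb!!" into "world!", but that takes one more operation, and the edit distance is the minimum.)

$ python leven.py tmp1.txt tmp2.txt
3

The output is 3, as expected. Note that f.read() keeps line endings, so a trailing newline that exists in only one of the two files would add one more edit.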
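
The hand count can be cross-checked by recovering one shortest sequence of operations from the dynamic programming table. The sketch below is pure Python and does not use python-Levenshtein; the function name edit_script is made up for this example. Several shortest sequences can exist, so the one it prints may differ from a sequence found by hand.

```python
def edit_script(s1: str, s2: str) -> list[str]:
    """Return one shortest list of operations turning s1 into s2 (illustrative helper)."""
    n, m = len(s1), len(s2)
    # dist[i][j] is the edit distance between s1[:i] and s2[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # i deletions
    for j in range(m + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete
                dist[i][j - 1] + 1,         # insert
                dist[i - 1][j - 1] + cost,  # replace or match
            )

    # Walk back from the bottom-right corner and record every step that costs 1.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and s1[i - 1] == s2[j - 1]
        if same and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, no operation needed
        elif i > 0 and j > 0 and not same and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(f"replace {s1[i - 1]!r} with {s2[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(f"delete {s1[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {s2[j - 1]!r}")
            j -= 1
    ops.reverse()  # operations were collected from the end of the strings
    return ops


for op in edit_script("Helllo worb!!", "Hello world!"):
    print(op)
```

The number of operations printed equals the edit distance of the two strings.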

Impressions

This article was written by someone who set out to implement edit distance with dynamic programming, found that a module already existed, and then decided that implementing it by hand was too much trouble. If you have any questions, please leave them in the comments.
