I tried running faiss with python, Go, Rust

Introduction

How are you doing in the whirlpool of the corona? I used faiss with Go and Rust, which I recently started studying, in order to devote my spare time to studying and verification that I can't usually use because I don't have to go out unnecessarily. I decided to try it. Faiss is a favorite neighborhood search library provided by Facebook Resarch, which was also introduced in Introduction of a little niche function of faiss.

Implementation

Then, I would like to implement it in the order of python, Go, Rust.

python First is the environment construction. If you follow the original Installation Guide, the module will be installed without any problem. The method of installing with conda is also written, but I personally have a bitter memory in the conda environment, so I built it from the source. Next is the source code. The same is true for Go and Rust, but I wanted to measure performance later, so I output the log as json. Also, if you do not release the memory in each iteration, the memory will continue to increase for each iteration, so put the del variable andgc.collect ()at the end to forcibly release the memory. It is.

main.py


import gc
import logging
import sys
from time import time

import faiss
import numpy as np
from pythonjsonlogger import jsonlogger


def elapsed(f, *args, **kwargs):
    start = time()
    f(*args, **kwargs)
    elapsed_time = time() - start
    return elapsed_time


if __name__ == '__main__':
    # Prepare log.
    logger = logging.getLogger()
    formatter = jsonlogger.JsonFormatter('(levelname) (asctime) (pathname) (lineno) (message)')
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # Define basic information.
    d = 512
    N = 1e6
    nd = 10
    nbs = np.arange(N / nd, N + 1, N / nd).astype(int)
    nq = 10
    k = 5
    # Start measuring performance.
    for i in range(100):
        for nb in nbs:
            # Prepare data.
            xb = np.random.rand(nb, 512).astype(np.float32)
            xq = np.random.rand(nq, 512).astype(np.float32)
            # Construct index.
            index = faiss.IndexFlatL2(d)
            # Evaluate performance.
            elapsed_add = elapsed(index.add, xb)
            elapsed_search = elapsed(index.search, xq, k)
            # Log performance.
            logger.info('end one iteration.', extra={
                'i': i,
                'nb': nb,
                'elapsed_add': elapsed_add,
                'elapsed_search': elapsed_search
            })
            # Force to free memory.
            del xb
            del xq
            del index
            gc.collect()

Go Next is Go. This also starts with environment construction. For Go, this is it! I didn't have the faiss wrapper, so I decided to use zhyon404 / faiss, which seems to be the simplest wrapping. In this repository, the environment is provided by Docker, so follow the README and do docker build to create the environment. It was. Next is the source code. Go also uses logrus.JSONFormatter to output the log to json output, and also releases the memory in each iteration. In particular, the faiss index has memory allocated in the C area, so it was not released easily, and I had a hard time finding a method called faiss_go.FaissIndexFree.

main.go


package main

import (
	"github.com/facebookresearch/faiss/faiss_go"
	log "github.com/sirupsen/logrus"
	"math/rand"
	"os"
	"runtime"
	"runtime/debug"
	"time"
)

func main() {
	// Prepare log.
	log.SetFormatter(&log.JSONFormatter{})
	log.SetOutput(os.Stdout)
	// Define basic information.
	d := 512
	nbs := []int{1e5, 2e5, 3e5, 4e5, 5e5, 6e5, 7e5, 8e5, 9e5, 1e6}
	nq := 10
	k := 5
	// Start measuring performance.
	for i := 0; i < 100; i++ {
		for _, nb := range nbs {
			// Prepare data.
			xb := make([]float32, d*nb)
			xq := make([]float32, d*nq)
			for i := 0; i < nb; i++ {
				for j := 0; j < d; j++ {
					xb[d*i+j] = rand.Float32()
				}
			}
			for i := 0; i < nq; i++ {
				for j := 0; j < d; j++ {
					xq[d*i+j] = rand.Float32()
				}
			}
			// Construct index.
			v := new(faiss_go.Faissindexflatl2)
			faiss_go.FaissIndexflatl2NewWith(&v, d)
			index := (*faiss_go.Faissindex)(v)
			// Evaluate performance.
			add_start := time.Now()
			faiss_go.FaissIndexAdd(index, nb, xb)
			add_end := time.Now()
			I := make([]int, k*nq)
			D := make([]float32, k*nq)
			search_start := time.Now()
			faiss_go.FaissIndexSearch(index, nq, xq, k, D, I)
			search_end := time.Now()
			// Log performance.
			log.WithFields(log.Fields{
				"i": i,
				"nb": nb,
				"elapsed_add": add_end.Sub(add_start).Seconds(),
				"elapsed_search": search_end.Sub(search_start).Seconds(),
			}).Info("end one iteration.")
			// Force to free memory.
			faiss_go.FaissIndexFree(index)
			runtime.GC()
			debug.FreeOSMemory()
		}
	}
}

Rust Finally, Rust. Here too, it is from the environment construction. I decided to use Enet4 / faiss-rs, which is also published in Docs.rs, as the wrapper for faiss. Basically, you can install it by following README.

This will result in the dynamic library faiss_c ("libfaiss_c.so" in Linux), which needs to be installed in a place where your system will pick up. In Linux, try somewhere in the LD_LIBRARY_PATH environment variable, such as "/usr/lib", or try adding a new path to this variable.

It is important not to forget to add the path to the library. Also, depending on the environment, it does not seem to work unless it is added to LIBRARY_PATH. Next is the source code. This also uses json_logger to output the log as json. I have defined struct according to the sample, but I wonder if there is a better way. I think. Also, since Rust's random number generation was slow and performance measurement was difficult, use rand_xorshift by referring to Difference in processing speed when generating random numbers with Rust. I made it. What was interesting was that, unlike python and Go, it was possible to implement it without being particularly conscious of memory release, even though memory allocation in the C area was involved.

Cargo.toml


[package]
name    = "rust"
version = "0.1.0"
authors = []
edition = "2018"

[dependencies]
faiss           = "0.8.0"
json_logger     = "0.1"
log             = "0.4"
rand            = "0.7"
rand_xorshift   = "0.2"
rustc-serialize = "0.3"

main.rs


use faiss::{Index, index_factory, MetricType};
use log::{info, LevelFilter};
use rand::{RngCore, SeedableRng};
use rand_xorshift::XorShiftRng;
use rustc_serialize::json;
use std::time::Instant;

#[derive(RustcEncodable)]
struct LogMessage<'a> {
    msg: &'a str,
    i: i32,
    nb: i32,
    elapsed_add: f32,
    elapsed_search: f32
}

fn main() {
    // Prepare log.
    json_logger::init("faiss", LevelFilter::Info).unwrap();
    // Define basic information.
    let d: i32 = 512;
    const N: i32 = 1_000_000;
    let nd: i32 = 10;
    let nbs: Vec<i32> = (N/nd..N+1).step_by((N/nd) as usize).collect();
    let nq: i32 = 10;
    let k: usize = 5;
    let mut rng: XorShiftRng = SeedableRng::from_entropy();
    // Start measuring performance.
    for i in 0..100 {
        for &nb in nbs.iter() {
            // Prepare data.
            let xb: Vec<f32> = (0..nb*d).map(|_| rng.next_u32() as f32 / u32::max_value() as f32).collect();
            let xq: Vec<f32> = (0..nq*d).map(|_| rng.next_u32() as f32 / u32::max_value() as f32).collect();
            // Construct index.
            let mut index = index_factory(d as u32, "Flat", MetricType::L2).unwrap();
            // Evaluate performance.
            let start = Instant::now();
            index.add(&xb).unwrap();
            let elapsed_add = start.elapsed().as_micros() as f32 / 1e6;
            let start = Instant::now();
            index.search(&xq, k).unwrap();
            let elapsed_search = start.elapsed().as_micros() as f32 / 1e6;
            // Log performance.
            info!("{}", json::encode(&LogMessage {
                msg: "end one iteration.", i, nb, elapsed_add, elapsed_search
            }).unwrap());
        }
    }
}

Performance verification

Well, python, GO, Rust are just wrapping the C API of faiss, so I thought that there would be no difference in performance, but I implemented it so I decided to measure performance. .. In addition, OpenBLAS was used for the matrix operation library, m5.large of AWS EC2 was used for the environment, and AMI was measured with Canonical, Ubuntu, 18.04 LTS, amd64 bionic image build on 2020-01-12. The graph below shows the average processing time for each number of data, with the number of target data being moved between $ 10 ^ 5 $ and $ 10 ^ 6 $ for search and add 100 times each.

add add result

search Screenshot_2020-04-07 Analyze Performance(2).png

In all languages, the processing time increases linearly with the increase in the number of data. With add, there was almost no difference in performance between the three languages, but with search, the result was that only python performed better. I was a little surprised because I thought that there would be no difference in performance because they are similar wrappers. Furthermore, if we plot Go processing time / python processing time to see how this performance difference changes depending on the number of data.

performance ratio

It seems that Go and Rust are slower than python, with an average of 1.44 times regardless of the number of data.

Summary

I tried using faiss, which is a neighborhood search library of Facebook Research, with python, Go, and Rust, and measured the performance. When I tried using the same library in different languages, it was interesting to see something like a habit of each language. In particular, I felt that it was encouraging that Rust would release even the memory allocated in the C area properly. The language has evolved considerably since the days when it was implemented in C. In terms of performance, Go and Rust are about the same, and only python is 1.44 times faster. I was surprised to assume that the performance would be the same in that all three languages are faiss wrappers written in C. Only python is officially supported, and the compilation method is different between python and Go / Rust, so I'm wondering if that is involved. It will be interesting to dig deep there this time!

Recommended Posts

I tried running faiss with python, Go, Rust
I tried Grumpy (Go running Python).
I tried running prolog with python 3.8.2.
I tried running Deep Floor Plan with Python 3.6.10.
I tried fp-growth with python
I tried gRPC with Python
I tried scraping with python
I tried web scraping with python.
I tried SMTP communication with Python
I tried running Movidius NCS with python of Raspberry Pi3
Python with Go
I tried scraping Yahoo News with Python
I tried sending an email with python.
I tried non-photorealistic rendering with Python + opencv
I tried a functional language with Python
I tried recursion with Python ② (Fibonacci sequence)
#I tried something like Vlookup with Python # 2
I tried "smoothing" the image with Python + OpenCV
I tried hundreds of millions of SQLite with python
I tried running pymc
I tried "differentiating" the image with Python + OpenCV
I tried Python> autopep8
I tried Jacobian and partial differential with python
I tried to get CloudWatch data with Python
I tried using mecab with python2.7, ruby2.3, php7
I tried function synthesis and curry with python
I tried to output LLVM IR with Python
I tried "binarizing" the image with Python + OpenCV
I tried to automate sushi making with python
I tried playing mahjong with Python (single mahjong edition)
I tried running python -m summpy.server -h 127.0.0.1 -p 8080
I tried running alembic, a Python migration tool
I tried sending an email with SendGrid + Python
I tried Python> decorator
I tried running TensorFlow
I tried LeetCode every day 7. Reverse Integer (Python, Go)
I tried running BERT with Sakura VPS (without GPU)
I tried to get started with blender python script_Part 01
I tried to touch the CSV file with Python
I tried to draw a route map with Python
[OpenCV / Python] I tried image analysis of cells with OpenCV
I tried running python etc. from a bat file
I tried LeetCode every day 112. Path Sum (Python, Go)
I tried to get started with blender python script_Part 02
I tried to implement an artificial perceptron with python
I tried LeetCode every day 20. Valid Parentheses (Python, Go)
I tried LeetCode every day 136. Single Number (Python, Go)
I tried to automatically generate a password with Python3
Mayungo's Python Learning Episode 1: I tried printing with print
I tried LeetCode every day 118. Pascal's Triangle (Python, Go)
[Python] I tried running a local server using flask
I tried LeetCode every day 125. Valid Palindrome (Python, Go)
I tried LeetCode every day 155. Min Stack (Python, Go)
I tried to solve the problem with Python Vol.1
I tried to analyze J League data with Python
I tried LeetCode every day 9. Palindrome Number (Python, Go)
I tried LeetCode every day 1. Two Sum (Python, Go)
I tried "morphology conversion" of images with Python + OpenCV
I tried hitting the API with echonest's python client
I tried to solve AOJ's number theory with Python
I tried Learning-to-Rank with Elasticsearch!