Try using LevelDB in Python (plyvel)

What is LevelDB?

LevelDB is a file-backed key-value store: a library made by Google that reads and writes key => value byte strings at high speed. This tutorial tries out LevelDB from Python using a library called plyvel. (There are other Python LevelDB libraries besides plyvel, but after trying it, plyvel seemed like a good choice.)

Berkeley DB was the well-known library of this kind in the old days, and Kyoto Cabinet, created by Mr. Hirabayashi of Japan, is a more modern one. memcached and Redis are server-type stores, but LevelDB is not a server; it is a library that operates on local files. If you want to share and save key-value data from many server processes, memcached, Kyoto Tycoon, Redis, etc. will be more suitable.

Python has dict as a data structure that associates values with keys, but LevelDB is a convenient library when you want to handle a large amount of data that does not fit in memory.

Prior knowledge

- LevelDB is a key-value store that associates a key (a byte string) with a value (also a byte string). You can look up a value from its key at high speed.
- A LevelDB database is a directory.
- A LevelDB database can be opened by at most one process at a time. (An error occurs if another process already has it open.)
- Since it is not a server, no network communication is involved in key-value operations. (It is a library that manipulates local files.)
- plyvel is implemented as a C extension that uses the native leveldb library (hence it is fast, but requires installing the library and compiling).
- LevelDB keys and values are given as binary (byte string) values.
- If the value you want to save is not a byte string, you need to serialize it in some way, such as with pickle or msgpack.
- A value of Python's unicode type (str in Python 3) needs to be encoded to bytes, e.g. with val.encode('utf-8').

Installation (1/2)

First, you need the native leveldb library, so install it. If you are on a Mac with Homebrew, using brew is the easiest way.

$ brew install leveldb

On other platforms, the LevelDB source seems to come with a Makefile, so I think you can install it with make and sudo make install.

Installation (2/2)

Let's install plyvel. I'm not sure why it's spelled like this, but it's plyvel anyway.

$ pip install plyvel

If Python.h is not in the system include path, the extension cannot be compiled and an error may occur. On Linux, it's a good idea to check that a package such as python-devel is installed.

Try opening and closing the DB

Note that the LevelDB database is a directory.

import plyvel
my_db = plyvel.DB('/tmp/test.ldb', create_if_missing=True)  # Create the database if it does not exist
my_db.close()

With create_if_missing=False (the default), an exception is raised if the database directory does not exist.

Register key => value

LevelDB can only associate a byte string with a byte string, so the basics are as follows.

my_db.put(b'key1', b'value1')
my_db.put(u'Hoge'.encode('utf-8'), u'Hogeバリュー'.encode('utf-8'))  # Encode unicode text to a byte string

If you want to associate a complex data structure with a key and save it, use pickle or MessagePack, both described below.

Extract value from key

Just call get. It returns None if the key does not exist.

value1 = my_db.get(b'key1')
value2 = my_db.get(u'Unicode key is encoded in byte string'.encode('utf-8'))
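
If most of your keys and values are text, the encode / decode calls can be wrapped in small helpers. The following is only a minimal sketch and not part of plyvel: put_text and get_text are hypothetical names, and db is assumed to be an open plyvel.DB (only its put and get methods are used). It relies on get returning None for a missing key.

```python
def put_text(db, key, text, encoding='utf-8'):
    """Store a text value under a text key (hypothetical helper).

    db is assumed to be an open plyvel.DB; both key and value are
    encoded to byte strings before being handed to LevelDB.
    """
    db.put(key.encode(encoding), text.encode(encoding))


def get_text(db, key, default=None, encoding='utf-8'):
    """Return the decoded text stored under key, or default if it is missing."""
    raw = db.get(key.encode(encoding))
    if raw is None:
        return default
    return raw.decode(encoding)
```

With these, put_text(my_db, u'Hoge', u'Hogeバリュー') would store the same bytes as the explicit encode calls shown earlier.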

Delete key

Delete with the delete method.

my_db.delete(b'key1')
my_db.delete(u'Unicode key is encoded in byte string'.encode('utf-8'))

Iterate saved items

There are times when you want to register a lot of keys and values and finally write them all out in CSV format. Of course, LevelDB can also retrieve the stored keys and values with an iterator.

my_db.put(b'key1', b'1')
my_db.put(b'key2', b'2')
my_db.put(b'key3', b'3')

for key, value in my_db:
    print('%s => %s' % (key.decode('utf-8'), value.decode('utf-8')))

# output (when the DB contains only these three keys):
# key1 => 1
# key2 => 2
# key3 => 3
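
As a rough sketch of the CSV dump mentioned above, the same iteration can feed the standard csv module. dump_csv is a hypothetical helper name, and it assumes every key and value is UTF-8 text (pickled or otherwise binary values would not decode). A plain list of byte-string pairs stands in for the open plyvel.DB here, since the function only needs something that yields (key, value) pairs.

```python
import csv
import io


def dump_csv(db, out, encoding='utf-8'):
    """Write every (key, value) pair yielded by db to out as CSV rows.

    Returns the number of rows written.
    """
    writer = csv.writer(out, lineterminator='\n')
    count = 0
    for key, value in db:
        writer.writerow([key.decode(encoding), value.decode(encoding)])
        count += 1
    return count


# Stand-in for an open plyvel.DB: any iterable of (key, value) byte pairs.
pairs = [(b'key1', b'1'), (b'key2', b'2'), (b'key3', b'3')]
buf = io.StringIO()
dump_csv(pairs, buf)
print(buf.getvalue())
```

To write to a real file, you would pass a file object opened in text mode (with newline='') instead of the StringIO buffer.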

It seems that you can also specify the key range or specify the key prefix to retrieve it. See the iterators section of the plyvel documentation (https://plyvel.readthedocs.org/en/latest/user.html#iterators) for more information.
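
A minimal sketch of those two options, assuming db is an open plyvel.DB and using the prefix, start and stop keyword arguments of its iterator method. iter_prefix and iter_range are hypothetical wrapper names chosen for illustration.

```python
def iter_prefix(db, prefix):
    """Yield (key, value) pairs whose key starts with the byte string prefix."""
    for key, value in db.iterator(prefix=prefix):
        yield key, value


def iter_range(db, start, stop):
    """Yield (key, value) pairs with start <= key < stop.

    With plyvel's defaults the start key is included and the stop key is not.
    """
    for key, value in db.iterator(start=start, stop=stop):
        yield key, value
```

For example, list(iter_prefix(my_db, b'key')) would collect every pair whose key begins with b'key'.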

Try to register a structured value: pickle

As I wrote at the beginning, a value must be a byte string, so if you want to associate structured data such as a dict or list with a key and save it in LevelDB, you need to serialize it. Python has a standard library for serialization called pickle, which is easy to use. pickle is very powerful because it can serialize not only basic data types but also functions and objects. However, it is difficult to deserialize pickle data (restore the original data structure) in languages other than Python, so if you want to reuse the data from other languages, you should serialize it in the MessagePack format described later.

import pickle

fukuzatsu1 = dict(a=10, b=20, c=[123, 234, 456])
# my_db.put(b'key1', fukuzatsu1)  # Raises an error: the value is not a byte string

serialized1 = pickle.dumps(fukuzatsu1)
my_db.put(b'key1', serialized1)  # OK

# When using the value
serialized1 = my_db.get(b'key1')
fukuzatsu1 = pickle.loads(serialized1)
print(fukuzatsu1['a'])  # => 10

For details on how to use pickle, refer to the pickle documentation.
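
If you store pickled values often, the dumps / put and get / loads pairs can be folded into two small functions. This is just a sketch: put_obj and get_obj are hypothetical names, and db is assumed to be an open plyvel.DB whose get returns None for a missing key.

```python
import pickle


def put_obj(db, key, obj):
    """Pickle obj and store it under the byte-string key."""
    db.put(key, pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def get_obj(db, key, default=None):
    """Load and unpickle the value stored under key, or return default."""
    raw = db.get(key)
    if raw is None:
        return default
    # Only unpickle data that you wrote yourself.
    return pickle.loads(raw)
```

With these, put_obj(my_db, b'key1', fukuzatsu1) followed by get_obj(my_db, b'key1') should give back an equal dict.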

Try registering a structured value: msgpack

In addition to pickle, we recommend MessagePack, a serialization format from Japan. MessagePack is characterized by compact, high-speed serialization of data. To use MessagePack with Python, install the msgpack package (formerly published as msgpack-python).

$ pip install msgpack

msgpack uses packb / unpackb instead of pickle's dumps / loads.

import msgpack

fukuzatsu1 = dict(a=10, b=20, c=[123, 234, 456])

# use_bin_type / raw replace the old encoding='utf-8' argument, which msgpack 1.0 removed
serialized1 = msgpack.packb(fukuzatsu1, use_bin_type=True)
my_db.put(b'key1', serialized1)

# When using the value
serialized1 = my_db.get(b'key1')
fukuzatsu1 = msgpack.unpackb(serialized1, raw=False)
print(fukuzatsu1['a'])  # => 10

See the msgpack-python API documentation for more information.
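
Because pickle and msgpack differ only in which pair of functions turns an object into bytes and back, the choice can be made once in a thin wrapper. The class below is a hypothetical sketch (SerializedDB is not part of plyvel or msgpack). It assumes db is an open plyvel.DB, that dumps returns a byte string, and that loads accepts one.

```python
class SerializedDB(object):
    """Wrap a DB-like object and serialize values with the given functions.

    dumps must turn an object into a byte string and loads must reverse it,
    e.g. pickle.dumps / pickle.loads or msgpack.packb / msgpack.unpackb.
    """

    def __init__(self, db, dumps, loads):
        self._db = db
        self._dumps = dumps
        self._loads = loads

    def put(self, key, obj):
        """Serialize obj and store it under the byte-string key."""
        self._db.put(key, self._dumps(obj))

    def get(self, key, default=None):
        """Return the deserialized value for key, or default if missing."""
        raw = self._db.get(key)
        if raw is None:
            return default
        return self._loads(raw)
```

A store for pickled values would then be SerializedDB(my_db, pickle.dumps, pickle.loads), and swapping in msgpack.packb and msgpack.unpackb changes the on-disk format without touching the calling code.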

