[PYTHON] [Verification] Does levelDB take longer to register data as the amount of data increases?

What I was interested in

I read an article claiming that when Ethereum is used as a private blockchain, registering data in levelDB, which geth (go-ethereum) uses, takes longer as the amount of data grows. I measured it to see whether that is really the case.

What I did

I registered 1 million items, using the sequence number of each write as the key and an arbitrary piece of data as the value. I measured the processing time for every 100 writes and checked whether it increased as the data grew.

I ran the same test on mongoDB as well as levelDB.

levelDB processing time

The data was registered with the following code.

import time
import leveldb

# Test data: about 750 bytes
inputData = '{"_id": "ObjectId(\"5e3b4dd825755df3f15a2d17\")", "coediting": False, "comments_count": 0, "created_at": "2020-02-05T20:20:10+09:00", "group": None, "id": "6ed1eec1d6fba127a863", "likes_count": 0, "private": False, "reactions_count": 0, "tags": [{"name": "Python", "versions": []}, {"name": "MongoDB", "versions": []}, {"name": "Python3", "versions": []}, {"name": "pymongo", "versions": []}], "title": "Operating mongoDB with Python-Part 6: aggregate-", "updated_at": "2020-02-05T20:20:10+09:00", "url": "https://qiita.com/bc_yuuuuuki/items/6ed1eec1d6fba127a863", "page_views_count": 96, "tag1": "Python", "tag2": "MongoDB", "tag3": "Python3", "tag4": "pymongo", "tag5": "", "tag_list": ["Python", "MongoDB", "Python3", "pymongo"], "stocks_count": 0}'

# Open (or create) the database directory
db = leveldb.LevelDB("measurement_test")

start = time.time()

for i in range(1, 1000001):
    # Key: the sequence number as 4 little-endian bytes; value: the test data
    db.Put(i.to_bytes(4, 'little'), inputData.encode('utf-8'))
    if i % 100 == 0:
        # Print the elapsed time for the last 100 writes
        end = time.time()
        print("{0}:{1}".format(i, end - start))
        start = end

Processing result (figure: levelDB.png)

mongoDB processing time

The measurement used the following code.

# MongoSample is defined in a separate mongo_sample module that is not shown here
from mongo_sample import MongoSample
import time

mongo = MongoSample("db", "measurement")

# Test data: about 750 bytes
inputData = '{"_id": "ObjectId(\"5e3b4dd825755df3f15a2d17\")", "coediting": False, "comments_count": 0, "created_at": "2020-02-05T20:20:10+09:00", "group": None, "id": "6ed1eec1d6fba127a863", "likes_count": 0, "private": False, "reactions_count": 0, "tags": [{"name": "Python", "versions": []}, {"name": "MongoDB", "versions": []}, {"name": "Python3", "versions": []}, {"name": "pymongo", "versions": []}], "title": "Operating mongoDB with Python-Part 6: aggregate-", "updated_at": "2020-02-05T20:20:10+09:00", "url": "https://qiita.com/bc_yuuuuuki/items/6ed1eec1d6fba127a863", "page_views_count": 96, "tag1": "Python", "tag2": "MongoDB", "tag3": "Python3", "tag4": "pymongo", "tag5": "", "tag_list": ["Python", "MongoDB", "Python3", "pymongo"], "stocks_count": 0}'

start = time.time()
path = "measurement.txt"
with open(path, "w") as out:
    for i in range(1, 1000001):
        # One document per write, with the sequence number as the field name
        mongo.insert_one({str(i): inputData})
        if i % 100 == 0:
            # Record the elapsed time for the last 100 writes
            end = time.time()
            out.write("{0}:{1}\n".format(i, end - start))
            # Restart the timer after the file write so it is not counted
            start = time.time()

Processing result (figure: mongoDB.png)
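Both scripts emit one line per 100 writes in the form count:seconds. As a rough sketch of how such output could be loaded and plotted, the helpers below parse those lines and draw a scatter plot. The function names parse_measurements and plot_measurements, the output file name result.png, and the sample values are all made up for illustration. This is not necessarily how the figures in this article were produced, and the plotting part assumes matplotlib is installed.

```python
from typing import Iterable, List, Tuple


def parse_measurements(lines: Iterable[str]) -> Tuple[List[int], List[float]]:
    """Parse 'count:seconds' lines into two parallel lists."""
    counts, seconds = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        count_text, seconds_text = line.split(":", 1)
        counts.append(int(count_text))
        seconds.append(float(seconds_text))
    return counts, seconds


def plot_measurements(counts, seconds, out_path="result.png"):
    """Scatter-plot the time per batch against the number of records written."""
    import matplotlib.pyplot as plt  # imported here so parsing works without it

    plt.scatter(counts, seconds, s=2)
    plt.xlabel("records written")
    plt.ylabel("seconds per 100 writes")
    plt.savefig(out_path)


# Made-up lines in the same format as the scripts' output.
sample_lines = ["100:0.0021\n", "200:0.0019\n"]
counts, seconds = parse_measurements(sample_lines)
```

For the mongoDB run, the lines would come from measurement.txt, for example by passing the open file object to parse_measurements.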

Verification result

Looking at the distribution of the results, the processing times for both levelDB and mongoDB mostly fall within a fixed band. A few batches took much longer than the rest, but there are so few of them that they do not show the write speed degrading.
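One way to put a number on "only a few cases" is to count the batches that are much slower than the median. The sketch below is only an illustration: the function name summarize_batches, the threshold of 3 times the median, and the sample values are assumptions, not figures from this measurement.

```python
import statistics


def summarize_batches(seconds, factor=3.0):
    """Return (median, number of batches slower than factor * median)."""
    median = statistics.median(seconds)
    slow = [s for s in seconds if s > factor * median]
    return median, len(slow)


# Made-up batch times (seconds per 100 writes), for illustration only.
example = [0.002, 0.002, 0.003, 0.002, 0.050, 0.002, 0.003, 0.002]
median, slow_count = summarize_batches(example)
print("median={0}, slow batches={1} of {2}".format(median, slow_count, len(example)))
```

If the slow batches stay a small fraction of the total and are not concentrated toward the end of the run, that is consistent with occasional spikes rather than a gradual slowdown.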

If the write speed slowed down as the amount of data grew, the plotted points would trend upward to the right, either along a line or along a rising curve. This verification did not produce such a distribution.
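The "rising to the right" check can also be done numerically by fitting a straight line to the batch times and looking at its slope. The following is a minimal sketch using an ordinary least-squares fit; the function name trend_slope and both sample series are made up for illustration and are not data from this measurement.

```python
def trend_slope(seconds):
    """Least-squares slope of batch time against batch index (seconds per batch)."""
    n = len(seconds)
    if n < 2:
        raise ValueError("need at least two batches to fit a line")
    mean_x = (n - 1) / 2  # mean of the indices 0, 1, ..., n-1
    mean_y = sum(seconds) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(seconds))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Made-up series: one with no trend, one that slows down steadily.
flat = [0.002, 0.003, 0.002, 0.002, 0.003, 0.002]
rising = [0.002, 0.003, 0.004, 0.005, 0.006, 0.007]
print(trend_slope(flat), trend_slope(rising))
```

A slope close to zero relative to the typical batch time matches the flat band described above, while a clearly positive slope would point to writes getting slower as data accumulates. A straight-line fit would understate a curve that only bends upward late in the run, so it complements the plot rather than replacing it.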

Impressions

The article I read did not describe its environment or the amount of data in detail, so it remains unclear whether its claim is true. Also, geth is of course written in Go, but setting up a Go environment seemed like a hassle, so I ran the test in Python. That difference may also affect the result, so I plan to keep investigating.

I'm glad I learned a little about levelDB along the way.
