I tried hundreds of millions of SQLite with python

mongodb has also reached its limit.

Mongodb shortest introduction (1) Install & start on EC2 & suddenly put tens of thousands [Mongodb Shortest Introduction (2) I searched for tens of thousands] (http://qiita.com/tottoko/items/68e61daf6eafb03d4b4b) [Mongodb Shortest Introduction (3) I tried to speed up even millions] (http://qiita.com/tottoko/items/b6c9430b1e0ab7cb409c)

Mr. mongo who is strict from tens of millions of data. Since the wall of 100 million has come into view, SQL Lite has appeared in a surprising place.

[Impact NoSQL KVS Easy DB mongo redis sqlite What was God speed locally? .. .. ] (http://qiita.com/wan-liner/items/ee07442e5bd4afd5b1b5)

If no processing is required ↑ This was the strongest.

Strictly speaking, it is better to keep the file as gz in a CSV reader. It is physically faster to decompress the compression in memory than to read the file from a regular disk. Of course, except in cases where reading files is faster than memory (SSD? Nvme?). So, CSV has columns, so sometimes I want to use placeholders.

To make the SQL easier to read, I would like you to use a placeholder with the column name: column_name1 to escape or automatically sanitize it.

Use the column name as a key in the CSV row list to make a dictionary in one shot:

data=dict(zip(keys,row))

keys is a list of column names Example: ('id','title') row is a CSV row separated by a delimiter Example: 1111 "\ t" aaaa Now data is a dictionary like {id: 1111, title: aaaa}. It's magic.

cursor.execute('insert into table (:id,:title)',data)

You can use placeholders like mysql. When it's not a dictionary?

cursor.execute('insert into table (?,?)',row)

Also, magic conn.text_factory = str

You didn't tell me that, so you got angry! sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings

# coding: utf-8
import sqlite3
import csv
import gzip

conn = sqlite3.connect("honyahonya.db")
conn.text_factory = str
csr = conn.cursor()

keys=(
'id',
'title',
'date',
'article'
)

with gzip.open("honyahonya.csv.gz", 'rU') as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        #If you are confident in the data, it is faster to omit checking the number of columns
        if len(row)==len(key):
            d=dict(zip(keys,row))
            csr.execute('insert into tracks (id,title,date,article) values(:id,:title,:date,:article)' , d)
            conn.commit()


Recommended Posts

I tried hundreds of millions of SQLite with python
I tried fp-growth with python
I tried scraping with Python
I tried gRPC with Python
I tried scraping with python
[OpenCV / Python] I tried image analysis of cells with OpenCV
I tried "morphology conversion" of images with Python + OpenCV
I tried web scraping with python.
I tried running prolog with python 3.8.2.
I tried SMTP communication with Python
I tried to find the entropy of the image with python
I tried "gamma correction" of the image with Python + OpenCV
I tried running Movidius NCS with python of Raspberry Pi3
I tried a stochastic simulation of a bingo game with Python
I tried scraping Yahoo News with Python
I tried sending an email with python.
I tried non-photorealistic rendering with Python + opencv
I tried a functional language with Python
I tried recursion with Python ② (Fibonacci sequence)
#I tried something like Vlookup with Python # 2
I tried scraping the ranking of Qiita Advent Calendar with Python
I tried to create a list of prime numbers with python
I tried to fix "I tried stochastic simulation of bingo game with Python"
I tried to improve the efficiency of daily work with Python
I tried to automatically collect images of Kanna Hashimoto with Python! !!
I tried handwriting recognition of runes with scikit-learn
[Python / DynamoDB / boto3] List of operations I tried
I tried "smoothing" the image with Python + OpenCV
I tried image recognition of CIFAR-10 with Keras-Learning-
I tried "differentiating" the image with Python + OpenCV
I tried image recognition of CIFAR-10 with Keras-Image recognition-
I tried Python> autopep8
I tried Flask with Remote-Containers of VS Code
I tried L-Chika with Raspberry Pi 4 (Python edition)
I tried Jacobian and partial differential with python
I tried to get CloudWatch data with Python
I tried using mecab with python2.7, ruby2.3, php7
I tried function synthesis and curry with python
I tried to output LLVM IR with Python
I tried to get the authentication code of Qiita API with Python.
I tried "binarizing" the image with Python + OpenCV
I tried running faiss with python, Go, Rust
I tried to automate sushi making with python
I tried playing mahjong with Python (single mahjong edition)
I tried to streamline the standard role of new employees with Python
I tried running Deep Floor Plan with Python 3.6.10.
I tried sending an email with SendGrid + Python
I tried to get the movie information of TMDb API with Python
I tried Python> decorator
I tried using a database (sqlite3) with kivy
I tried to make a simple mail sending application with tkinter of Python
I tried to easily visualize the tweets of JAWS DAYS 2017 with Python + ELK
"Brute force of MD5 hash value of 6-digit password" I tried it with Python
I tried to automatically send the literature of the new coronavirus to LINE with Python
I tried to extract features with SIFT of OpenCV
I tried to summarize how to use matplotlib of python
I tried to implement Minesweeper on terminal with python
I tried to get started with blender python script_Part 01
I tried to touch the CSV file with Python
I tried to draw a route map with Python
I tried to solve the soma cube with python