For bulk inserts into PostgreSQL, use a utility that inserts multiple rows in a single statement, and also review how often you commit, to cut unnecessary overhead.
Rather than guessing, measure several patterns and compare them. psycopg2, one of the Python libraries for accessing PostgreSQL, provides a function called execute_values for inserting many rows at once. We also vary the commit interval and measure each variant.
Test DDL:

```sql
CREATE TABLE test01 (
    key serial PRIMARY KEY,
    text_val text
);
```
For simplicity, the program asks interactively for the number of inserts and the commit interval. The parameters are generated up front so that the variants can be compared over the same workload. Before each run, the table is reset with TRUNCATE and the sequence is reset with setval.
Note that execute_values batches rows in pages of 100 by default. To send more rows than that in a single statement, pass a page_size at least as large as the number of rows.
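The effect of page_size can be sketched without a database: execute_values splits the parameter list into chunks of at most page_size rows and issues one multi-row INSERT per chunk. The helper below only mimics that splitting arithmetic; it is not psycopg2 code.

```python
def count_statements(n_rows, page_size=100):
    """Number of INSERT statements execute_values would issue
    for n_rows parameter tuples at the given page_size."""
    return -(-n_rows // page_size)  # ceiling division

# With the default page_size=100, 1,000 rows become 10 statements;
# a page_size of at least n_rows collapses them into a single statement.
print(count_statements(1000))                  # 10
print(count_statements(1000, page_size=1000))  # 1
```

This is why the test program below passes page_size=test_cnt: it forces the whole batch into one statement.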
Test program (indentation restored; the SQL is parameterized instead of string-formatted, and the unused return value of execute_values is dropped):

```python
import psycopg2
from psycopg2.extras import execute_values
from time import time

_text = 'test' * 300

conn = psycopg2.connect('dbname=dbname host=localhost user=username password=password')
cur = conn.cursor()

# Reset the table and its sequence before each run
cur.execute("TRUNCATE TABLE test01;")
cur.execute("SELECT setval('test01_key_seq', 1, false);")

count_query = "SELECT count(key) FROM test01;"
cur.execute(count_query)
(count,) = cur.fetchone()
conn.commit()

print('■ State before execution')
print('Number of rows: ', count)
print()

# Test settings
print('■ INSERT method')
print('Select the INSERT method (0/1/2/3)')
print(' 0: INSERT one row at a time, COMMIT after every row.')
print(' 1: INSERT one row at a time, COMMIT at a fixed interval.')
print(' 2: INSERT one row at a time, single COMMIT at the end.')
print(' 3: Bulk INSERT with execute_values, single COMMIT.')
test_type = int(input('INSERT method>'))
test_cnt = int(input('Number of INSERTs>'))
if test_type == 1:
    commit_cnt = int(input('COMMIT interval>'))

# Run the test and report the elapsed time
print()
print('■ Execution result')
query = "INSERT INTO test01 (text_val) VALUES (%s);"

if test_type == 0:
    st = time()
    for _ in range(test_cnt):
        cur.execute(query, (_text,))
        conn.commit()
    print('elapsed time (s): ', time() - st)

if test_type == 1:
    st = time()
    for i in range(test_cnt):
        cur.execute(query, (_text,))
        if (i + 1) % commit_cnt == 0:
            conn.commit()
    conn.commit()
    print('elapsed time (s): ', time() - st)

if test_type == 2:
    st = time()
    for _ in range(test_cnt):
        cur.execute(query, (_text,))
    conn.commit()
    print('elapsed time (s): ', time() - st)

if test_type == 3:
    bulk_query = "INSERT INTO test01 (text_val) VALUES %s;"
    st = time()
    params = [(_text,) for _ in range(test_cnt)]
    # page_size defaults to 100; pass test_cnt to send everything in one statement
    execute_values(cur, bulk_query, params, page_size=test_cnt)
    conn.commit()
    print('elapsed time (s): ', time() - st)

cur.execute(count_query)
(count,) = cur.fetchone()
print('Number of rows: ', count)

cur.close()
conn.close()
```
The results below are for reference only; they naturally depend on the environment and machine specs.

Environment: Windows 10 64-bit / 4 logical processors / 8 GB RAM
| INSERT method | Number of INSERTs | COMMIT count | Processing time (s) |
|---|---|---|---|
| Row-by-row INSERT (execute), COMMIT after every row | 100,000 | 100,000 | 40.9 |
| Row-by-row INSERT (execute), COMMIT every 10 rows | 100,000 | 10,000 | 25.4 |
| Row-by-row INSERT (execute), COMMIT every 100 rows | 100,000 | 1,000 | 24.1 |
| Row-by-row INSERT (execute), COMMIT every 1,000 rows | 100,000 | 100 | 27.2 |
| Row-by-row INSERT (execute), COMMIT every 10,000 rows | 100,000 | 10 | 25.9 |
| Row-by-row INSERT (execute), single COMMIT | 100,000 | 1 | 24.4 |
| Bulk INSERT (execute_values), single COMMIT | 100,000 | 1 | 8.8 |
execute_values is clearly the fastest. Use it whenever you can.