For bulk inserts into PostgreSQL, use a utility that inserts multiple rows in a single statement, and also review how often you commit, to cut unnecessary overhead.
Rather than guessing, measure several patterns and compare them. psycopg2, one of the Python libraries for accessing PostgreSQL, provides a function called execute_values for inserting many rows at once. We also vary the commit interval and measure each variant.
Test DDL:

```sql
CREATE TABLE test01 (
    key serial PRIMARY KEY,
    text_val text
);
```
For simplicity, the program asks interactively for the number of inserts and the commit interval. The parameters are generated up front so that the variants can be compared over the same workload. Before each run, the table is reset with TRUNCATE and the sequence is reset with setval.
Note that execute_values batches rows in pages of 100 by default. To send more rows than that in a single statement, pass a page_size at least as large as the number of rows.
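The effect of page_size can be sketched without a database: execute_values splits the parameter list into chunks of at most page_size rows and issues one multi-row INSERT per chunk. The helper below only mimics that splitting arithmetic; it is not psycopg2 code.

```python
def count_statements(n_rows, page_size=100):
    """Number of INSERT statements execute_values would issue
    for n_rows parameter tuples at the given page_size."""
    return -(-n_rows // page_size)  # ceiling division

# With the default page_size=100, 1,000 rows become 10 statements;
# a page_size of at least n_rows collapses them into a single statement.
print(count_statements(1000))                  # 10
print(count_statements(1000, page_size=1000))  # 1
```

This is why the test program below passes page_size=test_cnt: it forces the whole batch into one statement.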
Test program (indentation restored; the SQL is parameterized instead of string-formatted, and the unused return value of execute_values is dropped):

```python
import psycopg2
from psycopg2.extras import execute_values
from time import time

_text = 'test' * 300

conn = psycopg2.connect('dbname=dbname host=localhost user=username password=password')
cur = conn.cursor()

# Reset the table and its sequence before each run
cur.execute("TRUNCATE TABLE test01;")
cur.execute("SELECT setval('test01_key_seq', 1, false);")

count_query = "SELECT count(key) FROM test01;"
cur.execute(count_query)
(count,) = cur.fetchone()
conn.commit()

print('■ State before execution')
print('Number of rows: ', count)
print()

# Test settings
print('■ INSERT method')
print('Select the INSERT method (0/1/2/3)')
print(' 0: INSERT one row at a time, COMMIT after every row.')
print(' 1: INSERT one row at a time, COMMIT at a fixed interval.')
print(' 2: INSERT one row at a time, single COMMIT at the end.')
print(' 3: Bulk INSERT with execute_values, single COMMIT.')
test_type = int(input('INSERT method>'))
test_cnt = int(input('Number of INSERTs>'))
if test_type == 1:
    commit_cnt = int(input('COMMIT interval>'))

# Run the test and report the elapsed time
print()
print('■ Execution result')
query = "INSERT INTO test01 (text_val) VALUES (%s);"

if test_type == 0:
    st = time()
    for _ in range(test_cnt):
        cur.execute(query, (_text,))
        conn.commit()
    print('elapsed time (s): ', time() - st)

if test_type == 1:
    st = time()
    for i in range(test_cnt):
        cur.execute(query, (_text,))
        if (i + 1) % commit_cnt == 0:
            conn.commit()
    conn.commit()
    print('elapsed time (s): ', time() - st)

if test_type == 2:
    st = time()
    for _ in range(test_cnt):
        cur.execute(query, (_text,))
    conn.commit()
    print('elapsed time (s): ', time() - st)

if test_type == 3:
    bulk_query = "INSERT INTO test01 (text_val) VALUES %s;"
    st = time()
    params = [(_text,) for _ in range(test_cnt)]
    # page_size defaults to 100; pass test_cnt to send everything in one statement
    execute_values(cur, bulk_query, params, page_size=test_cnt)
    conn.commit()
    print('elapsed time (s): ', time() - st)

cur.execute(count_query)
(count,) = cur.fetchone()
print('Number of rows: ', count)

cur.close()
conn.close()
```
The results below are for reference only; they naturally depend on the environment and machine specs.

Environment: Windows 10 64-bit / 4 logical processors / 8 GB RAM
| INSERT method | Number of INSERTs | COMMIT count | Processing time (s) |
|---|---|---|---|
| Row-by-row INSERT (execute), COMMIT after every row | 100,000 | 100,000 | 40.9 |
| Row-by-row INSERT (execute), COMMIT every 10 rows | 100,000 | 10,000 | 25.4 |
| Row-by-row INSERT (execute), COMMIT every 100 rows | 100,000 | 1,000 | 24.1 |
| Row-by-row INSERT (execute), COMMIT every 1,000 rows | 100,000 | 100 | 27.2 |
| Row-by-row INSERT (execute), COMMIT every 10,000 rows | 100,000 | 10 | 25.9 |
| Row-by-row INSERT (execute), single COMMIT | 100,000 | 1 | 24.4 |
| Bulk INSERT (execute_values), single COMMIT | 100,000 | 1 | 8.8 |
execute_values is clearly the fastest. Use it whenever you can.