This article continues my previous post (https://qiita.com/rsm223_rip/items/141eb146ad610215e5f7). This time I will cover pymongo's bulk_write.
- Instead of creating and sending a query for every write to the db, you **generate a large number of operations and write them all at once with the bulk_write function**. Doing the writes together reduces network round trips and improves throughput, which makes this a **very convenient operation**.
See also: pymongo 3.9.0 documentation: Bulk Write Operations
The same write operations you normally use are available as operation classes: InsertOne, UpdateOne, UpdateMany, ReplaceOne, DeleteOne, and DeleteMany.
Basically, all you have to do is create an object for each operation, put it in a list and pass it to the bulk_write function.
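As a rough illustration of mixing several operation types in one list, here is a minimal sketch. The function name build_mixed_ops and the _id values 2000 and 2001 are made up for this example; the operation classes themselves are the ones pymongo provides. The import sits inside the function only so that the snippet can be loaded where pymongo is not installed.

```python
def build_mixed_ops():
    """Return a list that mixes several operation types for bulk_write.

    Hypothetical helper for illustration; the _id values are arbitrary.
    """
    # Imported here so the snippet loads even without pymongo installed
    from pymongo import InsertOne, UpdateOne, ReplaceOne, DeleteOne

    return [
        # Insert a new document
        InsertOne({"_id": 2000, "x": 1}),
        # Add 10 to x in the document that was just inserted
        UpdateOne({"_id": 2000}, {"$inc": {"x": 10}}),
        # Replace the document with _id 2001, creating it if it is missing
        ReplaceOne({"_id": 2001}, {"x": 0}, upsert=True),
        # Delete the document inserted by the first operation
        DeleteOne({"_id": 2000}),
    ]
```

Passing the returned list to bulk_write on a collection sends all four operations in one call. With the default ordered behavior they are applied in list order.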
main.py
from pprint import pprint

from pymongo import MongoClient
from pymongo import InsertOne, UpdateOne
from pymongo.errors import BulkWriteError

client = MongoClient()
# Note: despite its name, db is a collection ("table" in the database "Collection")
db = client["Collection"]["table"]

# Delete all documents in the collection
# db.delete_many({})

# Insert 1000 documents, with _id and x running from 0 to 999
opList = [InsertOne({"_id": i, "x": i}) for i in range(0, 1000)]

# On success, details are available from the return value;
# on failure, from the raised exception
try:
    result = db.bulk_write(opList)
    print("Completed normally")
    pprint(result.bulk_api_result)
except BulkWriteError as bwe:
    print("An exception occurred")
    pprint(bwe.details)

'''
{'writeErrors': [],
 'writeConcernErrors': [],
 'nInserted': 1000,
 'nUpserted': 0,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'upserted': []}
'''
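To keep writing even when some operations fail, **just specify ordered=False as a bulk_write option**. By default (ordered=True) the operations are executed in order and processing stops at the first error. With ordered=False, every operation is attempted even if some of them are invalid, and you can check the details from the return value or the raised exception.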
- First, after running the script above, try inserting documents with _id from 500 to 1499. (Because the operations run in order, the very first insert hits a duplicate key, so nothing is inserted and an error is raised.)
main.py
# (imports, client and db are the same as in the first script)

# Insert 1000 documents, with _id and x running from 500 to 1499
opList = [InsertOne({"_id": i, "x": i}) for i in range(500, 1500)]

# On success, details are available from the return value;
# on failure, from the raised exception
try:
    result = db.bulk_write(opList)
    print("Completed normally")
    pprint(result.bulk_api_result)
except BulkWriteError as bwe:
    print("An exception occurred")
    pprint(bwe.details)

# Output
# (The first insert failed, so none of the subsequent writes were performed)
'''
An exception occurred
{'nInserted': 0,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 500 '
                            '}',
                  'index': 0,
                  'keyPattern': {'_id': 1},
                  'keyValue': {'_id': 500},
                  'op': {'_id': 500, 'x': 500}}]}
'''
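The details dictionary printed above has a regular shape, so it can be reduced to a short summary. The following is a minimal sketch, not part of pymongo: summarize_write_errors is a hypothetical helper name, and sample is an abbreviated copy of the output above with errmsg and the key fields left out. It assumes insert operations, for which 'op' is the rejected document itself.

```python
def summarize_write_errors(details):
    """Summarize a BulkWriteError.details-style dict.

    Hypothetical helper (not a pymongo API). It reads only keys that
    appear in the output above and assumes the failed operations are
    inserts, so that e["op"] is the rejected document.
    """
    errors = details.get("writeErrors", [])
    return {
        "inserted": details.get("nInserted", 0),
        "failed": len(errors),
        # Position of each failed operation in the list given to bulk_write
        "failed_indexes": [e["index"] for e in errors],
        # _id of each rejected document
        "failed_ids": [e["op"]["_id"] for e in errors],
        # Distinct error codes (11000 is the duplicate key error seen above)
        "codes": sorted({e["code"] for e in errors}),
    }


# Abbreviated copy of the details printed above
sample = {
    "nInserted": 0,
    "writeErrors": [
        {"code": 11000, "index": 0, "op": {"_id": 500, "x": 500}},
    ],
}

print(summarize_write_errors(sample))
# {'inserted': 0, 'failed': 1, 'failed_indexes': [0], 'failed_ids': [500], 'codes': [11000]}
```

In an except BulkWriteError as bwe: block, the same function could be called as summarize_write_errors(bwe.details).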
- Next, set ordered=False in the bulk_write options and run the batch write again.
main.py
# (imports, client and db are the same as in the first script)

# Insert 1000 documents, with _id and x running from 500 to 1499
opList = [InsertOne({"_id": i, "x": i}) for i in range(500, 1500)]

# On success, details are available from the return value;
# on failure, from the raised exception
try:
    result = db.bulk_write(opList, ordered=False)
    print("Completed normally")
    pprint(result.bulk_api_result)
except BulkWriteError as bwe:
    print("An exception occurred")
    pprint(bwe.details)

# An exception is still raised, but 500 documents are inserted,
# and the reason for each failed write is available
'''
An exception occurred
{'nInserted': 500,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 500 '
                            '}',
                  'index': 0,
                  'keyPattern': {'_id': 1},
                  'keyValue': {'_id': 500},
                  'op': {'_id': 500, 'x': 500}},
                 {'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 501 '
                            '}',
                  'index': 1,
                  'keyPattern': {'_id': 1},
                  'keyValue': {'_id': 501},
                  'op': {'_id': 501, 'x': 501}},
                 {'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 502 '
                            '}',
~~~~ The rest is omitted ~~~~
'''
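The difference between the ordered and unordered runs can be modeled without a database. The sketch below is a toy simulation in plain Python, not how pymongo or the server is implemented: simulate_bulk_insert is a hypothetical function, and a set of existing _id values stands in for the collection's unique _id index. A real unordered bulk write does not guarantee the order in which operations are applied; the loop here only models which ones succeed.

```python
def simulate_bulk_insert(existing_ids, new_ids, ordered=True):
    """Toy model of bulk inserts against a unique _id index.

    Hypothetical function for illustration only. Returns
    (n_inserted, failed_indexes) and leaves existing_ids untouched.
    """
    stored = set(existing_ids)
    n_inserted = 0
    failed_indexes = []
    for index, _id in enumerate(new_ids):
        if _id in stored:
            # Duplicate key: record which operation failed
            failed_indexes.append(index)
            if ordered:
                # Ordered mode: stop at the first error
                break
            # Unordered mode: keep trying the remaining operations
            continue
        stored.add(_id)
        n_inserted += 1
    return n_inserted, failed_indexes


existing = range(0, 1000)    # _id values written by the first script
incoming = range(500, 1500)  # _id values the later scripts try to insert

ordered_result = simulate_bulk_insert(existing, incoming, ordered=True)
unordered_result = simulate_bulk_insert(existing, incoming, ordered=False)

print(ordered_result[0], len(ordered_result[1]))      # 0 1
print(unordered_result[0], len(unordered_result[1]))  # 500 500
```

The counts line up with the nInserted values and the writeErrors entries in the two outputs above: the ordered run stops at _id 500 with nothing inserted, while the unordered run rejects _id 500 to 999 and inserts _id 1000 to 1499.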
That is the basic usage of pymongo's bulk_write. If there are requests, I will add more details.