[Python] Examples of batch commits (several ways to process an array in fixed-size chunks) and a sample batch write to Firestore

When storing a large amount of data in a DB, it is common to accumulate a certain number of writes in a batch object and commit them periodically.

This is a note on Python code for processing an array in fixed-size chunks to do the above. If you know a better way, I would appreciate it if you could point it out.

At the end, I include sample code for batch writing to GCP's Firestore.

Update: Added "Method 4 (using slices)" after it was pointed out to me (thanks to @shiracamus).

Method 1 (use a dedicated counter)

data = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

q = []
batch_size = 3
batch_count = 0
for d in data:
    print(d)
    q.append(d)
    batch_count += 1
    if batch_count == batch_size:
        print("commit {}".format(q))
        q = []
        batch_count = 0

# commit whatever is left over (skipped when nothing remains)
if q:
    print("commit {}".format(q))

> python sample1.py
a
b
c
commit ['a', 'b', 'c']
d
e
f
commit ['d', 'e', 'f']
g
h
commit ['g', 'h']

Method 2 (using index and remainder)

data = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

q = []
batch_size = 3
for i, d in enumerate(data):
    print(d)
    q.append(d)
    if (i + 1) % batch_size == 0:
        print("commit {}".format(q))
        q = []

# commit whatever is left over (skipped when nothing remains)
if q:
    print("commit {}".format(q))

> python sample2.py
a
b
c
commit ['a', 'b', 'c']
d
e
f
commit ['d', 'e', 'f']
g
h
commit ['g', 'h']

Method 3 (fold the last commit into the loop)

This removes the trailing commit statement after the loop, which makes the code easier to read.

However, the data size must be known in advance, so it cannot be used with iterables that have no length (such as generators). Another weakness is that the extra comparison on every iteration is somewhat inefficient.

data = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
  
q = []
last = len(data)
batch_size = 3
for i, d in enumerate(data):
    print(d)
    q.append(d)
    if (i + 1) % batch_size == 0 or (i + 1) == last:
        print("commit {}".format(q))
        q = []

> python sample3.py
a
b
c
commit ['a', 'b', 'c']
d
e
f
commit ['d', 'e', 'f']
g
h
commit ['g', 'h']

Method 3 + Alpha

Python's enumerate takes the starting number as its second argument, so Method 3 can be written a little more simply.

data = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
  
q = []
last = len(data)
batch_size = 3
for n, d in enumerate(data, 1):
    print(d)
    q.append(d)
    if n % batch_size == 0 or n == last:
        print("commit {}".format(q))
        q = []

> python sample3.py
a
b
c
commit ['a', 'b', 'c']
d
e
f
commit ['d', 'e', 'f']
g
h
commit ['g', 'h']

Method 4 (using slices)

A neat approach (thanks to @shiracamus). If the batch object cannot accept the slice q as a list, iterate over q with for ... in and add the items one at a time.

data = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

batch_size = 3
for i in range(0, len(data), batch_size):
    q = data[i:i+batch_size]
    print("commit", q)

> python sample3.py
commit ['a', 'b', 'c']
commit ['d', 'e', 'f']
commit ['g', 'h']
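
Like Method 3, Method 4 needs len(data) and slicing, so it does not work on a generator or other one-pass iterable. One possible workaround is sketched below using itertools.islice. The helper name chunked is hypothetical (it is not part of the original methods), and this is only a minimal sketch rather than a definitive implementation.

```python
from itertools import islice


def chunked(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable.

    `chunked` is a hypothetical helper name used only for this sketch.
    """
    it = iter(iterable)
    while True:
        q = list(islice(it, batch_size))
        if not q:  # the iterator is exhausted, so no empty final batch
            return
        yield q


# A generator has no len(), so Methods 3 and 4 cannot be applied to it directly.
data = (c for c in "abcdefgh")

commits = []
for q in chunked(data, 3):
    print("commit", q)
    commits.append(q)
```

As with Method 3, no trailing commit is needed after the loop. On Python 3.12 and later, itertools.batched does much the same thing, although it yields tuples instead of lists.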

Reference

The sample below applies Method 4 to insert data into GCP's Firestore, committing 500 writes per batch.

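    from google.cloud import firestore

    # data is assumed to be a list of dicts, one per document to write
    db = firestore.Client()
    collection = db.collection("<COLLECTION NAME>")

    batch_size = 500
    for i in range(0, len(data), batch_size):
        batch = db.batch()  # start a new batch for each slice
        for row in data[i:i + batch_size]:
            # document() with no argument auto-generates a document ID
            batch.set(collection.document(), row)
        print('committing...')
        batch.commit()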
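
If the same loop is needed in several places, it could be wrapped in a function along the following lines. This is only a sketch: write_in_batches is a hypothetical helper name, db is assumed to be a firestore.Client, collection a collection reference obtained from db.collection(...), and rows a list of dicts. Only the batch(), set(), commit() and document() calls from the sample above are used.

```python
def write_in_batches(db, collection, rows, batch_size=500):
    """Write rows to a collection in batches and return the number of commits.

    Hypothetical helper for illustration; `db` is assumed to be a
    firestore.Client and `rows` a list of dicts.
    """
    commits = 0
    for i in range(0, len(rows), batch_size):
        batch = db.batch()  # start a new batch for each slice
        for row in rows[i:i + batch_size]:
            # document() with no argument auto-generates a document ID
            batch.set(collection.document(), row)
        batch.commit()
        commits += 1
    return commits
```

Assuming the setup from the sample above, it would be called as write_in_batches(db, db.collection("<COLLECTION NAME>"), data). Passing db and collection as arguments also makes it possible to try the function with stand-in objects before pointing it at a real project.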
