I want a Python script that imports CSV files whose names contain a date, such as xxxx_20200930.csv, into BigQuery, loading each file into the time partition that matches the date in its name.
This version assumes that a large number of CSV files sit in the target directory and its subdirectories.
main.py
from google.cloud import bigquery
import glob

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    allow_quoted_newlines=True,
    # TimePartitioning() with no arguments means daily ingestion-time partitioning.
    time_partitioning=bigquery.TimePartitioning(),
)

# Collect CSV files from the directory and all of its subdirectories.
path = "../some/dir/**/*.csv"
files = glob.glob(path, recursive=True)

for file_name in files:
    # Take the YYYYMMDD part of a name like xxxx_20200930.csv.
    date = file_name.split('_')[-1][0:8]
    table_id = 'dataset.table_name$' + date  # Partition decorator: load into that day's partition.
    with open(file_name, "rb") as source_file:
        job = client.load_table_from_file(
            source_file,
            table_id,
            job_config=job_config,
        )
        job.result()  # Waits for the job to complete.
    table = client.get_table(table_id)  # Make an API request.
    print(
        "Loaded {} rows and {} columns to {}".format(
            table.num_rows, len(table.schema), table_id
        )
    )
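One caveat: if any file under the tree does not follow the xxxx_YYYYMMDD.csv naming convention, the split-based extraction above produces a bad partition decorator and the load job fails. A minimal sketch of a stricter extraction, assuming the eight-digit date always sits immediately before the .csv extension (the regex and the helper name extract_partition_date are illustrative, not part of the original script):

import os
import re
from datetime import datetime

DATE_RE = re.compile(r'_(\d{8})\.csv$')

def extract_partition_date(file_name):
    """Return the YYYYMMDD string from the file name, or None if it doesn't fit the convention."""
    match = DATE_RE.search(os.path.basename(file_name))
    if not match:
        return None
    date = match.group(1)
    try:
        datetime.strptime(date, '%Y%m%d')  # rejects impossible dates such as 20200971
    except ValueError:
        return None
    return date

In the loop, files for which this returns None could then be skipped or logged instead of being sent to BigQuery with a malformed table_id.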
Reference: Loading data from a local data source (https://cloud.google.com/bigquery/docs/loading-data-local?hl=ja#loading_data_from_a_local_data_source)
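One more design note: the load jobs above use the default WRITE_APPEND disposition, so re-running the script appends duplicate rows into each partition. If a rerun should replace a partition's contents instead, setting the write disposition on the same job_config should be enough, since WRITE_TRUNCATE combined with a $YYYYMMDD decorator truncates only that one partition (a one-line sketch under that assumption):

job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE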