Convert files uploaded to Cloud Storage with Cloud Functions (Python) so that they are not garbled in Excel

Background

The scenario: BigQuery query results are exported to GCS so that people in-house can open them in Excel. BigQuery writes the export to GCS as UTF-8 without a BOM, so when the file is opened in Excel the Japanese text is garbled. Excel only recognizes a CSV as UTF-8 when it starts with a BOM; otherwise it assumes the system's legacy code page (Shift_JIS on Japanese Windows). To fix this, I implemented a Cloud Function that automatically converts every file put in the bucket to UTF-8 with BOM.
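The only difference between the two encodings is the three-byte UTF-8 byte order mark (EF BB BF) at the start of the file. In Python, the `utf_8_sig` codec writes that mark when encoding and strips it when decoding, which is what the function in this article relies on. A quick check with the standard library (the sample string is made up):

```python
import codecs

text = '日本語,テスト\r\n'

plain = text.encode('utf-8')          # plain UTF-8, no BOM (like the BigQuery export)
with_bom = text.encode('utf-8-sig')   # UTF-8 with BOM, which Excel detects as UTF-8

# utf-8-sig output is just the BOM followed by the ordinary UTF-8 bytes.
assert with_bom == codecs.BOM_UTF8 + plain
print(with_bom[:3])  # b'\xef\xbb\xbf'

# Decoding with utf-8-sig removes the BOM; plain utf-8 keeps it as U+FEFF.
assert with_bom.decode('utf-8-sig') == text
assert with_bom.decode('utf-8') == '\ufeff' + text
```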

If you are using Cloud Functions with Python for the first time, read the following first.

Python Quick Start First Function: Python

Functions triggered by Cloud Storage

You can create a function that is triggered when an object is created (or overwritten) in a Cloud Storage bucket.

Cloud Storage Tutorial #Finalize Objects
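With the `google.storage.object.finalize` trigger, a Python background function receives two arguments: `data`, a dict with the new object's metadata (including `bucket` and `name`), and `context`, which carries event metadata such as the event ID and type. A minimal sketch, using the hypothetical name `log_upload`, that can also be called locally with a hand-built dict:

```python
def log_upload(data, context):
    """Background function for google.storage.object.finalize events."""
    bucket = data['bucket']
    name = data['name']
    # Read contentType defensively in case it is missing from the payload.
    content_type = data.get('contentType', 'unknown')
    message = f'gs://{bucket}/{name} ({content_type})'
    print(message)
    return message


# Local call with a fake event; in Cloud Functions the runtime supplies both arguments.
if __name__ == '__main__':
    log_upload({'bucket': 'my-bucket', 'name': 'exports/result.csv',
                'contentType': 'text/csv'}, None)
```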

Sample

A function that converts a file in the bucket to UTF-8 with BOM and uploads the result to the same directory, with bom_ added as a prefix to the file name.

main.py


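import os

from google.cloud import storage


def convert_to_bom(data, context):
    """Triggered by google.storage.object.finalize.

    Re-encodes the uploaded object as UTF-8 with BOM and saves it
    in the same directory with the prefix 'bom_'.
    """
    bucket_name = data['bucket']
    file_path = data['name']
    prefix = 'bom_'

    file_path_arr = file_path.split('/')
    file_name = file_path_arr[-1]

    # Uploading the converted file fires this function again; skip it to avoid an endless loop.
    if file_name.startswith(prefix):
        return 'skipping of bom file.'

    if len(file_path_arr) == 1:
        new_file_path = prefix + file_name
    else:
        dir_path = '/'.join(file_path_arr[:-1]) + '/'
        new_file_path = dir_path + prefix + file_name

    # /tmp is the only writable directory in Cloud Functions.
    local_file_path = '/tmp/' + file_name

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    dl_blob = bucket.get_blob(file_path)
    up_blob = bucket.blob(new_file_path)

    # Decoding with utf-8-sig drops a BOM the source may already have,
    # so the output never ends up with a double BOM.
    text = dl_blob.download_as_string().decode('utf-8-sig')

    # newline='' keeps the original line endings; utf_8_sig writes the BOM.
    with open(local_file_path, 'w', newline='', encoding='utf_8_sig') as f:
        f.write(text)

    up_blob.upload_from_filename(local_file_path)

    # Files in /tmp consume the instance's memory, so delete the file when done.
    os.remove(local_file_path)

    return 'success'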
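The path handling and the re-encoding do not depend on GCS, so they can be pulled out into pure functions and checked locally before deploying. A sketch with the hypothetical helper names `bom_object_name`, `is_converted` and `to_utf8_bom`; it uses `rsplit` instead of rebuilding the directory with `join`, which should produce the same object names as the function above:

```python
PREFIX = 'bom_'


def bom_object_name(name: str, prefix: str = PREFIX) -> str:
    """Name of the converted copy: same directory, prefixed file name."""
    if '/' in name:
        directory, file_name = name.rsplit('/', 1)
        return f'{directory}/{prefix}{file_name}'
    return prefix + name


def is_converted(name: str, prefix: str = PREFIX) -> bool:
    """True if the object is already an output of the converter (so it must be skipped)."""
    return name.rsplit('/', 1)[-1].startswith(prefix)


def to_utf8_bom(raw: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-8 with BOM, never adding a second BOM."""
    return raw.decode('utf-8-sig').encode('utf-8-sig')


if __name__ == '__main__':
    print(bom_object_name('result.csv'))               # bom_result.csv
    print(bom_object_name('exports/2020/result.csv'))  # exports/2020/bom_result.csv
    print(is_converted('exports/bom_result.csv'))      # True
```

Only the directory's last path segment is ignored by `is_converted`, so a directory that happens to start with `bom_` does not cause its files to be skipped.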

requirements.txt


-i https://pypi.org/simple
cachetools==4.1.0
certifi==2020.4.5.1
chardet==3.0.4
google-api-core==1.19.0
google-auth==1.16.1
google-cloud-core==1.3.0
google-cloud-storage==1.28.1
google-resumable-media==0.5.1
googleapis-common-protos==1.52.0
idna==2.9
protobuf==3.12.2
pyasn1-modules==0.2.8
pyasn1==0.4.8
pytz==2020.1
requests==2.23.0
rsa==4.0
six==1.15.0
urllib3==1.25.9

Deploy

gcloud functions deploy convert_to_bom --runtime python37 --trigger-resource ${YOUR_BUCKET} --trigger-event google.storage.object.finalize

Precautions when writing a file

Be careful: /tmp is the only directory you can write to. When I tried writing anywhere else, the function crashed silently.

The only writable part of the file system is the /tmp directory. This directory can be used to store temporary files in a function instance.

Cloud Functions execution environment # file system
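For files small enough to hold in memory, the temporary file can be skipped altogether by uploading the converted bytes with `Blob.upload_from_string`, which accepts `bytes`. A sketch of that variant, assuming `bucket` is a `google.cloud.storage` Bucket (for example from `storage.Client().get_bucket(...)`); the function name `convert_in_memory` is hypothetical. Because `/tmp` is itself backed by memory, this does not help with larger files; it only removes the local path and the cleanup step:

```python
def convert_in_memory(bucket, name: str, prefix: str = 'bom_') -> str:
    """Copy gs://<bucket>/<name> as UTF-8 with BOM without touching /tmp.

    `bucket` is assumed to be a google.cloud.storage Bucket.
    """
    directory, _, file_name = name.rpartition('/')
    # The uploaded copy re-triggers the function; skip it.
    if file_name.startswith(prefix):
        return 'skipping of bom file.'

    new_name = f'{directory}/{prefix}{file_name}' if directory else prefix + file_name

    src = bucket.get_blob(name)
    # download_as_string returns bytes (later library versions add download_as_bytes
    # and deprecate this name).
    raw = src.download_as_string()
    converted = raw.decode('utf-8-sig').encode('utf-8-sig')

    # upload_from_string defaults to text/plain, so carry over the source content type.
    bucket.blob(new_name).upload_from_string(
        converted, content_type=src.content_type or 'text/csv')
    return new_name
```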

References

Python Client for Google Cloud Storage
[GoogleCloudStorage] How to use GCS Python API [Note]
