[Python] Convert CSV file uploaded to S3 to JSON file with AWS Lambda

Architecture built in this article


Upload a CSV file to S3 → Lambda starts → The file is converted to JSON

Technology used

- Language: Python 3.8
- AWS: S3, Lambda

Preparation

First, prepare the IAM user, the S3 bucket, and the IAM role.

Create an IAM user

This time we will work with the AWS CLI, so we will create a dedicated IAM user.


"IAM"-> "Users"-> "Add User"

- User name: any name you like
- Access type: check "Programmatic access"


I want to perform basic S3 operations such as creating a bucket and uploading and deleting files, so I attach the "AmazonS3FullAccess" policy.


When the user has been created, two values are issued:

- Access key ID
- Secret access key

Make a note of both.

$ aws configure --profile s3-lambda

AWS Access Key ID [None]: ***************** #Enter your access key ID
AWS Secret Access Key [None]: ************************** #Enter your secret access key
Default region name [None]: ap-northeast-1
Default output format [None]: json

When you run the above command in a terminal, it prompts for each value interactively, so enter them as shown.
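As a rough illustration of what this command stores: the AWS CLI keeps the keys in ~/.aws/credentials under a section named after the profile, and the region and output format in ~/.aws/config under a section prefixed with "profile ". The sketch below parses sample text in that layout with Python's configparser. The key values are placeholders and `read_profile` is a hypothetical helper name, not something from the AWS CLI.

```python
import configparser

# Placeholder file contents; real files hold your own key values.
SAMPLE_CREDENTIALS = """\
[s3-lambda]
aws_access_key_id = EXAMPLE_ACCESS_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET_ACCESS_KEY
"""

SAMPLE_CONFIG = """\
[profile s3-lambda]
region = ap-northeast-1
output = json
"""


def read_profile(credentials_text, config_text, profile):
    """Return the settings stored for one named profile as a plain dict."""
    credentials = configparser.ConfigParser()
    credentials.read_string(credentials_text)
    config = configparser.ConfigParser()
    config.read_string(config_text)

    settings = dict(credentials[profile])
    # The config file prefixes named profiles with "profile ".
    settings.update(config['profile ' + profile])
    return settings


profile_settings = read_profile(SAMPLE_CREDENTIALS, SAMPLE_CONFIG, 's3-lambda')
print(profile_settings['region'])
```

Passing `--profile s3-lambda` to later commands tells the CLI to use these stored values.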

Create an S3 bucket

I will create it using the AWS CLI that I set up earlier.

$ aws --profile s3-lambda s3 mb s3://test-bucket-for-converting-csv-to-json-with-lambda

make_bucket: test-bucket-for-converting-csv-to-json-with-lambda

Bucket names must be globally unique, so choose a name of your own.
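Besides being unique, a bucket name has to follow the S3 naming rules. The sketch below checks a subset of those rules locally before you run `s3 mb`. It cannot tell whether the name is already taken by someone else, and `looks_like_valid_bucket_name` is a hypothetical helper name.

```python
import re

# Subset of the S3 bucket naming rules:
# 3-63 characters; lowercase letters, digits, dots and hyphens only;
# the first and last character must be a letter or a digit.
_BUCKET_NAME = re.compile(r'^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$')


def looks_like_valid_bucket_name(name):
    """Return True if name passes the subset of rules listed above."""
    # Two adjacent dots are also not allowed.
    return bool(_BUCKET_NAME.match(name)) and '..' not in name


print(looks_like_valid_bucket_name('test-bucket-for-converting-csv-to-json-with-lambda'))
print(looks_like_valid_bucket_name('My_Bucket'))
```

The first call prints True and the second prints False, because uppercase letters and underscores are not allowed.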

Create a test CSV file and upload it as a trial.

$ mkdir ./workspace/
$ cat > ./workspace/test.csv << EOF
Name,Age,Country
Taro,20,Japan
EOF
$ aws --profile s3-lambda s3 sync ./workspace s3://test-bucket-for-converting-csv-to-json-with-lambda

upload: ./test.csv to s3://test-bucket-for-converting-csv-to-json-with-lambda/test.csv


If the file appears in the bucket, the upload succeeded.

$ aws --profile s3-lambda s3 rm s3://test-bucket-for-converting-csv-to-json-with-lambda/test.csv

The upload has been confirmed to work, so the command above deletes the test file from the bucket.

Create an IAM role

Create an IAM role to assign to Lambda.

[Screenshot: the two policies attached to the IAM role]

"IAM"-> "Role"-> "Create Role"

For this article, the two policies shown in the screenshot above are sufficient.


Enter a name and description of your choice and create the role.
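For reference, a role can only be assigned to Lambda if its trust policy allows the Lambda service to assume it. When the role is created in the console with Lambda as the trusted service, this is set up for you. The sketch below only builds and prints such a trust policy document as an illustration. It does not call AWS, and the variable name is hypothetical.

```python
import json

# Trust policy that lets the Lambda service assume the role.
lambda_trust_policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            'Effect': 'Allow',
            'Principal': {'Service': 'lambda.amazonaws.com'},
            'Action': 'sts:AssumeRole',
        }
    ],
}

print(json.dumps(lambda_trust_policy, indent=2))
```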

Implementation

With the preparation done, we can move on to the implementation.

Create a Lambda function


"Lambda"-> "Create Function"

- Option: Author from scratch
- Function name: any name you like
- Runtime: Python 3.8
- Execution role: Use an existing role ("s3-lambda" created earlier)
- Other settings: leave the defaults

Create a trigger


Go to "Configuration"-> "Add Trigger" to decide what event will trigger Lambda.


Fill in the required items.

- Trigger: S3
- Bucket: the bucket you created earlier
- Event type: All object create events
- Prefix: input/
- Suffix: .csv

With these settings, Lambda starts when a file whose name ends in ".csv" is uploaded under the "input" folder (the "input/" key prefix).
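The prefix and suffix settings amount to a simple match on the object key. A minimal sketch of that check, assuming the values entered above. `would_trigger` is a hypothetical helper for illustration, and the actual filtering is done by S3, not by your code.

```python
def would_trigger(key, prefix='input/', suffix='.csv'):
    """Return True if an object key matches the trigger's prefix and suffix."""
    return key.startswith(prefix) and key.endswith(suffix)


for key in ('input/test.csv', 'input/test.txt', 'test.csv', 'output/test.json'):
    print(key, would_trigger(key))
```

Only the first key matches. Because the function writes its result under "output/", the converted file does not match the filter and does not start Lambda again.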

Code

import csv
import json
import os
import urllib.parse
from datetime import datetime, timezone, timedelta

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):

    #Use Japan Standard Time (UTC+9) for the timestamp in the file names
    JST = timezone(timedelta(hours=+9), 'JST')
    timestamp = datetime.now(JST).strftime('%Y%m%d%H%M%S')

    #Temporary read / write files (deleted in the finally block)
    tmp_csv = '/tmp/test_{ts}.csv'.format(ts=timestamp)
    tmp_json = '/tmp/test_{ts}.json'.format(ts=timestamp)

    #Final output file
    outputted_json = 'output/test_{ts}.json'.format(ts=timestamp)

    #Process every record; records in one event share the same output key
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        #Object keys in S3 event notifications are URL-encoded
        key_name = urllib.parse.unquote_plus(record['s3']['object']['key'])
        json_data = []

        try:
            #Download the uploaded CSV and decode it as UTF-8 text
            s3_object = s3.get_object(Bucket=bucket_name, Key=key_name)
            contents = s3_object['Body'].read().decode('utf-8')

            with open(tmp_csv, 'w', encoding='utf-8', newline='') as csv_data:
                csv_data.write(contents)

            #Read each CSV row as a dict keyed by the header row
            with open(tmp_csv, encoding='utf-8', newline='') as csv_data:
                for csv_row in csv.DictReader(csv_data):
                    json_data.append(csv_row)

            with open(tmp_json, 'w', encoding='utf-8') as json_file:
                json_file.write(json.dumps(json_data))

            #Upload the converted file under the "output/" prefix
            with open(tmp_json, 'rb') as json_file_contents:
                s3.put_object(Bucket=bucket_name, Key=outputted_json, Body=json_file_contents.read())

        except Exception as e:
            print(e)
            print('Error converting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key_name, bucket_name))
            raise e

        finally:
            #Remove the temporary files even when the conversion fails
            for tmp_path in (tmp_csv, tmp_json):
                if os.path.exists(tmp_path):
                    os.remove(tmp_path)

Now, when a CSV file is uploaded under "test-bucket-for-converting-csv-to-json-with-lambda/input/", the file converted to JSON format is written to "test-bucket-for-converting-csv-to-json-with-lambda/output/".
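The conversion step itself can be tried locally without S3 or temporary files. A minimal sketch, assuming the CSV has a header row and fits in memory. `csv_text_to_json` is a hypothetical helper name and is not part of the function above.

```python
import csv
import io
import json


def csv_text_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array string."""
    # newline='' leaves line endings untouched, as the csv module expects.
    reader = csv.DictReader(io.StringIO(csv_text, newline=''))
    return json.dumps(list(reader))


sample = 'Name,Age,Country\nTaro,20,Japan\n'
print(csv_text_to_json(sample))
# prints [{"Name": "Taro", "Age": "20", "Country": "Japan"}]
```

Each CSV row becomes one JSON object whose keys come from the header row, which is the same mapping the Lambda function produces.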

$ aws --profile s3-lambda s3 sync ./workspace s3://test-bucket-for-converting-csv-to-json-with-lambda/input

upload: ./test.csv to s3://test-bucket-for-converting-csv-to-json-with-lambda/input/test.csv

The command above uploads the file again with the AWS CLI, this time under the "input/" prefix.
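For an upload like this, the event passed to lambda_handler carries the bucket name and the object key in each record. The sketch below uses a trimmed, hand-written event that contains only the fields the handler reads; a real event has many more. The second record is a made-up example of a key with a space, which arrives URL-encoded, and `extract_targets` is a hypothetical helper name.

```python
from urllib.parse import unquote_plus

BUCKET = 'test-bucket-for-converting-csv-to-json-with-lambda'

# Trimmed, hand-written event containing only the fields the handler reads.
sample_event = {
    'Records': [
        {'s3': {'bucket': {'name': BUCKET}, 'object': {'key': 'input/test.csv'}}},
        {'s3': {'bucket': {'name': BUCKET}, 'object': {'key': 'input/my+file.csv'}}},
    ]
}


def extract_targets(event):
    """Return (bucket, decoded key) pairs for every record in the event."""
    return [
        (record['s3']['bucket']['name'], unquote_plus(record['s3']['object']['key']))
        for record in event['Records']
    ]


for bucket, key in extract_targets(sample_event):
    print(bucket, key)
```

A dictionary like `sample_event` can also be pasted into a test event in the Lambda console to run the function without uploading a file, provided the object it names exists in the bucket.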


If you check the bucket, a new folder called "output" should have been created, with the JSON file inside.

[
    {
        "Name": "Taro",
        "Age": "20",
        "Country": "Japan"
    }
]

Check the contents. If the data has been converted to JSON format correctly, you are done.
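Note that "Age" comes out as the string "20", because csv.DictReader reads every value as text. If numeric values are wanted in the JSON, one option is to convert them before dumping. A minimal sketch that handles integers only. `coerce_numbers` is a hypothetical helper name and is not part of the function above.

```python
import json


def coerce_numbers(row):
    """Return a copy of row with values that int() accepts converted to int."""
    converted = {}
    for key, value in row.items():
        try:
            converted[key] = int(value)
        except (TypeError, ValueError):
            # Leave non-numeric (or missing) values unchanged.
            converted[key] = value
    return converted


row = {'Name': 'Taro', 'Age': '20', 'Country': 'Japan'}
print(json.dumps(coerce_numbers(row)))
# prints {"Name": "Taro", "Age": 20, "Country": "Japan"}
```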

Afterword

Thank you for reading to the end. This article covered converting CSV to JSON, but I think other conversions can be implemented in the same way.

I hope you find it helpful.

Reference article

Convert CSV to JSON files with AWS Lambda and S3 Events
