[PYTHON] Overwrite data in RDS with AWS Glue

When a Glue job writes data to RDS using a DynamicFrame, the write is always an append, so running the same job again duplicates the data.

Converting a DynamicFrame to a DataFrame allows you to write in overwrite mode.

Add the following to the script that Glue generated automatically for the job. It is assumed that the JDBC connection definition (MyConnection here) has already been created.

# Original auto-generated sink (append only), commented out
#datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = dropnullfields3, catalog_connection = "MyConnection", connection_options = {"dbtable": "my_table", "database": "my_database"}, transformation_ctx = "datasink4")

# Get JDBC information (url, user, password) from the connection definition
jdbc_conf = glueContext.extract_jdbc_conf(connection_name='MyConnection')

# Convert the DynamicFrame to a Spark DataFrame
df = dropnullfields3.toDF()

# Write the DataFrame to the table in overwrite mode
# (the table name is qualified with the database name)
df.write \
    .format("jdbc") \
    .option("url", jdbc_conf['url']) \
    .option("dbtable", "my_database.my_table") \
    .option("user", jdbc_conf['user']) \
    .option("password", jdbc_conf['password']) \
    .mode("overwrite") \
    .save()

job.commit()

In this example I wrote data from S3 to Aurora Serverless MySQL, and the table was overwritten as expected.
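One thing to keep in mind: with Spark's JDBC writer, overwrite mode drops and recreates the target table by default, so indexes and column definitions created by hand can be lost. Spark has a writer option, truncate, that truncates the existing table instead. Below is a minimal sketch of how the write step could be wrapped with that option. The helper names build_jdbc_write_options and overwrite_table are hypothetical and not part of Glue or Spark, and it assumes jdbc_conf has the url, user, and password keys used above.

```python
def build_jdbc_write_options(jdbc_conf, database, table, truncate=False):
    """Build the options for a Spark JDBC write from a Glue JDBC conf dict.

    Hypothetical helper: assumes jdbc_conf has 'url', 'user' and 'password'
    keys, as used with glueContext.extract_jdbc_conf() above.
    """
    options = {
        "url": jdbc_conf["url"],
        # Qualify the table with the database name, as in the job above
        "dbtable": f"{database}.{table}",
        "user": jdbc_conf["user"],
        "password": jdbc_conf["password"],
    }
    if truncate:
        # Spark JDBC writer option: in overwrite mode, TRUNCATE the existing
        # table instead of dropping and recreating it
        options["truncate"] = "true"
    return options


def overwrite_table(df, jdbc_conf, database, table, truncate=False):
    """Write a Spark DataFrame to the table in overwrite mode."""
    options = build_jdbc_write_options(jdbc_conf, database, table, truncate)
    df.write.format("jdbc").options(**options).mode("overwrite").save()
```

In the job, the df.write chain would then become overwrite_table(df, jdbc_conf, "my_database", "my_table", truncate=True), followed by job.commit() as before. I have not verified the truncate behavior against Aurora Serverless MySQL specifically, so treat it as something to try on a test table first.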
