When a Glue job writes data to RDS with a DynamicFrame, the write is an append, so running the same job again duplicates the data.
Converting the DynamicFrame to a DataFrame lets you write in overwrite mode instead.
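The duplication problem, and why overwrite fixes it, can be sketched outside Glue with sqlite3 standing in for RDS (the table name my_table and its columns are invented for the demo):

```python
# Minimal sketch, assuming sqlite3 as a stand-in for the RDS table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
rows = [(1, "a"), (2, "b")]

# "Append" behavior: each job run inserts the same rows again.
for _ in range(2):  # simulate running the job twice
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0])  # 4 rows: duplicated

# "Overwrite" behavior: drop and rewrite the table on every run.
for _ in range(2):
    conn.execute("DROP TABLE IF EXISTS my_table")
    conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0])  # 2 rows: no duplicates
```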
Add the following to the auto-generated job script. This assumes a JDBC connection definition has already been created in Glue.
# Comment out the auto-generated sink and replace it with the code below
#datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = dropnullfields3, catalog_connection = "MyConnection", connection_options = {"dbtable": "my_table", "database": "my_database"}, transformation_ctx = "datasink4")
# Get JDBC information from the connection definition
jdbc_conf = glueContext.extract_jdbc_conf(connection_name='MyConnection')

# Convert the DynamicFrame to a DataFrame
df = dropnullfields3.toDF()

# Write the DataFrame to the table in overwrite mode
df.write \
    .format("jdbc") \
    .option("url", jdbc_conf['url']) \
    .option("dbtable", "my_database.my_table") \
    .option("user", jdbc_conf['user']) \
    .option("password", jdbc_conf['password']) \
    .mode("overwrite") \
    .save()

job.commit()
In this example, the job writes S3 data to Aurora Serverless (MySQL), and I confirmed that the table was overwritten rather than appended to.
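One caveat: by default, Spark's overwrite mode drops and recreates the target table, which discards indexes and column definitions set up on the MySQL side. Spark's JDBC writer supports a truncate option that truncates the table instead. A sketch of the same write with that option (a fragment of the job above, not run here):

```python
# Variant of the write above: truncate the existing table instead of
# dropping it, so indexes and column types on the RDS side are preserved.
df.write \
    .format("jdbc") \
    .option("url", jdbc_conf['url']) \
    .option("dbtable", "my_database.my_table") \
    .option("user", jdbc_conf['user']) \
    .option("password", jdbc_conf['password']) \
    .option("truncate", "true") \
    .mode("overwrite") \
    .save()
```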