A company has developed several AWS Glue jobs to validate and transform its data from Ama…

Question

A company has developed several AWS Glue jobs to validate and transform its data from Amazon S3 and load it into Amazon RDS for MySQL in batches once every day. The ETL jobs read the S3 data using a DynamicFrame. Currently, the ETL developers are experiencing challenges in processing only the incremental data on every run, as the AWS Glue job processes all the S3 input data on each run.
Which approach would allow the developers to solve the issue with minimal coding effort?

Accepted Answer

Correct answer: B. Enable job bookmarks on the AWS Glue jobs. With job bookmarks enabled, AWS Glue keeps track of which data has already been processed, so each subsequent run handles only the new or incremental data. The other options require more coding effort or do not address the problem: switching to a DataFrame does not by itself provide incremental processing, and deleting processed data may not be feasible or desirable.
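As a rough illustration of how little code is involved, here is a minimal sketch of starting a job run with bookmarks turned on. The `--job-bookmark-option` argument and its `job-bookmark-enable` value are the standard Glue job parameter; the helper names `bookmark_arguments` and `start_run_with_bookmarks` and the job name `daily-s3-to-rds` are hypothetical, and `glue_client` is assumed to be a `boto3.client("glue")` object.

```python
# Sketch only: helper names and the job name below are illustrative assumptions.
BOOKMARK_KEY = "--job-bookmark-option"
BOOKMARK_ENABLE = "job-bookmark-enable"


def bookmark_arguments(extra=None):
    """Return job arguments with the bookmark option set to 'enable'."""
    args = dict(extra or {})          # copy so the caller's dict is not mutated
    args[BOOKMARK_KEY] = BOOKMARK_ENABLE
    return args


def start_run_with_bookmarks(glue_client, job_name, extra=None):
    """Start a Glue job run with job bookmarks enabled.

    glue_client is assumed to be boto3.client("glue").
    """
    return glue_client.start_job_run(
        JobName=job_name,
        Arguments=bookmark_arguments(extra),
    )


# Example usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# glue = boto3.client("glue")
# start_run_with_bookmarks(glue, "daily-s3-to-rds")
```

The same option can also be set once in the job's default arguments or in the console instead of per run. For bookmarks to take effect, the job script itself should pass a `transformation_ctx` to each DynamicFrame read from S3 and call `job.init(...)` at the start and `job.commit()` at the end, since the bookmark state is saved on commit.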

AWS Certified Data Analytics – Specialty — Question 29
