AWS Certified Data Analytics – Specialty — Question 29
A company has developed several AWS Glue jobs to validate and transform its data from Amazon S3 and load it into Amazon RDS for MySQL in batches once every day. The ETL jobs read the S3 data using a DynamicFrame. Currently, the ETL developers are experiencing challenges in processing only the incremental data on every run, as the AWS Glue job processes all the S3 input data on each run.
Which approach would allow the developers to solve the issue with minimal coding effort?
Answer options
- A. Have the ETL jobs read the data from Amazon S3 using a DataFrame.
- B. Enable job bookmarks on the AWS Glue jobs.
- C. Create custom logic on the ETL jobs to track the processed S3 objects.
- D. Have the ETL jobs delete the processed objects or data from Amazon S3 after each run.
Correct answer: B
Explanation
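Job bookmarks let AWS Glue persist state about the data a job has already processed, so each subsequent run picks up only new or changed input. For Amazon S3 sources, the bookmark relies on the objects' last-modified timestamps. Enabling bookmarks is mainly a job configuration change rather than new ETL logic, which is why it is the lowest-effort option. The other options fall short:
- A. Switching to a Spark DataFrame does not add incremental processing. Bookmarks track reads made through Glue's DynamicFrame readers, so this change moves away from the feature.
- C. Custom tracking of processed S3 objects can work, but it means writing and maintaining the state management that bookmarks already provide.
- D. Deleting processed objects would leave only new data for the next run, but it destroys the source data, which may not be feasible or desirable.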
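
As a rough illustration of option B, the sketch below shows what a bookmark-aware Glue script typically needs: the `--job-bookmark-option` job parameter set to `job-bookmark-enable`, a `transformation_ctx` on each source, and a closing `job.commit()`. The bucket path, input format, connection name, database, and table are hypothetical placeholders, not details from the question, and the `run_job` wrapper is only a convenience for this example.

```python
# Job parameter that turns bookmarks on. It is set on the job definition
# (console, CLI, or API), not inside the script itself.
BOOKMARK_ARGS = {"--job-bookmark-option": "job-bookmark-enable"}


def run_job(argv):
    """Load not-yet-processed S3 data into MySQL. Runs only inside AWS Glue."""
    # Imported lazily because these modules exist only in the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    # Initializes the job, including its bookmark state.
    job.init(args["JOB_NAME"], args)

    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        # Hypothetical input location.
        connection_options={"paths": ["s3://example-bucket/incoming/"]},
        format="json",  # assumed input format
        # Bookmark state is keyed by this name; keep it stable across runs.
        transformation_ctx="source_s3",
    )

    # ... validation and transformation steps would go here ...

    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=source,
        catalog_connection="example-mysql-connection",  # hypothetical
        connection_options={"dbtable": "daily_load", "database": "analytics"},
        transformation_ctx="sink_mysql",
    )

    # Persists the bookmark so the next run skips data processed here.
    job.commit()
```

In an actual Glue script, the entry point would be `run_job(sys.argv)` after `import sys`. If `job.commit()` is never reached, the bookmark is not advanced, so the same input would be read again on the next run.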