AWS Certified Data Engineer – Associate (DEA-C01) — Question 93
A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data.
The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data.
Which solution will meet this requirement with the LEAST coding effort?
Answer options
- A. Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.
- B. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
- C. Enable job metrics for the ETL jobs to help keep track of processed objects in Amazon CloudWatch.
- D. Configure the ETL jobs to delete processed objects from Amazon S3 after each run.
Correct answer: B
Explanation
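The correct answer is B. AWS Glue job bookmarks persist state at the end of each successful run, so later runs of the same job skip the S3 objects that were already processed and read only the new data. Bookmarks are a built-in job setting, so enabling them requires little or no new code. Option A requires building and maintaining a custom tracking job plus a DynamoDB table. Option C only publishes monitoring metrics to CloudWatch and does not change which data the jobs read. Option D requires custom deletion logic and removes the source data from the bucket.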
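To make option B concrete, the following is a minimal sketch of enabling bookmarks through the `--job-bookmark-option` job argument when starting a run. The helper names `bookmark_arguments` and `start_incremental_run` and the job name `daily-s3-to-rds` are hypothetical and do not come from the question; the Glue client is passed in as a parameter so the sketch does not assume any particular credentials or region.

```python
# Sketch only: helper names and the job name used below are hypothetical.
# The argument key and its three values are the ones AWS Glue documents
# for job bookmarks.
BOOKMARK_OPTIONS = {
    "enable": "job-bookmark-enable",    # track state, process only new data
    "disable": "job-bookmark-disable",  # always process the full dataset
    "pause": "job-bookmark-pause",      # process new data without updating state
}


def bookmark_arguments(mode="enable"):
    """Return the Glue job argument that sets the bookmark mode."""
    if mode not in BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark mode: {mode!r}")
    return {"--job-bookmark-option": BOOKMARK_OPTIONS[mode]}


def start_incremental_run(glue_client, job_name):
    """Start a run of an existing Glue job with bookmarks enabled.

    glue_client is expected to behave like boto3.client("glue").
    """
    return glue_client.start_job_run(
        JobName=job_name,
        Arguments=bookmark_arguments("enable"),
    )
```

In practice this might be called as `start_incremental_run(boto3.client("glue"), "daily-s3-to-rds")`, or the same setting can be switched on in the job's properties without any code. For bookmarks to take effect, the job script also needs to call `job.init(...)` and `job.commit()` and pass a `transformation_ctx` to the DynamicFrame read; scripts generated by AWS Glue normally already do this, which is why the coding effort stays low.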