AWS Certified Solutions Architect – Associate (SAA-C03) — Question 99
A company has an AWS Glue extract, transform, and load (ETL) job that runs every day at the same time. The job processes XML data that is in an Amazon S3 bucket. New data is added to the S3 bucket every day. A solutions architect notices that AWS Glue is processing all the data during each run.
What should the solutions architect do to prevent AWS Glue from reprocessing old data?
Answer options
- A. Edit the job to use job bookmarks.
- B. Edit the job to delete data after the data is processed.
- C. Edit the job by setting the NumberOfWorkers field to 1.
- D. Use a FindMatches machine learning (ML) transform.
Correct answer: A
Explanation
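The correct answer is A. Job bookmarks let AWS Glue persist state from previous job runs, so each run processes only the data that arrived since the last successful run. For Amazon S3 sources, Glue uses the objects' last-modified timestamps to decide which objects are new, and XML is one of the S3 input formats that bookmarks support. Option B does not give Glue any way to track what it has processed, and deleting the source data is destructive because it rules out any later reprocessing or audit. Option C changes only how many workers are allocated to the job, not which data the job reads. Option D is unrelated: FindMatches is an ML transform for identifying duplicate or matching records, not for tracking processing state.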
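As a rough illustration of option A, the sketch below builds the job arguments that turn bookmarks on and starts a run with them. The parameter name `--job-bookmark-option` and its values are Glue's documented special parameter; the helper names `bookmark_arguments` and `start_run_with_bookmarks` are hypothetical and exist only for this example. The second function assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Dict, Optional

# Documented values for the Glue special parameter --job-bookmark-option.
BOOKMARK_OPTIONS = {
    "enable": "job-bookmark-enable",
    "disable": "job-bookmark-disable",
    "pause": "job-bookmark-pause",
}


def bookmark_arguments(mode: str = "enable",
                       extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return job arguments with the bookmark option set to the given mode.

    Hypothetical helper: `extra` holds any other job arguments to pass along.
    """
    if mode not in BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark mode: {mode!r}")
    args = dict(extra or {})  # copy so the caller's dict is not mutated
    args["--job-bookmark-option"] = BOOKMARK_OPTIONS[mode]
    return args


def start_run_with_bookmarks(job_name: str) -> str:
    """Start one run of an existing Glue job with bookmarks enabled.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so bookmark_arguments works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(JobName=job_name,
                                  Arguments=bookmark_arguments())
    return response["JobRunId"]


if __name__ == "__main__":
    # Only builds the arguments; no AWS call is made here.
    print(bookmark_arguments())
```

Enabling the option is only part of the setup. Inside the ETL script, the job typically also needs to call `Job.init(...)` at the start, pass a `transformation_ctx` to the S3 source so Glue has a key under which to store bookmark state, and call `job.commit()` at the end so the state is saved for the next run.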