AWS Certified Solutions Architect – Associate (SAA-C03) — Question 739
A company's marketing data is uploaded from multiple sources to an Amazon S3 bucket. A series of data preparation jobs aggregate the data for reporting. The data preparation jobs need to run at regular intervals in parallel. A few jobs need to run in a specific order later.
The company wants to remove the operational overhead of job error handling, retry logic, and state management.
Which solution will meet these requirements?
Answer options
- A. Use an AWS Lambda function to process the data as soon as the data is uploaded to the S3 bucket. Invoke other Lambda functions at regularly scheduled intervals.
- B. Use Amazon Athena to process the data. Use Amazon EventBridge Scheduler to invoke Athena at a regular interval.
- C. Use AWS Glue DataBrew to process the data. Use an AWS Step Functions state machine to run the DataBrew data preparation jobs.
- D. Use AWS Data Pipeline to process the data. Schedule Data Pipeline to process the data once at midnight.
Correct answer: C
Explanation
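AWS Step Functions is a serverless orchestration service that tracks workflow state and provides built-in error handling and retry logic, so it can coordinate both the jobs that run in parallel and the jobs that must run in a specific order. AWS Glue DataBrew is a managed data preparation service with no infrastructure to provision, and Step Functions can run DataBrew jobs directly as workflow steps. The other options fall short. With Lambda (A) or with Athena invoked by EventBridge Scheduler (B), the company would have to write custom code for state management, error handling, and job ordering, which increases operational overhead. Option D schedules a single run at midnight, which does not provide the repeated parallel runs followed by ordered jobs that the scenario describes.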
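To make the orchestration in option C concrete, the following is a minimal sketch that builds an Amazon States Language definition in Python. It assumes hypothetical DataBrew job names (`clean-ads`, `clean-crm`, `aggregate`, `report`) and illustrative retry values; none of these come from the question. It only constructs and prints the definition and does not call AWS.

```python
import json

# Step Functions service integration for DataBrew. The ".sync" suffix makes
# the workflow wait for the job run to finish before moving to the next state.
DATABREW_SYNC = "arn:aws:states:::databrew:startJobRun.sync"


def databrew_task(job_name, next_state=None):
    """Return a Task state that runs one DataBrew job.

    job_name is a hypothetical DataBrew job name. Retry values are
    illustrative assumptions, not recommendations.
    """
    state = {
        "Type": "Task",
        "Resource": DATABREW_SYNC,
        "Parameters": {"Name": job_name},
        # Retry logic is declared here instead of being hand-coded.
        "Retry": [{
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 30,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition(parallel_jobs, ordered_jobs):
    """Run parallel_jobs concurrently, then ordered_jobs one after another."""
    catch_all = [{"ErrorEquals": ["States.ALL"], "Next": "JobFailed"}]
    states = {
        "RunInParallel": {
            "Type": "Parallel",
            # One branch per job, so the jobs run at the same time.
            "Branches": [
                {"StartAt": job, "States": {job: databrew_task(job)}}
                for job in parallel_jobs
            ],
            # Error handling: route any unrecovered failure to a Fail state.
            "Catch": catch_all,
        },
        "JobFailed": {
            "Type": "Fail",
            "Error": "DataPrepFailed",
            "Cause": "A DataBrew job failed after retries",
        },
    }
    if ordered_jobs:
        states["RunInParallel"]["Next"] = ordered_jobs[0]
    else:
        states["RunInParallel"]["End"] = True

    # Chain the ordered jobs: each Task points at the next one.
    for i, job in enumerate(ordered_jobs):
        nxt = ordered_jobs[i + 1] if i + 1 < len(ordered_jobs) else None
        task = databrew_task(job, nxt)
        task["Catch"] = catch_all
        states[job] = task

    return {"StartAt": "RunInParallel", "States": states}


if __name__ == "__main__":
    definition = build_definition(
        ["clean-ads", "clean-crm"],  # hypothetical parallel jobs
        ["aggregate", "report"],     # hypothetical ordered jobs
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON could be used as the definition of a state machine, and a schedule (for example, Amazon EventBridge Scheduler) could start an execution at each interval. Retries, failure routing, and the position of each job in the workflow are then declared in the definition and tracked by Step Functions, which is the operational overhead the question asks to remove.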