AWS Certified Data Engineer – Associate (DEA-C01) — Question 146
A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.
The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.
Which combination of steps should the data engineer take to meet these requirements with LEAST operational overhead? (Choose two.)
Answer options
- A. Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.
- B. Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.
- C. Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.
- D. Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.
- E. Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.
Correct answer: A, B
Explanation
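Option A is correct because an AWS Glue crawler in S3 event mode reads the S3 event notifications from the SQS queue and recrawls only the objects and folders that changed, instead of listing the entire S3 target on every run. Option B is also correct because the event-mode crawler still has to be started for each run. A time-based schedule starts it automatically, and each scheduled run processes the queued events as an incremental update to the Data Catalog. Options C, D, and E carry more operational overhead: option C requires custom Lambda code that must be written and maintained, option D depends on a manual action for every change, and option E adds an orchestration layer that the built-in crawler features make unnecessary.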
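The combination of options A and B can be sketched as a single crawler definition. The following is a minimal illustration, not a definitive setup: it builds the request for the Glue `CreateCrawler` API with the recrawl behavior set to `CRAWL_EVENT_MODE`, an `EventQueueArn` on the S3 target, and a cron schedule. The crawler name, IAM role ARN, database name, bucket path, queue ARN, and hourly schedule are all hypothetical placeholders, and the IAM role is assumed to already have access to both the bucket and the queue.

```python
def build_event_crawler_request(name, role_arn, database, s3_path, queue_arn, schedule):
    """Return keyword arguments for the Glue CreateCrawler API in S3 event mode."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                # Option A: the crawler consumes S3 events from this SQS queue.
                {"Path": s3_path, "EventQueueArn": queue_arn}
            ]
        },
        # Event mode: recrawl only what the queued S3 events point to.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
        # Option B: a time-based schedule starts each incremental run.
        "Schedule": schedule,
    }


def create_event_crawler(request):
    """Send the request to AWS Glue. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the request can be built without boto3 installed

    glue = boto3.client("glue")
    return glue.create_crawler(**request)


# All values below are hypothetical placeholders.
request = build_event_crawler_request(
    name="example-event-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="example_db",
    s3_path="s3://example-bucket/data/",
    queue_arn="arn:aws:sqs:us-east-1:111122223333:example-s3-events",
    schedule="cron(0 * * * ? *)",  # assumed cadence: once per hour
)

if __name__ == "__main__":
    # Uncomment to create the crawler in a real account.
    # create_event_crawler(request)
    print(request["RecrawlPolicy"], request["Schedule"])
```

Keeping the request builder separate from the API call makes the configuration easy to inspect before anything is created in an account. No Lambda function or Step Functions state machine appears anywhere in the definition, which is the point of the "least operational overhead" wording.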