AWS Certified Data Engineer – Associate (DEA-C01) — Question 222
A data engineer is building a data pipeline. A large data file is uploaded to an Amazon S3 bucket once each day at unpredictable times. An AWS Glue workflow uses hundreds of workers to process the file and load the data into Amazon Redshift. The company wants to process the file as quickly as possible.
Which solution will meet these requirements?
Answer options
- A. Create an on-demand AWS Glue trigger to start the workflow. Create an AWS Lambda function that runs every 15 minutes to check the S3 bucket for the daily file. Configure the function to start the AWS Glue workflow if the file is present.
- B. Create an event-based AWS Glue trigger to start the workflow. Configure Amazon S3 to log events to AWS CloudTrail. Create a rule in Amazon EventBridge to forward PutObject events to the AWS Glue trigger.
- C. Create a scheduled AWS Glue trigger to start the workflow. Create a cron job that runs the AWS Glue job every 15 minutes. Set up the AWS Glue job to check the S3 bucket for the daily file. Configure the job to stop if the file is not present.
- D. Create an on-demand AWS Glue trigger to start the workflow. Create an AWS Database Migration Service (AWS DMS) migration task. Set the DMS source as the S3 bucket. Set the target endpoint as the AWS Glue workflow.
Correct answer: B
Explanation
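The correct answer is B. An event-based AWS Glue trigger starts the workflow as soon as the upload is recorded: Amazon S3 logs the PutObject call to AWS CloudTrail, an Amazon EventBridge rule matches that event, and the rule forwards it to the AWS Glue trigger. Processing therefore begins shortly after the file arrives, regardless of the time of day. Option A polls with a Lambda function every 15 minutes, so the file can wait up to 15 minutes before the workflow starts. Option C has the same polling delay and also starts an AWS Glue job every 15 minutes just to check for the file, so most runs do no useful work. Option D does not work because AWS DMS migrates data between data stores, and an AWS Glue workflow is not a valid DMS target endpoint.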
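The wiring described in option B can be sketched with boto3-style clients. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `build_s3_put_pattern` and `wire_event_trigger`, the trigger and rule naming scheme, and every argument value are hypothetical. It also assumes a CloudTrail trail already records S3 data events for the bucket, and that the IAM role passed to EventBridge is allowed to notify the Glue workflow.

```python
import json


def build_s3_put_pattern(bucket_name):
    """Build an EventBridge event pattern for CloudTrail-logged PutObject calls.

    Matches only uploads to the given bucket (bucket name is a placeholder).
    """
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject"],
            "requestParameters": {"bucketName": [bucket_name]},
        },
    }


def wire_event_trigger(glue, events, *, bucket_name, workflow_name,
                       workflow_arn, first_job_name, role_arn):
    """Create the Glue EVENT trigger and the EventBridge rule that feeds it.

    `glue` and `events` are expected to behave like boto3.client("glue")
    and boto3.client("events"). All names passed in are hypothetical.
    """
    # Event-based trigger: starts the first job of the workflow when
    # EventBridge delivers one matching event (BatchSize=1).
    glue.create_trigger(
        Name=f"{workflow_name}-s3-event",
        WorkflowName=workflow_name,
        Type="EVENT",
        Actions=[{"JobName": first_job_name}],
        EventBatchingCondition={"BatchSize": 1},
    )

    # Rule that matches the PutObject events logged through CloudTrail.
    rule_name = f"{workflow_name}-s3-put"
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_s3_put_pattern(bucket_name)),
        State="ENABLED",
    )

    # Target the Glue workflow itself; the role lets EventBridge notify it.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "glue-workflow", "Arn": workflow_arn, "RoleArn": role_arn}],
    )
    return rule_name
```

In practice the two clients would come from `boto3.client("glue")` and `boto3.client("events")`. Because the file is large, the upload may be a multipart upload, which CloudTrail logs as `CompleteMultipartUpload` rather than `PutObject`; adding that name to the `eventName` list is a reasonable extension if that applies.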