AWS Certified Data Engineer – Associate (DEA-C01) — Question 11
A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.
Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)
Answer options
- A. Configure AWS Glue triggers to run the ETL jobs every hour.
- B. Use AWS Glue DataBrew to clean and prepare the data for analytics.
- C. Use AWS Lambda functions to schedule and run the ETL jobs every hour.
- D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
- E. Use the Redshift Data API to load transformed data into Amazon Redshift.
Correct answer: A, D
Explanation
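The correct choices are A and D. AWS Glue triggers (A) can start ETL jobs on a cron schedule, so an hourly scheduled trigger runs the pipeline without any additional scheduling infrastructure. AWS Glue connections (D) store the connection details, such as the connection URL, credentials, and VPC networking settings, that the ETL jobs use to read from Amazon RDS and MongoDB and to write to Amazon Redshift. Option B adds a separate service that is not needed, because the Glue ETL jobs already perform the transformations. Option C requires writing, deploying, and maintaining Lambda functions, plus a scheduler such as Amazon EventBridge to invoke them, which is more operational overhead than a native Glue trigger. Option E is a valid way to run SQL against Redshift, but it does not provide connectivity to the data sources, and the Glue jobs can already write to Redshift through a Glue connection, so the Data API only adds another component to manage.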
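The two selected options can be sketched programmatically, assuming the AWS SDK for Python (boto3) is used to create the resources. The helper functions, job name, connection names, hosts, subnet, security-group, and Availability Zone values below are all hypothetical placeholders; only the request shapes follow the Glue `CreateTrigger` and `CreateConnection` APIs. The helpers just build keyword-argument dictionaries, so they can be inspected without AWS credentials.

```python
"""Sketch: request payloads for an hourly AWS Glue pipeline (options A and D).

All names, hosts, and IDs used with these helpers are hypothetical placeholders.
The dictionaries are meant to be passed as keyword arguments to a boto3 Glue
client's create_trigger and create_connection calls.
"""
from __future__ import annotations

from typing import Any

# Glue schedules use six cron fields:
# Minutes Hours Day-of-month Month Day-of-week Year.
# "0 * * * ? *" fires at minute 0 of every hour.
HOURLY_CRON = "cron(0 * * * ? *)"


def hourly_trigger_request(trigger_name: str, job_names: list[str]) -> dict[str, Any]:
    """Build create_trigger kwargs that start the given jobs every hour."""
    if not job_names:
        raise ValueError("a trigger needs at least one job to start")
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": HOURLY_CRON,
        "Actions": [{"JobName": name} for name in job_names],
        # Activate the trigger as soon as it is created.
        "StartOnCreation": True,
    }


def connection_request(
    name: str,
    connection_type: str,
    properties: dict[str, str],
    subnet_id: str,
    security_group_ids: list[str],
    availability_zone: str,
) -> dict[str, Any]:
    """Build create_connection kwargs for a source or target inside a VPC."""
    return {
        "ConnectionInput": {
            "Name": name,
            "ConnectionType": connection_type,  # e.g. "JDBC" or "MONGODB"
            "ConnectionProperties": dict(properties),
            # Network placement that lets the Glue job reach private resources.
            "PhysicalConnectionRequirements": {
                "SubnetId": subnet_id,
                "SecurityGroupIdList": list(security_group_ids),
                "AvailabilityZone": availability_zone,
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical network settings shared by all three connections.
    net = ("subnet-0example", ["sg-0example"], "us-east-1a")
    # Credentials are inline only for brevity; real pipelines would
    # normally keep them in AWS Secrets Manager.
    rds = connection_request(
        "rds-source", "JDBC",
        {"JDBC_CONNECTION_URL": "jdbc:postgresql://rds-host:5432/sales",
         "USERNAME": "etl_user", "PASSWORD": "example"},
        *net,
    )
    mongo = connection_request(
        "mongo-source", "MONGODB",
        {"CONNECTION_URL": "mongodb://mongo-host:27017/orders",
         "USERNAME": "etl_user", "PASSWORD": "example"},
        *net,
    )
    redshift = connection_request(
        "redshift-target", "JDBC",
        {"JDBC_CONNECTION_URL": "jdbc:redshift://rs-host:5439/dev",
         "USERNAME": "etl_user", "PASSWORD": "example"},
        *net,
    )
    trigger = hourly_trigger_request("hourly-etl", ["rds-mongo-to-redshift"])
    print(trigger["Schedule"])
```

To apply these, one would pass each dictionary to a Glue client, for example `boto3.client("glue").create_connection(**rds)` and `create_trigger(**trigger)`, and list the connection names in the ETL job's `Connections` setting. Keeping the schedule and connectivity inside Glue is what avoids the extra Lambda and Data API components.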