AWS Certified Data Engineer – Associate (DEA-C01) — Question 243
A data engineer at a company is optimizing extract, transform, and load (ETL) workflows. The current architecture uses Amazon EMR and Apache Spark for large-scale transformations and AWS Glue for other ETL tasks. The workflows load processed data into an Amazon S3-based data lake.
The company wants to move to a fully managed serverless solution that can orchestrate multiple ETL jobs and automate execution. The new solution must continue to use Spark to process data. The company needs to orchestrate and automate the ETL workflows with minimal manual intervention.
Which solution will meet these requirements?
Answer options
- A. Migrate all ETL jobs to AWS Glue. Use AWS Glue workflows to orchestrate the pipeline.
- B. Configure AWS Step Functions and Amazon EventBridge to orchestrate and invoke ETL workflows in AWS Glue and Amazon EMR.
- C. Configure AWS Lambda functions to process Amazon S3 event notifications for data transformation tasks when new data is uploaded.
- D. Use Amazon Managed Workflows for Apache Airflow automatic scheduling to orchestrate the Spark-based ETL jobs.
Correct answer: A
Explanation
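The correct answer is A. AWS Glue is a fully managed, serverless ETL service whose ETL jobs run on Apache Spark, so the Spark transformations that currently run on Amazon EMR can move to AWS Glue without giving up Spark and without managing clusters. AWS Glue workflows then chain multiple jobs (and crawlers) together with triggers that fire on a schedule, on demand, or when upstream jobs finish, which automates the pipeline with minimal manual intervention. Option B keeps Amazon EMR clusters, which must be provisioned and managed, and spreads orchestration across Step Functions and EventBridge, so the result is neither fully serverless nor as simple to operate. Option C relies on AWS Lambda, which does not run Spark and suits short, event-driven tasks rather than orchestrating multi-step ETL workflows. Option D uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA), which is a managed Airflow environment rather than a serverless service, and it only schedules the Spark jobs; those jobs would still need a separate compute engine such as Amazon EMR.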
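To make the orchestration in option A concrete, here is a minimal sketch in Python, assuming the boto3 SDK's Glue client (`create_workflow`, `create_trigger`) and two Spark jobs that already exist in AWS Glue. The workflow name, job names, and cron schedule are hypothetical placeholders, not details from the question. The function takes the client as a parameter so it can be exercised with a stub instead of a live AWS account.

```python
from typing import Any


def build_etl_workflow(glue: Any, workflow: str, first_job: str,
                       second_job: str, schedule: str) -> None:
    """Chain two existing Glue Spark jobs into one Glue workflow.

    Assumption: `glue` is a boto3 Glue client (boto3.client("glue")) or a
    stub with the same methods; all names passed in are hypothetical.
    """
    # The workflow is the container that groups triggers, jobs, and crawlers.
    glue.create_workflow(Name=workflow,
                         Description="Serverless Spark ETL pipeline")

    # Scheduled trigger: starts the first Spark job with no manual action.
    glue.create_trigger(
        Name=f"{workflow}-start",
        WorkflowName=workflow,
        Type="SCHEDULED",
        Schedule=schedule,
        Actions=[{"JobName": first_job}],
        StartOnCreation=True,
    )

    # Conditional trigger: runs the second job only after the first succeeds.
    glue.create_trigger(
        Name=f"{workflow}-after-{first_job}",
        WorkflowName=workflow,
        Type="CONDITIONAL",
        Predicate={
            "Logical": "AND",
            "Conditions": [{
                "LogicalOperator": "EQUALS",
                "JobName": first_job,
                "State": "SUCCEEDED",
            }],
        },
        Actions=[{"JobName": second_job}],
        StartOnCreation=True,
    )
```

With real credentials, a call such as build_etl_workflow(boto3.client("glue"), "nightly-etl", "raw-to-clean", "clean-to-curated", "cron(0 2 * * ? *)") would register a pipeline that runs daily at 02:00 UTC. The conditional trigger is what replaces manual hand-offs between jobs, which is the "minimal manual intervention" requirement in the question.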