AWS Certified Data Engineer – Associate (DEA-C01) — Question 68

A data engineer must build an extract, transform, and load (ETL) pipeline to process and load data from 10 source systems into 10 tables that are in an Amazon Redshift database. All the source systems generate .csv, JSON, or Apache Parquet files every 15 minutes. The source systems all deliver files into one Amazon S3 bucket. The file sizes range from 10 MB to 20 GB. The ETL pipeline must function correctly despite changes to the data schema.
Which data pipeline solutions will meet these requirements? (Choose two.)

Answer options

Correct answer: B, D

Explanation

Options B and D both run an AWS Glue crawler before the AWS Glue job, so the table definitions are updated to reflect any schema change before the data is processed and loaded. Option A does not use a crawler, which is needed to handle schema changes. Options C and E add Lambda functions that increase complexity without improving the core ETL process.
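The crawler-then-job ordering described in the explanation can be sketched as request parameters for the AWS Glue API. This is a minimal illustration, not the wording of any answer option. The resource names, the role ARN, the S3 path, and the choice to wire the two steps together with Glue workflow triggers are all assumptions. The code only builds dictionaries and sends nothing to AWS.

```python
# Sketch only: builds request parameters and never calls AWS.
# Every resource name, ARN, and path below is a hypothetical placeholder.


def crawler_request(name, role_arn, database, s3_path):
    """Return parameters shaped for the boto3 Glue client's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update catalog tables in place when the crawler detects a schema change;
        # log (rather than delete) tables whose source objects disappear.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def workflow_triggers(workflow, crawler, job):
    """Return two create_trigger parameter sets: start the crawler, then run the job."""
    start_crawler = {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "ON_DEMAND",
        "Actions": [{"CrawlerName": crawler}],
    }
    run_job_after_crawl = {
        "Name": f"{workflow}-after-crawl",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        # Fire only when the crawler has finished successfully.
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": job}],
    }
    return [start_crawler, run_job_after_crawl]


crawler = crawler_request(
    name="source-files-crawler",                              # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical
    database="staging_db",                                    # hypothetical
    s3_path="s3://example-ingest-bucket/",                    # hypothetical
)
triggers = workflow_triggers("etl-workflow", "source-files-crawler", "load-redshift-job")
```

With real resources in place, these dictionaries could be passed as keyword arguments to `create_crawler` and `create_trigger` on a boto3 Glue client. The conditional trigger is what enforces the ordering: the job starts only after the crawler reports `SUCCEEDED`.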