
Question

A data engineer must build an extract, transform, and load (ETL) pipeline to process and load data from 10 source systems into 10 tables that are in an Amazon Redshift database. All the source systems generate .csv, JSON, or Apache Parquet files every 15 minutes. The source systems all deliver files into one Amazon S3 bucket. The file sizes range from 10 MB to 20 GB. The ETL pipeline must function correctly despite changes to the data schema.
Which data pipeline solutions will meet these requirements? (Choose two.)

Accepted Answer

Correct answer: B, D.

B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.

D. Configure an AWS Lambda function to invoke an AWS Glue workflow when a file is loaded into the S3 bucket. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.

Options B and D both run a structured AWS Glue workflow in which a crawler runs before the ETL job. The crawler re-infers the schema of the incoming files and updates the AWS Glue Data Catalog, so the job reads current table definitions and keeps working when the data schema changes. The two options differ only in what starts the workflow: a 15-minute schedule (B) or a file arriving in the S3 bucket (D). Option A lacks a crawler, which is essential for handling schema changes, while options C and E introduce unnecessary complexity with additional Lambda functions without improving the core ETL process.
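As a rough illustration of option D, the Lambda function that starts the workflow could look something like the sketch below. It is not taken from the question. The workflow name `redshift-etl-workflow` and the `GLUE_WORKFLOW_NAME` environment variable are hypothetical, and the sketch assumes the S3 bucket is configured to send object-created notifications to the function. The only AWS call used is boto3's Glue `start_workflow_run`; the crawler and the Glue job are assumed to be defined inside the workflow itself.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical workflow name; in practice this would match the Glue workflow
# that contains the crawler and the ETL job.
WORKFLOW_NAME = os.environ.get("GLUE_WORKFLOW_NAME", "redshift-etl-workflow")


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 notifications.
            objects.append((bucket, unquote_plus(key)))
    return objects


def start_workflow(glue_client, workflow_name, objects):
    """Start the Glue workflow once if the event carried at least one object."""
    if not objects:
        return None
    response = glue_client.start_workflow_run(Name=workflow_name)
    return response["RunId"]


def lambda_handler(event, context, glue_client=None):
    if glue_client is None:
        # Imported lazily so the helpers above can be exercised without AWS.
        import boto3
        glue_client = boto3.client("glue")
    objects = extract_s3_objects(event)
    run_id = start_workflow(glue_client, WORKFLOW_NAME, objects)
    return {"objects": objects, "run_id": run_id}
```

The `glue_client` parameter is only there so that a stub can be passed in for local testing; when the function is invoked by Lambda it falls back to a real boto3 client. For option B, the same workflow would instead be started by an EventBridge rule on a 15-minute schedule, with no Lambda function involved.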

AWS Certified Data Engineer – Associate (DEA-C01) — Question 68

Answer options

Correct answer: B, D

Explanation