AWS Certified Machine Learning – Specialty — Question 63

A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:
* Start the workflow as soon as data is uploaded to Amazon S3.
* When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
* Store the results of joining datasets in Amazon S3.
* If one of the jobs fails, send a notification to the Administrator.
Which configuration will meet these requirements?

Answer options

Correct answer: A

Explanation

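Option A is correct. An AWS Lambda function, triggered when data is uploaded to Amazon S3, starts an AWS Step Functions workflow that waits until all the datasets are available before starting the ETL job. AWS Glue, a serverless Apache Spark-based ETL service, then joins the uploaded datasets with the existing terabyte-scale datasets in Amazon S3 and writes the results back to S3. Because Step Functions orchestrates the jobs, a failed job can be caught and reported to the Administrator, for example through an Amazon SNS notification. The other options either do not wait for all datasets to be available before starting the join, or rely on services that are less suited to large-scale ETL.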
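To make the orchestration concrete, here is a minimal sketch, under stated assumptions, of a Step Functions state machine that follows this pattern. It builds an Amazon States Language definition as a Python dict: a readiness-check Lambda is polled in a Wait loop until all datasets are present, an AWS Glue job run (`glue:startJobRun.sync`) performs the join, and any caught task error is published to an SNS topic before the execution fails. Everything named here is hypothetical and not taken from the question: the dataset keys, the Lambda function name, the Glue job name, the topic ARN, the polling interval, and the assumption that the check Lambda returns `{"ready": true|false}`. The Glue job script that actually performs the join and writes to S3 is not shown, and this is one possible implementation, not the exact wording of option A.

```python
import json

# Hypothetical dataset keys: the question does not name the uploaded files.
REQUIRED_DATASETS = {"incoming/orders.csv", "incoming/customers.csv"}


def all_datasets_present(uploaded_keys):
    """Core logic of the (hypothetical) readiness-check Lambda.

    Returns True only when every required dataset key is among the keys uploaded
    so far. A real Lambda would first list the S3 prefix; this stays pure so it
    runs without AWS.
    """
    return REQUIRED_DATASETS.issubset(uploaded_keys)


def build_state_machine(check_function, glue_job, topic_arn, poll_seconds=300):
    """Return an Amazon States Language definition as a dict.

    Flow: CheckDatasets -> AllDatasetsReady? -> (WaitForUploads loop) -> JoinDatasets.
    Task errors matched by States.ALL (e.g. a failed Glue job run) are routed to
    NotifyAdmin (SNS publish) and then to FailWorkflow.
    """
    on_failure = [{"ErrorEquals": ["States.ALL"], "Next": "NotifyAdmin"}]
    return {
        "Comment": "Daily ETL: wait for all uploads, join with AWS Glue, alert on failure",
        "StartAt": "CheckDatasets",
        "States": {
            "CheckDatasets": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": check_function, "Payload.$": "$"},
                # Keep only the boolean the check Lambda is assumed to return.
                "ResultSelector": {"ready.$": "$.Payload.ready"},
                "ResultPath": "$.check",
                "Next": "AllDatasetsReady",
                "Catch": on_failure,
            },
            "AllDatasetsReady": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.check.ready", "BooleanEquals": True, "Next": "JoinDatasets"}
                ],
                "Default": "WaitForUploads",
            },
            # Poll again after a delay until every dataset has arrived.
            "WaitForUploads": {"Type": "Wait", "Seconds": poll_seconds, "Next": "CheckDatasets"},
            "JoinDatasets": {
                "Type": "Task",
                # ".sync" makes Step Functions wait for the Glue job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job},
                "Catch": on_failure,
                "End": True,
            },
            "NotifyAdmin": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {"TopicArn": topic_arn, "Message.$": "States.JsonToString($)"},
                "Next": "FailWorkflow",
            },
            "FailWorkflow": {"Type": "Fail", "Error": "ETLJobFailed", "Cause": "An ETL step failed"},
        },
    }


if __name__ == "__main__":
    print(all_datasets_present(["incoming/orders.csv"]))  # False: customers.csv is missing
    definition = build_state_machine(
        "check-datasets-fn",  # hypothetical Lambda name
        "join-datasets-job",  # hypothetical Glue job name
        "arn:aws:sns:us-east-1:123456789012:etl-alerts",  # placeholder ARN
    )
    print(json.dumps(definition, indent=2))
```

The serialized definition is the kind of string that would be passed as the `definition` argument when creating a Step Functions state machine. The S3-triggered Lambda would then only need to start an execution of that state machine; keeping the "wait for all datasets" loop inside Step Functions, rather than in a long-running Lambda, avoids Lambda's execution time limit.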