A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL…

Question

A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:
* Start the workflow as soon as data is uploaded to Amazon S3.
* When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
* Store the results of joining datasets in Amazon S3.
* If one of the jobs fails, send a notification to the Administrator.
Which configuration will meet these requirements?

Accepted Answer

Correct answer: A. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.

Option A is correct because the S3 upload invokes AWS Lambda, which starts a Step Functions workflow that can wait until all datasets are available before running the ETL job. AWS Glue is a managed, Spark-based ETL service suited to joining terabyte-sized datasets in Amazon S3, and a CloudWatch alarm that publishes to Amazon SNS covers the failure notification. The other options either do not wait for all datasets to be available or use services that are less suited to this type of ETL process.
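The "wait for uploads, then run Glue" part of option A can be sketched as an Amazon States Language definition, built here as a Python dictionary. This is only a rough illustration under stated assumptions, not the exam's reference solution: the function ARN, the Glue job name, the polling interval, and the `ready` flag returned by the checking Lambda function are all hypothetical.

```python
import json

# Hypothetical identifiers, chosen only for illustration.
CHECK_FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-datasets"
GLUE_JOB_NAME = "join-uploaded-datasets"


def build_definition(poll_seconds=300):
    """Build a state machine that polls until all datasets exist, then runs Glue."""
    return {
        "StartAt": "CheckDatasets",
        "States": {
            # Assumed Lambda function that returns {"ready": true/false}.
            "CheckDatasets": {
                "Type": "Task",
                "Resource": CHECK_FUNCTION_ARN,
                "ResultPath": "$.check",
                "Next": "AllDatasetsPresent",
            },
            # Branch on the flag stored under $.check by the previous state.
            "AllDatasetsPresent": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.check.ready",
                        "BooleanEquals": True,
                        "Next": "JoinDatasets",
                    }
                ],
                "Default": "WaitForUploads",
            },
            # Pause, then check again.
            "WaitForUploads": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "CheckDatasets",
            },
            # The .sync integration waits for the Glue job run to finish.
            "JoinDatasets": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": GLUE_JOB_NAME},
                "End": True,
            },
        },
    }


def dangling_targets(definition):
    """Return transition targets that do not name a defined state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        targets.update(t for t in (state.get("Next"), state.get("Default")) if t)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return sorted(targets - set(states))


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

If the Glue job fails, the execution fails. For the notification, one plausible setup is a CloudWatch alarm on the state machine's `ExecutionsFailed` metric with an SNS topic as the alarm action. The `dangling_targets` helper is only a local sanity check on the dictionary and is not part of any AWS API.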

AWS Certified Machine Learning – Specialty — Question 63
