AWS Certified Machine Learning – Specialty — Question 298
A data scientist uses Amazon SageMaker Data Wrangler to define and perform transformations and feature engineering on historical data. The data scientist saves the transformations to SageMaker Feature Store.
The historical data is periodically uploaded to an Amazon S3 bucket. The data scientist needs to transform the new historical data and add it to the online feature store. The data scientist needs to prepare the new historical data for training and inference by using native integrations.
Which solution will meet these requirements with the LEAST development effort?
Answer options
- A. Use AWS Lambda to run a predefined SageMaker pipeline to perform the transformations on each new dataset that arrives in the S3 bucket.
- B. Run an AWS Step Functions step and a predefined SageMaker pipeline to perform the transformations on each new dataset that arrives in the S3 bucket.
- C. Use Apache Airflow to orchestrate a set of predefined transformations on each new dataset that arrives in the S3 bucket.
- D. Configure Amazon EventBridge to run a predefined SageMaker pipeline to perform the transformations when new data is detected in the S3 bucket.
Correct answer: D
Explanation
Amazon EventBridge integrates natively with both Amazon S3 and SageMaker Pipelines. Once EventBridge notifications are enabled on the bucket, a rule can match S3 "Object Created" events and start a predefined pipeline as its target, using configuration only and no custom code. Options A and B also end up running the pipeline, but each adds a component to build and maintain (a Lambda function or a Step Functions state machine) and still needs an event source to invoke it. Option C requires writing and operating Apache Airflow DAGs and does not rely on the native integrations the question asks for. The EventBridge solution therefore meets the requirements with the least development effort.
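The configuration behind option D can be set up in the console, but it may help to see it scripted. The following is a minimal sketch using boto3, not a definitive setup: the rule name, bucket, key prefix, pipeline ARN, and role ARN are all hypothetical placeholders, and the IAM role is assumed to allow EventBridge to start the pipeline. The two builder functions are pure, so they can be inspected without AWS credentials; only `create_rule` calls AWS.

```python
import json


def build_event_pattern(bucket: str, prefix: str) -> dict:
    """Event pattern matching S3 'Object Created' events under a key prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": prefix}]},
        },
    }


def build_pipeline_target(pipeline_arn: str, role_arn: str) -> dict:
    """EventBridge target that starts a SageMaker pipeline execution."""
    return {
        "Id": "start-feature-pipeline",  # hypothetical target ID
        "Arn": pipeline_arn,
        "RoleArn": role_arn,  # assumed to permit sagemaker:StartPipelineExecution
    }


def create_rule(rule_name: str, bucket: str, prefix: str,
                pipeline_arn: str, role_arn: str) -> None:
    """Enable S3 -> EventBridge delivery and wire the rule to the pipeline."""
    import boto3  # imported lazily so the builders above work without boto3

    s3 = boto3.client("s3")
    events = boto3.client("events")

    # Caution: this call replaces the bucket's entire notification
    # configuration, so merge in any existing settings first.
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration={"EventBridgeConfiguration": {}},
    )
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(bucket, prefix)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[build_pipeline_target(pipeline_arn, role_arn)],
    )
```

Everything here is declarative configuration: there is no function code to deploy and no state machine or DAG to maintain, which is the point of the "least development effort" criterion.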