AWS Certified Machine Learning – Specialty — Question 282

A data science team is working with a tabular dataset that the team stores in Amazon S3. The team wants to experiment with different feature transformations such as categorical feature encoding. Then the team wants to visualize the resulting distribution of the dataset. After the team finds an appropriate set of feature transformations, the team wants to automate the workflow for feature transformations.

Which solution will meet these requirements with the MOST operational efficiency?

Answer options

Correct answer: A

Explanation

Amazon SageMaker Data Wrangler provides an end-to-end, low-code interface built specifically for feature engineering. It includes built-in visualizations for inspecting the transformed data, and it can export a completed data flow to SageMaker Pipelines for automated orchestration. The other options stitch together multiple separate services, such as AWS Lambda, AWS Step Functions, and Amazon QuickSight, which adds operational overhead and complexity. Using Data Wrangler's native transformation, visualization, and pipeline-export features therefore offers the highest operational efficiency.
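The workflow behind the recommended answer has three parts. First, record an ordered list of transformations. Next, inspect the distribution of the result. Finally, replay the same steps automatically. The following is a minimal, stdlib-only Python sketch of that idea under stated assumptions. It is not Data Wrangler or SageMaker code. The helper names `one_hot_encode`, `run_flow`, and `distribution` are hypothetical, and the count dictionary stands in for a histogram.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def one_hot_encode(column: str) -> Step:
    """Hypothetical step: replace a categorical column with 0/1 indicator columns."""
    def step(rows: List[Row]) -> List[Row]:
        # Sort the categories so the generated column order is deterministic.
        categories = sorted({str(r[column]) for r in rows})
        encoded = []
        for r in rows:
            # Copy every field except the original categorical column.
            new_row = {k: v for k, v in r.items() if k != column}
            for c in categories:
                new_row[f"{column}_{c}"] = 1 if str(r[column]) == c else 0
            encoded.append(new_row)
        return encoded
    return step


def run_flow(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply the recorded steps in order, like replaying a saved flow."""
    for step in steps:
        rows = step(rows)
    return rows


def distribution(rows: List[Row], column: str) -> Dict[object, int]:
    """Count the values in one column (a text stand-in for a histogram)."""
    return dict(Counter(r[column] for r in rows))


data = [
    {"color": "red", "price": 10},
    {"color": "blue", "price": 12},
    {"color": "red", "price": 9},
]
flow = [one_hot_encode("color")]
result = run_flow(data, flow)
print(result[0])                          # {'price': 10, 'color_blue': 0, 'color_red': 1}
print(distribution(result, "color_red"))  # {1: 2, 0: 1}
```

Because the flow is just a list of steps, the same list can be rerun on new data without manual work. Data Wrangler provides that kind of repeatability as a managed service. The sketch recomputes the category list on every run. A production pipeline would normally persist the categories fitted during experimentation so that later runs produce the same columns.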