AWS Certified Machine Learning – Specialty — Question 261

A company hosts a machine learning (ML) dataset repository on Amazon S3. A data scientist is preparing the repository to train a model. The data scientist needs to redact personally identifiable information (PII) from the dataset.

Which solution will meet these requirements with the LEAST development effort?

Answer options

Correct answer: C

Explanation

The correct answer is C. AWS Glue DataBrew is a visual data preparation tool with built-in transformations for identifying and redacting PII, so it requires little or no code. Option A takes more development effort because it relies on custom transformations, and option B requires building and maintaining a custom Lambda function. Option D also depends on writing code in a notebook, so it takes more development effort than DataBrew.
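To make the development-effort comparison concrete, the following is a minimal sketch of the kind of hand-written redaction logic a custom-code approach (for example, inside a Lambda function or notebook) might need. The pattern set, labels, and the `redact_pii` function name are illustrative assumptions and are not taken from the question or from any AWS API. Real PII detection would need much broader coverage.

```python
import re

# Hypothetical patterns for illustration only. A production solution would
# also have to handle names, addresses, and locale-specific formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace every match of a known pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or 555-123-4567, SSN 123-45-6789."
    print(redact_pii(sample))
```

Even this small sketch has to be written, tested, deployed, and kept up to date as new PII formats appear, which is the effort that a managed, low-code tool avoids.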