Question

A company hosts a machine learning (ML) dataset repository on Amazon S3. A data scientist is preparing the repository to train a model. The data scientist needs to redact personally identifiable information (PII) from the dataset. Which solution will meet these requirements with the LEAST development effort?

Accepted Answer

Correct answer: C. Use AWS Glue DataBrew to identify and redact the PII.

AWS Glue DataBrew is designed for data preparation tasks, including identifying and redacting PII, with minimal coding required. Option A involves more development effort because it relies on custom transformations, and option B requires building a custom Lambda function, which is more complex. Option D also involves using a notebook and writing code, so it takes more development effort than DataBrew.
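To make the "development effort" comparison concrete, here is a minimal sketch of what a hand-written redaction step (the kind of code a custom Lambda function or notebook would need) might look like. It is not DataBrew's API and is not taken from the question; the patterns, the function names `redact_pii` and `redact_records`, and the `[REDACTED]` mask are all hypothetical choices for illustration, and two regular expressions fall far short of real PII coverage.

```python
import re

# Hypothetical patterns for illustration only: real PII detection must also
# handle names, addresses, phone numbers, locale-specific IDs, and more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pii(text: str, mask: str = "[REDACTED]") -> str:
    """Replace email addresses and US SSN-shaped numbers in text with mask."""
    text = EMAIL_RE.sub(mask, text)
    return SSN_RE.sub(mask, text)


def redact_records(records: list[dict]) -> list[dict]:
    """Apply redact_pii to every string field; leave other values untouched."""
    return [
        {key: redact_pii(value) if isinstance(value, str) else value
         for key, value in record.items()}
        for record in records
    ]


if __name__ == "__main__":
    sample = [{"id": 1, "note": "Contact jane.doe@example.com, SSN 123-45-6789."}]
    print(redact_records(sample))
```

Even this small sketch leaves the pattern list, testing, and deployment to the developer, which is the effort that a managed data preparation tool is meant to remove.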

AWS Certified Machine Learning – Specialty — Question 261
