AWS Certified Machine Learning – Specialty — Question 261
A company hosts a machine learning (ML) dataset repository on Amazon S3. A data scientist is preparing the repository to train a model. The data scientist needs to redact personally identifiable information (PII) from the dataset.
Which solution will meet these requirements with the LEAST development effort?
Answer options
- A. Use Amazon SageMaker Data Wrangler with a custom transformation to identify and redact the PII.
- B. Create a custom AWS Lambda function to read the files, identify the PII, and redact the PII.
- C. Use AWS Glue DataBrew to identify and redact the PII.
- D. Use an AWS Glue development endpoint to implement the PII redaction from within a notebook.
Correct answer: C
Explanation
The correct answer is C. AWS Glue DataBrew is a visual data preparation tool that includes built-in transformations for identifying and redacting PII, so it requires little or no code. Option A relies on a custom transformation in SageMaker Data Wrangler, which the data scientist must write and maintain. Option B requires building and testing a custom Lambda function that reads the files and then detects and redacts the PII. Option D requires writing the redaction logic in a notebook attached to an AWS Glue development endpoint, which is also more development effort than using DataBrew.
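To make the "development effort" comparison concrete, here is a minimal sketch of the kind of code that option B would require. It is an illustration only: the patterns, the `redact_pii` helper, and the event shape passed to `handler` are all hypothetical and do not come from the question. A real solution would also need S3 reads and writes, many more PII types, and tests.

```python
import re

# Hypothetical patterns for illustration; real PII detection needs far
# broader coverage (names, addresses, phone numbers, and so on).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def handler(event, context):
    """Lambda-style entry point.

    Assumes a hypothetical event of the form {"records": ["...", ...]}.
    Reading from and writing to Amazon S3 is omitted from this sketch.
    """
    records = event.get("records", [])
    return {"records": [redact_pii(record) for record in records]}
```

Even this small sketch covers only two PII types and has to be deployed, tested, and maintained, which is the work that the built-in DataBrew transformations avoid.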