AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 109

An ML engineer needs to ensure that a dataset complies with regulations for personally identifiable information (PII). The ML engineer will use the data to train an ML model on Amazon SageMaker instances. SageMaker must not use any of the PII.

Which solution will meet these requirements in the MOST operationally efficient way?

Answer options

Correct answer: A

Explanation

Option A is the most operationally efficient solution. It uses the Amazon Comprehend DetectPiiEntities API to locate the PII so that it can be redacted, then stores the redacted data in an Amazon S3 bucket, which SageMaker training jobs can read directly. Options B and C store the data in Amazon EFS, which adds unnecessary complexity and is less efficient for this use case. Option D addresses PII discovery, but it does not use the most straightforward method of data access for SageMaker training.
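
The approach described for Option A can be sketched in Python as follows. This is a minimal illustration, not part of the original question. It assumes boto3 is installed and AWS credentials are configured. The helper names (redact_pii, redact_with_comprehend, upload_redacted), the [TYPE] placeholder format, and the 0.5 score threshold are hypothetical choices made for this example. DetectPiiEntities returns the type and character offsets of each entity rather than redacted text, so the substitution is done in a separate step.

```python
from typing import Dict, List


def redact_pii(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each PII span with a [TYPE] placeholder.

    `entities` is expected in the shape of the "Entities" list returned by
    DetectPiiEntities: dicts with Type, BeginOffset, EndOffset, and Score.
    Assumption: EndOffset is treated as exclusive, like a Python slice end.
    """
    redacted = text
    # Process spans from the end of the text so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent.get("Score", 1.0) < min_score:
            continue  # skip low-confidence detections (threshold is illustrative)
        placeholder = f"[{ent['Type']}]"
        redacted = redacted[: ent["BeginOffset"]] + placeholder + redacted[ent["EndOffset"]:]
    return redacted


def redact_with_comprehend(text: str, language_code: str = "en") -> str:
    """Detect PII with Amazon Comprehend, then redact it locally."""
    import boto3  # imported lazily so redact_pii works without AWS access

    comprehend = boto3.client("comprehend")
    response = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response["Entities"])


def upload_redacted(text: str, bucket: str, key: str) -> None:
    """Write the redacted text to S3, where a SageMaker training job can read it."""
    import boto3

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
```

A call such as upload_redacted(redact_with_comprehend(record), "my-training-bucket", "train/record-0001.txt") would then place a redacted record in S3; the bucket name and key here are placeholders. Because the synchronous API accepts a limited amount of text per request, a large dataset would need to be processed in chunks or with an asynchronous Comprehend job.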