An ML engineer needs to ensure that a dataset complies with regulations for personally id…

Question

An ML engineer needs to ensure that a dataset complies with regulations for personally identifiable information (PII). The ML engineer will use the data to train an ML model on Amazon SageMaker instances. SageMaker must not use any of the PII. Which solution will meet these requirements in the MOST operationally efficient way?

Accepted Answer

Correct answer: A. Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon S3 bucket. Access the S3 bucket from the SageMaker instances for model training.

Option A is the most operationally efficient solution. DetectPiiEntities is a managed API that returns the type and character offsets of each PII entity in the text, so the PII can be redacted before the data is written to Amazon S3, and SageMaker training jobs can read from S3 directly. Options B and C add unnecessary complexity by using Amazon EFS, which is less efficient for this use case. Option D addresses PII discovery but does not use the most straightforward data access path for SageMaker training.
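As a rough illustration of the redaction step in option A, the sketch below assumes the `boto3` Comprehend client and its `detect_pii_entities` call, whose response lists entities with `Type`, `Score`, `BeginOffset`, and `EndOffset`. The helper names `redact_pii` and `redact_with_comprehend`, the `[TYPE]` placeholder format, and the `min_score` threshold are illustrative choices, not part of the question or of any AWS API.

```python
def redact_pii(text, entities, min_score=0.5):
    """Replace each detected PII span with a [TYPE] placeholder.

    `entities` is expected to look like the "Entities" list returned by
    Comprehend's DetectPiiEntities: dicts with Type, BeginOffset, EndOffset
    and (optionally) Score. `min_score` is an assumed cut-off.
    """
    # Work from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent.get("Score", 1.0) < min_score:
            continue
        placeholder = f"[{ent['Type']}]"
        text = text[:ent["BeginOffset"]] + placeholder + text[ent["EndOffset"]:]
    return text


def redact_with_comprehend(text, language_code="en"):
    """Detect PII with Amazon Comprehend, then redact it locally."""
    # Imported lazily so redact_pii can be used without AWS access.
    import boto3

    client = boto3.client("comprehend")
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response["Entities"])
```

The redacted text would then be written to the S3 bucket that the SageMaker training job reads from. Calling `redact_with_comprehend` requires AWS credentials and a Region where Comprehend is available.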

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 109
