AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 109
An ML engineer needs to ensure that a dataset complies with regulations for personally identifiable information (PII). The ML engineer will use the data to train an ML model on Amazon SageMaker instances. SageMaker must not use any of the PII.
Which solution will meet these requirements in the MOST operationally efficient way?
Answer options
- A. Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon S3 bucket. Access the S3 bucket from the SageMaker instances for model training.
- B. Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon Elastic File System (Amazon EFS) file system. Mount the EFS file system to the SageMaker instances for model training.
- C. Use AWS Glue DataBrew to cleanse the dataset of PII. Store the data in an Amazon Elastic File System (Amazon EFS) file system. Mount the EFS file system to the SageMaker instances for model training.
- D. Use Amazon Macie for automatic discovery of PII in the data. Remove the PII. Store the data in an Amazon S3 bucket. Mount the S3 bucket to the SageMaker instances for model training.
Correct answer: A
Explanation
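Option A is the most operationally efficient solution. The Amazon Comprehend DetectPiiEntities API is a managed call that returns the type, confidence score, and character offsets of each PII entity in the text, so the ML engineer can redact those spans without building or hosting a detection model. Amazon S3 is the default data source for SageMaker training jobs, so the redacted data needs no additional storage setup. Options B and C store the data in Amazon EFS, which adds a file system to provision and mount, plus VPC configuration for the training job, with no benefit for this use case. Option D uses Amazon Macie, which discovers PII in S3 but does not remove it, so the removal step remains manual work; mounting the bucket is also not the straightforward way for SageMaker to read S3 training data.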
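The redaction step in Option A can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names `redact_pii` and `detect_and_redact`, the `[TYPE]` placeholder format, and the 0.5 score threshold are assumptions made for the example. The only AWS call used is boto3's Comprehend `detect_pii_entities`, which returns entities with `Type`, `Score`, `BeginOffset`, and `EndOffset` fields.

```python
def redact_pii(text, entities, min_score=0.5):
    """Replace each detected PII span with its entity type, e.g. [NAME].

    `entities` follows the shape of the "Entities" list returned by
    Comprehend's DetectPiiEntities. The 0.5 threshold is an assumed default.
    """
    # Work from the last span to the first so earlier offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent.get("Score", 1.0) < min_score:
            continue
        placeholder = f"[{ent['Type']}]"
        text = text[:ent["BeginOffset"]] + placeholder + text[ent["EndOffset"]:]
    return text


def detect_and_redact(text, language_code="en"):
    """Call Amazon Comprehend and redact the PII it reports (hypothetical helper)."""
    # Imported lazily so redact_pii can be used without boto3 or AWS credentials.
    import boto3

    client = boto3.client("comprehend")
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response["Entities"])
```

The redacted output would then be written to the S3 bucket that the SageMaker training job reads from. Replacing spans from right to left is a deliberate choice: it avoids recomputing offsets after each substitution.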