Question

An ML engineer needs to build a processing pipeline to identify and remove personally identifiable information (PII) from petabytes of unstructured data. The ML engineer will use the processed data to train ML models in Amazon SageMaker AI. Which solution will meet these requirements?

Accepted Answer

Correct answer: A. Use the Apache Spark-based serverless engine from AWS Glue interactive sessions. Use the Detect PII transform feature to identify and remove the PII data.

Option A is correct because the Apache Spark-based serverless engine in AWS Glue is designed to process large datasets in a distributed way, and AWS Glue includes a Detect PII transform that identifies PII and removes it as part of the same pipeline. The other options may offer some PII detection functionality, but they do not provide the same integration and scalability for petabytes of unstructured data.
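To make the "identify and remove" step concrete, here is a minimal, self-contained Python sketch of per-record redaction. It is not the AWS Glue Detect PII transform and does not call any Glue API. The regex patterns, the `PII_PATTERNS` table, and the `redact_pii` helper are hypothetical stand-ins for illustration. In a Glue job, the managed transform would do this work, and the Spark engine would distribute it across the dataset.

```python
import re

# Hypothetical patterns for illustration only. The Glue Detect PII transform
# uses its own managed detectors, which are not reproduced here.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "USA_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text, mask="[REDACTED]"):
    """Replace every pattern match with `mask`.

    Returns a tuple of (redacted text, sorted list of entity labels found).
    """
    found = set()
    for label, pattern in PII_PATTERNS.items():
        # subn returns the new string and the number of substitutions made.
        text, count = pattern.subn(mask, text)
        if count:
            found.add(label)
    return text, sorted(found)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com, SSN 123-45-6789."
    print(redact_pii(sample))
```

A function of this shape is applied independently to each record, which is why the workload parallelizes well on a Spark-based engine. Real PII detection needs far broader coverage than two regular expressions.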

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 209

Explanation