AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 98
A company receives daily .csv files about customer interactions with its ML model. The company stores the files in Amazon S3 and uses the files to retrain the model. An ML engineer needs to implement a solution to mask credit card numbers in the files before the model is retrained.
Which solution will meet this requirement with the LEAST development effort?
Answer options
- A. Create a discovery job in Amazon Macie. Configure the job to find and mask sensitive data.
- B. Create Apache Spark code to run on an AWS Glue job. Use the Sensitive Data Detection functionality in AWS Glue to find and mask sensitive data.
- C. Create Apache Spark code to run on an AWS Glue job. Program the code to perform a regex operation to find and mask sensitive data.
- D. Create Apache Spark code to run on an Amazon EC2 instance. Program the code to perform an operation to find and mask sensitive data.
Correct answer: B
Explanation
Option B requires the least development effort because AWS Glue's built-in Sensitive Data Detection can both identify credit card numbers and redact them inside the Glue job, so the ML engineer does not have to write or maintain detection logic. Option A does not meet the requirement: Amazon Macie discovers and reports sensitive data in Amazon S3, but it does not mask or modify the files. Option C can work, but the engineer must write, test, and maintain custom regex code. Option D needs the same custom code and also requires provisioning and managing the EC2 instance and its Spark environment.
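
To make the extra effort in option C concrete, here is a minimal sketch of the kind of hand-written masking logic it implies. It is plain Python rather than Spark code, and the names `CARD_PATTERN`, `luhn_valid`, and `mask_cards` are hypothetical, not part of any AWS API. The pattern and the Luhn checksum filter are assumptions about what a reasonable custom solution might include. A production version would also need to handle more formats and edge cases.

```python
import re

# Assumption: a card number is 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text: str, keep_last: int = 4) -> str:
    """Replace Luhn-valid card numbers with asterisks, keeping the last digits."""
    def _replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # leave non-card digit runs untouched
        return "*" * (len(digits) - keep_last) + digits[-keep_last:]

    return CARD_PATTERN.sub(_replace, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number.
    print(mask_cards("42,4111-1111-1111-1111,ok"))
```

Even this small sketch has to make decisions about separators, number lengths, and false positives. The managed detection in option B takes that work off the engineer.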