
Question

You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?

Accepted Answer

Correct answer: B. Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.

B is correct because the DLP scan identifies which values are sensitive, and Format Preserving Encryption replaces each of them with ciphertext that keeps the original length and character set, so every column remains usable by the model. Option A does not ensure data protection, because randomization may still expose PII patterns. Option C replaces sensitive data but does not preserve its format. Option D removes the sensitive columns entirely, which is not acceptable when every column is critical to the model.
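To make the Format Preserving Encryption step more concrete, here is a minimal sketch of the request body that a de-identification call to the DLP API might take. It only builds a plain dictionary and makes no network call. The function name `build_fpe_deidentify_request`, the project ID, the KMS key name, and the wrapped key bytes are all hypothetical placeholders; the dictionary keys follow the DLP API's `CryptoReplaceFfxFpeConfig` message as I understand it. The Dataflow pipeline that would apply this to BigQuery rows is not shown.

```python
from typing import Any, Dict, List, Optional


def build_fpe_deidentify_request(
    project_id: str,
    text: str,
    info_types: List[str],
    kms_key_name: str,
    wrapped_key: bytes,
    alphabet: str = "NUMERIC",
    surrogate_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a DLP de-identify request that applies FPE to the given infoTypes.

    All argument values are supplied by the caller; nothing here is
    specific to the exam question's dataset.
    """
    # FPE configuration: the data key is wrapped by a Cloud KMS key, and the
    # alphabet restricts ciphertext to the same character set as the input.
    fpe_config: Dict[str, Any] = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": kms_key_name,
            }
        },
        "common_alphabet": alphabet,
    }
    # An optional surrogate infoType labels the encrypted values in the output.
    if surrogate_type:
        fpe_config["surrogate_info_type"] = {"name": surrogate_type}

    return {
        "parent": f"projects/{project_id}/locations/global",
        # Which kinds of sensitive values to look for.
        "inspect_config": {"info_types": [{"name": n} for n in info_types]},
        # How to transform every finding: encrypt it with FPE.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "crypto_replace_ffx_fpe_config": fpe_config
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    request = build_fpe_deidentify_request(
        project_id="my-project",
        text="Customer phone: 4155550123",
        info_types=["PHONE_NUMBER"],
        kms_key_name="projects/my-project/locations/global/keyRings/r/cryptoKeys/k",
        wrapped_key=b"placeholder-wrapped-key",
    )
    print(request["deidentify_config"])
```

With the `google-cloud-dlp` package installed and credentials configured, a dictionary like this could be passed as `request=` to `DlpServiceClient.deidentify_content`. For tabular BigQuery data, a per-column record transformation would likely be a better fit than the free-text infoType transformation sketched here.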

Google Cloud Professional Machine Learning Engineer — Question 140

Explanation