Google Cloud Professional Machine Learning Engineer — Question 291
You work for a bank. You need to train a model that predicts whether credit card transactions are fraudulent by using unstructured data stored in Cloud Storage. The data needs to be converted to a structured format to facilitate analysis in BigQuery. Company policy requires that data containing personally identifiable information (PII) remain in Cloud Storage. You need to implement a scalable solution that preserves the data’s value for analysis. What should you do?
Answer options
- A. Use BigQuery’s authorized views and column-level access controls to restrict access to PII within the dataset.
- B. Use the DLP API to de-identify the sensitive data before loading it into BigQuery.
- C. Store the unstructured data in a separate PII-compliant BigQuery database.
- D. Remove the sensitive data from the files manually before loading them into BigQuery.
Correct answer: B
Explanation
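The correct answer is B. The DLP API can inspect the data and de-identify PII (for example by masking, redacting, or tokenizing it) before it is loaded, so only de-identified data reaches BigQuery while the original PII stays in Cloud Storage, as company policy requires. Because it is a managed service, it scales with the data volume, and the de-identified records keep their value for analysis. Option A is wrong because authorized views and column-level access controls only restrict who can see PII; the PII would still be stored in BigQuery. Option C moves the PII into BigQuery, which violates the policy. Option D does not scale, is error-prone, and removing the data outright reduces its value for analysis.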
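
To make option B concrete, here is a minimal Python sketch of a de-identification request for the `google-cloud-dlp` client. It is an illustration under assumptions, not part of the exam question: the helper names `build_deidentify_request` and `deidentify_text`, the project ID, and the chosen infoTypes are all hypothetical, and a real pipeline would typically run this over the Cloud Storage files in bulk before loading the results into BigQuery.

```python
def build_deidentify_request(
    project_id,
    text,
    info_types=("CREDIT_CARD_NUMBER", "EMAIL_ADDRESS", "PHONE_NUMBER"),
):
    """Build a request dict for DlpServiceClient.deidentify_content.

    The infoTypes listed here are an assumption; pick the ones that
    match the PII actually present in your data.
    """
    return {
        "parent": f"projects/{project_id}/locations/global",
        # What to look for in the input text.
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
        },
        # How to transform each finding: replace it with its infoType name,
        # e.g. a card number becomes "[CREDIT_CARD_NUMBER]".
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "replace_with_info_type_config": {}
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


def deidentify_text(project_id, text):
    """Call the DLP API and return the de-identified text.

    Requires the google-cloud-dlp package and valid Google Cloud
    credentials, so the import is kept local to this function.
    """
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_deidentify_request(project_id, text)
    )
    return response.item.value
```

Replacing findings with the infoType name is the simplest transformation. If analysts need to join or group on the de-identified values, a tokenizing transformation such as a crypto-based hash may preserve more analytical value.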