Google Cloud Professional Cloud Security Engineer — Question 173
Your organization is developing a sophisticated machine learning (ML) model to predict customer behavior for targeted marketing campaigns. The BigQuery dataset used for training includes sensitive personal information. You must design the security controls around the AI/ML pipeline. Data privacy must be maintained throughout the model’s lifecycle and you must ensure that personal data is not used in the training process. Additionally, you must restrict access to the dataset to an authorized subset of people only. What should you do?
Answer options
- A. De-identify sensitive data before model training by using Cloud Data Loss Prevention (DLP) APIs, and implement strict Identity and Access Management (IAM) policies to control access to BigQuery.
- B. Implement Identity-Aware Proxy to enforce context-aware access to BigQuery and models based on user identity and device.
- C. Implement at-rest encryption by using customer-managed encryption keys (CMEK) for the pipeline. Implement strict Identity and Access Management (IAM) policies to control access to BigQuery.
- D. Deploy the model on Confidential VMs for enhanced protection of data and code while in use. Implement strict Identity and Access Management (IAM) policies to control access to BigQuery.
Correct answer: A
Explanation
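The correct answer is A because it is the only option that meets both requirements. De-identifying sensitive data with the Cloud DLP APIs (now part of Sensitive Data Protection) before training keeps personal information out of the training process, and strict IAM policies restrict access to the BigQuery dataset to authorized people. Options B, C, and D add useful security measures, but none of them removes personal data from the training dataset. Identity-Aware Proxy (B) governs who can reach a resource, not what the data contains. CMEK (C) protects data at rest, but the training job still reads the personal data once it is decrypted. Confidential VMs (D) protect data and code while in use, but the model is still trained on personal data.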
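The de-identification step in option A can be sketched as follows. This is a minimal illustration, not a full pipeline: it only builds the request body for the DLP `content:deidentify` method, replacing each finding with its infoType name. The helper name `build_deidentify_request`, the project ID `example-project`, the sample text, and the chosen infoTypes are assumptions made for the example.

```python
from typing import Any, Dict, List


def build_deidentify_request(
    project_id: str, text: str, info_types: List[str]
) -> Dict[str, Any]:
    """Build a DLP deidentify_content request body (hypothetical helper).

    Each detected finding is replaced by its infoType name,
    for example an email address becomes "[EMAIL_ADDRESS]".
    """
    info_type_refs = [{"name": name} for name in info_types]
    return {
        "parent": f"projects/{project_id}/locations/global",
        # What to look for in the content.
        "inspect_config": {"info_types": info_type_refs},
        # How to transform what was found.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": info_type_refs,
                        "primitive_transformation": {
                            "replace_with_info_type_config": {}
                        },
                    }
                ]
            }
        },
        # The content to de-identify.
        "item": {"value": text},
    }


# Assumed sample values, for illustration only.
request = build_deidentify_request(
    project_id="example-project",
    text="Contact jane@example.com or 555-0100 about the campaign.",
    info_types=["EMAIL_ADDRESS", "PHONE_NUMBER"],
)
```

With the `google-cloud-dlp` package installed and credentials configured, a request like this could be passed to `dlp_v2.DlpServiceClient().deidentify_content(request=request)`. For a BigQuery table, the same transformation settings would typically be applied to the table's rows before the data reaches the training job, rather than to a single string.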