Google Cloud Professional Data Engineer — Question 237
You are preparing an organization-wide dataset. You need to preprocess customer data stored in a restricted bucket in Cloud Storage. The data will be used to create consumer analyses. You need to follow data privacy requirements, including protecting certain sensitive data elements, while also retaining all of the data for potential future use cases. What should you do?
Answer options
- A. Use the Cloud Data Loss Prevention API and Dataflow to detect and remove sensitive fields from the data in Cloud Storage. Write the filtered data in BigQuery.
- B. Use customer-managed encryption keys (CMEK) to directly encrypt the data in Cloud Storage. Use federated queries from BigQuery. Share the encryption key by following the principle of least privilege.
- C. Use Dataflow and the Cloud Data Loss Prevention API to mask sensitive data. Write the processed data in BigQuery.
- D. Use Dataflow and Cloud KMS to encrypt sensitive fields and write the encrypted data in BigQuery. Share the encryption key by following the principle of least privilege.
Correct answer: C
Explanation
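The correct answer is C. Dataflow reads the customer data from the restricted bucket and the Cloud Data Loss Prevention API masks the sensitive elements. The processed data is then written to BigQuery, where it is available for consumer analyses. The pipeline writes a masked copy and leaves the source data untouched in the restricted bucket, so all of the original data is retained for potential future use cases. Option A removes the sensitive fields from the data in Cloud Storage itself, which means that data is no longer retained. Option B encrypts the data at rest with CMEK, but anyone allowed to run the federated queries still sees every field in cleartext, so individual sensitive elements are not protected. Option D encrypts the sensitive fields, but the ciphertext is unreadable for analysis unless the key is shared, and anyone who holds the key can recover the raw sensitive values. Masking protects those elements without that trade-off.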
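The masking step in option C can be sketched locally. The snippet below is a minimal offline stand-in and does not call the Cloud DLP API. Two hypothetical regex detectors play the role of DLP's built-in infoTypes, and each match is replaced character by character with `#`, loosely mirroring a character-mask transformation. In a real Dataflow pipeline, this step would instead call the DLP API, for example `DlpServiceClient.deidentify_content` with a `character_mask_config` transformation. The sample record and all function names here are illustrative assumptions. Note that `mask_record` returns a new record and leaves its input unmodified, which corresponds to keeping the raw data in the restricted bucket.

```python
import re
from typing import Dict

# Hypothetical regex detectors standing in for Cloud DLP infoTypes.
# Real DLP detection is far more robust; these are for illustration only.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask_text(text: str, masking_character: str = "#") -> str:
    """Replace each detected sensitive substring with masking characters of equal length."""
    for pattern in DETECTORS.values():
        text = pattern.sub(lambda m: masking_character * len(m.group(0)), text)
    return text


def mask_record(record: Dict[str, str]) -> Dict[str, str]:
    """Return a masked copy of a record; the input record is not modified."""
    return {field: mask_text(value) for field, value in record.items()}


# Hypothetical customer record as it might appear in the restricted bucket.
sample = {
    "customer_id": "C-1001",
    "email": "jane.doe@example.com",
    "phone": "555-010-0199",
    "city": "Lyon",
}

masked = mask_record(sample)
print(masked)
```

Masking keeps every field and its length, so the output schema and non-sensitive columns such as `city` are unchanged. Removing the fields, as in option A, would instead drop `email` and `phone` from the output entirely.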