Google Cloud Professional Data Engineer — Question 292

You are building a real-time prediction engine that streams files, which may contain PII (personally identifiable information), into Cloud Storage and eventually into BigQuery. You want to ensure that the sensitive data is masked but still maintains referential integrity, because names and emails are often used as join keys.
How should you use the Cloud Data Loss Prevention API (DLP API) to ensure that the PII data is not accessible by unauthorized individuals?

Answer options

Correct answer: D

Explanation

The correct answer is D. Pseudonymizing PII with a cryptographic format-preserving token masks the original values while keeping their length and character set. Because the same input always maps to the same token under a given key, tokenized names and emails still match across tables and can serve as join keys, which preserves referential integrity. Option A is incorrect because storing non-tokenized data in any form poses a security risk. Option B does not maintain referential integrity, since redacting all PII removes the join keys. Option C is incomplete: scanning tables in BigQuery alone inspects data only after it has landed and does not protect PII as it is streamed in.
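The referential-integrity argument can be illustrated with a small, self-contained sketch. It does not call the DLP API and does not implement format-preserving encryption. It substitutes a keyed HMAC-SHA256 hash as a stand-in, purely to show the one property the explanation relies on: a deterministic token maps equal inputs to equal outputs, so joins still work on masked data. The key, the `tokenize` and `mask` helpers, and the sample rows are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key (assumption). A real pipeline would use a managed key,
# never a literal embedded in source code.
SECRET_KEY = b"demo-key-for-illustration-only"


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic token for `value`.

    Stand-in technique: HMAC-SHA256, truncated to 16 hex characters.
    Unlike format-preserving encryption, this is neither reversible
    nor format-preserving; it only demonstrates determinism.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]


def mask(rows, field):
    """Return copies of `rows` with `field` replaced by its token."""
    return [{**row, field: tokenize(row[field])} for row in rows]


# Two hypothetical tables that share an email join key.
users = [
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "bo@example.com", "plan": "free"},
]
orders = [
    {"order_id": 1, "email": "ana@example.com"},
    {"order_id": 2, "email": "bo@example.com"},
    {"order_id": 3, "email": "Ana@Example.com "},  # same person, messy input
]

masked_users = mask(users, "email")
masked_orders = mask(orders, "email")

# Join the masked tables on the token instead of the raw email.
plan_by_token = {u["email"]: u["plan"] for u in masked_users}
joined = [(o["order_id"], plan_by_token[o["email"]]) for o in masked_orders]
print(joined)
```

Each order still resolves to the right plan even though neither masked table contains a raw email address. Plain redaction, by contrast, would replace every email with the same placeholder and the join would no longer be meaningful.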