Google Cloud Professional Data Engineer — Question 179

Your company's data platform ingests CSV file dumps of booking and user profile data from upstream sources into Cloud Storage. The data analyst team wants to join these datasets on the email field, which is present in both, to perform analysis. However, personally identifiable information (PII) must not be accessible to the analysts. You need to de-identify the email field in both datasets before loading them into BigQuery for the analysts. What should you do?

Answer options

Correct answer: B

Explanation

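The correct answer is B. Format-preserving encryption with FFX (FPE-FFX) is deterministic: with the same key, the same email always encrypts to the same token, so the booking and user profile datasets can still be joined on the de-identified email field without analysts seeing the original addresses. The output also keeps the length and character set of the input. Option A uses masking, which overwrites characters with a fixed masking character; distinct emails can become indistinguishable, so the masked field no longer works as a join key. Options C and D rely on dynamic data masking, which is applied at query time to data already stored in BigQuery; the raw emails are still loaded, so the requirement to de-identify the field before loading is not met.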
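
The difference between a deterministic token and character masking can be illustrated with a small sketch. This is not FPE-FFX and does not call Cloud DLP: a keyed HMAC stands in for any deterministic transformation, and the key, sample rows, and helper names (`pseudonymize`, `mask`, `join_on_token`) are hypothetical, chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical demo key (assumption); a real pipeline would use a managed key.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(email: str) -> str:
    """Deterministic keyed token: same email + same key -> same token.

    Stand-in for a deterministic transformation such as FPE-FFX; it is
    NOT format-preserving and NOT reversible.
    """
    digest = hmac.new(SECRET_KEY, email.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask(email: str) -> str:
    """Character masking: every character is replaced with '#'."""
    return "#" * len(email)


# Hypothetical sample rows standing in for the two CSV dumps.
bookings = [
    {"email": "ana@example.com", "booking_id": "B1"},
    {"email": "bob@example.com", "booking_id": "B2"},
]
profiles = [
    {"email": "ana@example.com", "tier": "gold"},
    {"email": "bob@example.com", "tier": "silver"},
]


def join_on_token(left, right, transform):
    """Join the two row lists on the transformed email value."""
    right_by_token = {}
    for row in right:
        right_by_token.setdefault(transform(row["email"]), []).append(row)
    joined = []
    for row in left:
        for match in right_by_token.get(transform(row["email"]), []):
            joined.append((row["booking_id"], match["tier"]))
    return joined


pseudo_join = join_on_token(bookings, profiles, pseudonymize)
masked_join = join_on_token(bookings, profiles, mask)

print(pseudo_join)       # each booking matched to its own profile
print(len(masked_join))  # masking produces spurious cross-matches
```

With the deterministic token, each booking joins to exactly one profile. With masking, both sample emails have the same length and collapse to the same masked string, so every booking matches every profile and the join is no longer meaningful.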