Google Cloud Associate Data Practitioner — Question 16
You work for a healthcare company that has a large on-premises data system containing patient records with personally identifiable information (PII) such as names, addresses, and medical diagnoses. You need a standardized managed solution that de-identifies PII across all your data feeds prior to ingestion to Google Cloud. What should you do?
Answer options
- A. Use Cloud Run functions to create a serverless data cleaning pipeline. Store the cleaned data in BigQuery.
- B. Use Cloud Data Fusion to transform the data. Store the cleaned data in BigQuery.
- C. Load the data into BigQuery, and inspect the data by using SQL queries. Use Dataflow to transform the data and remove any errors.
- D. Use Apache Beam to read the data and perform the necessary cleaning and transformation operations. Store the cleaned data in BigQuery.
Correct answer: B
Explanation
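The correct answer is B. Cloud Data Fusion is a fully managed data integration service, so a de-identification pipeline can be designed once and applied consistently to every data feed before the data is stored in BigQuery. This satisfies both the "standardized managed solution" and the "prior to ingestion" requirements. Option A could perform the transformation, but Cloud Run functions require custom cleaning code that you must write and maintain yourself, so it is not a standardized solution. Option C loads the raw PII into BigQuery first and only inspects and transforms it afterward, which violates the requirement to de-identify before ingestion; removing errors is also not the same as de-identifying PII. Option D could also perform the transformation, but Apache Beam is a programming SDK rather than a managed service, so you would have to build and operate the pipeline code yourself.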
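To make "de-identify before ingestion" concrete, here is a minimal Python sketch of the kind of per-record transformation such a pipeline applies. It is not Cloud Data Fusion's API or plugin configuration. The field names (`name`, `address`, `diagnosis`), the helper functions, and the choice of keyed hashing plus redaction are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical field names; real feeds will use their own schema.
PSEUDONYMIZE_FIELDS = ("name", "address")
REDACT_FIELDS = ("diagnosis",)


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a deterministic keyed token (HMAC-SHA256, truncated)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict, key: bytes) -> dict:
    """Return a copy of the record with PII fields tokenized or redacted."""
    out = dict(record)  # leave the caller's record untouched
    for field in PSEUDONYMIZE_FIELDS:
        if out.get(field) is not None:
            out[field] = pseudonymize(str(out[field]), key)
    for field in REDACT_FIELDS:
        if field in out:
            out[field] = "[REDACTED]"
    return out


if __name__ == "__main__":
    sample = {"patient_id": 1, "name": "Jane Doe",
              "address": "1 Main St", "diagnosis": "flu"}
    # The key is a placeholder; a real pipeline would fetch it from a secret store.
    print(deidentify(sample, key=b"example-key"))
```

Because the token is deterministic for a given key, the same patient maps to the same token across feeds, which keeps records joinable after de-identification. In a managed service, this logic would be configured once in the pipeline rather than hand-coded for each feed.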