Google Cloud Associate Data Practitioner — Question 33
You are a data analyst at your organization. You have been given a BigQuery dataset that includes customer information. The dataset contains inconsistencies and errors, such as missing values, duplicates, and formatting issues. You need to effectively and quickly clean the data. What should you do?
Answer options
- A. Develop a Dataflow pipeline to read the data from BigQuery, perform data quality rules and transformations, and write the cleaned data back to BigQuery.
- B. Use Cloud Data Fusion to create a data pipeline to read the data from BigQuery, perform data quality transformations, and write the clean data back to BigQuery.
- C. Export the data from BigQuery to CSV files. Resolve the errors using a spreadsheet editor, and re-import the cleaned data into BigQuery.
- D. Use BigQuery's built-in functions to perform data quality transformations.
Correct answer: D
Explanation
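The correct answer is D. BigQuery's built-in SQL functions let you clean the data in place, without moving it out of BigQuery or setting up another service. Examples include TRIM, LOWER, and REGEXP_REPLACE for formatting; COALESCE and IFNULL for missing values; and ROW_NUMBER or SELECT DISTINCT for duplicates. Options A and B would also work, but each adds overhead this task does not need. Dataflow requires writing, deploying, and running pipeline code, and Cloud Data Fusion requires provisioning an instance and building a pipeline. Option C requires exporting the data, editing it by hand, and re-importing it. That is slow, does not scale to large tables, and invites manual errors.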
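To make option D concrete, here is a minimal sketch of what "using BigQuery's built-in functions" can look like. It is a small Python helper that builds a single GoogleSQL statement covering the three problems named in the question: formatting, missing values, and duplicates. The table names and the columns `customer_id`, `email`, `full_name`, `phone`, and `signup_date` (a STRING such as `'2023-04-01'`) are hypothetical assumptions, as is the rule of keeping the most recent row per `customer_id`. Adapt these to the real schema.

```python
"""Build a GoogleSQL statement that cleans a customer table inside BigQuery.

Assumption: the column names below are hypothetical and must be
adjusted to the real schema.
"""


def build_cleaning_sql(source_table: str, target_table: str) -> str:
    """Return a CREATE OR REPLACE TABLE ... AS SELECT statement that fixes
    formatting, fills missing values, and removes duplicate customers."""
    return f"""
CREATE OR REPLACE TABLE `{target_table}` AS
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    customer_id,
    -- Formatting: trim surrounding whitespace and lowercase emails.
    LOWER(TRIM(email)) AS email,
    -- Missing values: treat NULL or blank names as a placeholder.
    COALESCE(NULLIF(TRIM(full_name), ''), 'UNKNOWN') AS full_name,
    -- Formatting: keep only the digits in phone numbers.
    REGEXP_REPLACE(phone, r'[^0-9]', '') AS phone,
    -- Unparseable dates become NULL instead of failing the query.
    SAFE.PARSE_DATE('%Y-%m-%d', signup_date) AS signup_date,
    -- Duplicates: number the rows for each customer_id, newest first.
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY SAFE.PARSE_DATE('%Y-%m-%d', signup_date) DESC
    ) AS row_num
  FROM `{source_table}`
)
-- Keep exactly one row per customer_id.
WHERE row_num = 1
""".strip()


if __name__ == "__main__":
    print(build_cleaning_sql("my-project.crm.customers_raw",
                             "my-project.crm.customers_clean"))
```

You can paste the generated statement into the BigQuery console. Alternatively, run it with the `google-cloud-bigquery` client library via `bigquery.Client().query(sql).result()`. Either way, the cleaning runs as one SQL job inside BigQuery, which is what makes option D quicker than building a Dataflow or Data Fusion pipeline. Writing to a separate target table keeps the raw data available if a rule needs to be revised.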