Google Cloud Associate Data Practitioner — Question 11
You are working with a large dataset of customer reviews stored in Cloud Storage. The dataset contains several inconsistencies, such as missing values, incorrect data types, and duplicate entries. You need to clean the data to ensure that it is accurate and consistent before using it for analysis. What should you do?
Answer options
- A. Use the PythonOperator in Cloud Composer to clean the data and load it into BigQuery. Use SQL for analysis.
- B. Use BigQuery to batch load the data into BigQuery. Use SQL for cleaning and analysis.
- C. Use Storage Transfer Service to move the data to a different Cloud Storage bucket. Use event triggers to invoke Cloud Run functions to load the data into BigQuery. Use SQL for analysis.
- D. Use Cloud Run functions to clean the data and load it into BigQuery. Use SQL for analysis.
Correct answer: B
Explanation
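The correct answer is B. A batch load job brings the files from Cloud Storage into BigQuery without custom code, and SQL can then handle each inconsistency in place: filling or filtering missing values, casting columns to the correct types, and removing duplicate rows. Cleaning and analysis both stay in one service that scales to large datasets. Options A and D require writing and maintaining custom cleaning code in Cloud Composer or Cloud Run functions, which adds orchestration and runtime constraints without a benefit for this task. Option C moves and loads the data but never cleans it.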
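To make option B concrete, the following is a minimal sketch of the load-then-clean flow, not a definitive implementation. It assumes the reviews are CSV files with a header row, and the column names (review_id, customer_id, rating, review_text, review_date), the '(no text)' placeholder, and the function names are all hypothetical, since the question does not describe the schema. The SQL uses SAFE_CAST for incorrect types, COALESCE for missing values, and ROW_NUMBER with QUALIFY for duplicates.

```python
def build_cleaning_sql(raw_table: str, clean_table: str) -> str:
    """Build a BigQuery SQL statement that writes a cleaned copy of raw_table.

    Column names are hypothetical; adjust them to the real schema.
    """
    return f"""
CREATE OR REPLACE TABLE `{clean_table}` AS
SELECT
  review_id,
  customer_id,
  -- Incorrect data types: values that cannot be cast become NULL instead of failing.
  SAFE_CAST(rating AS INT64) AS rating,
  -- Missing values: treat blank text as missing and substitute a placeholder.
  COALESCE(NULLIF(TRIM(review_text), ''), '(no text)') AS review_text,
  SAFE_CAST(review_date AS DATE) AS review_date
FROM `{raw_table}`
WHERE review_id IS NOT NULL
-- Duplicate entries: keep only the most recent row per review_id.
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY review_id
  ORDER BY SAFE_CAST(review_date AS DATE) DESC
) = 1
""".strip()


def load_and_clean(gcs_uri: str, raw_table: str, clean_table: str) -> None:
    """Batch load CSV files from Cloud Storage, then clean them with SQL.

    Requires the google-cloud-bigquery package and default credentials.
    """
    from google.cloud import bigquery  # imported here so the SQL builder has no dependency

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # assumption: the files are CSV
        skip_leading_rows=1,                      # assumption: one header row
        autodetect=True,
    )
    # Step 1: batch load the raw files into a staging table.
    client.load_table_from_uri(gcs_uri, raw_table, job_config=job_config).result()
    # Step 2: clean with SQL inside BigQuery.
    client.query(build_cleaning_sql(raw_table, clean_table)).result()
```

A call such as load_and_clean("gs://my-bucket/reviews/*.csv", "my_project.reviews.raw", "my_project.reviews.clean") would run both steps; the bucket and table names here are placeholders. Keeping the raw table separate from the cleaned one means the cleaning rules can be revised and rerun without reloading from Cloud Storage.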