Google Cloud Professional Data Engineer — Question 148
You have a variety of files in Cloud Storage that your data science team wants to use in their models. Currently, users do not have a method to explore, cleanse, and validate the data in Cloud Storage. You are looking for a low-code solution that can be used by your data science team to quickly cleanse and explore data within Cloud Storage. What should you do?
Answer options
- A. Provide the data science team access to Dataflow to create a pipeline to prepare and validate the raw data and load data into BigQuery for data exploration.
- B. Create an external table in BigQuery and use SQL to transform the data as necessary. Provide the data science team access to the external tables to explore the raw data.
- C. Load the data into BigQuery and use SQL to transform the data as necessary. Provide the data science team access to staging tables to explore the raw data.
- D. Provide the data science team access to Dataprep to prepare, validate, and explore the data within Cloud Storage.
Correct answer: D
Explanation
The correct answer is D. Dataprep is a visual, low-code data preparation service that lets users explore, cleanse, and validate files in Cloud Storage without writing code. Option A requires building a Dataflow pipeline, which means writing and maintaining pipeline code. Options B and C rely on hand-written SQL transformations in BigQuery, and option C additionally requires loading the data into BigQuery before anyone can explore it. None of these three meets the low-code requirement.
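To make the low-code distinction concrete, here is a minimal sketch of the kind of cleansing and validation logic a team would have to write by hand in a code-based approach. It is plain Python for illustration only, not Dataflow or Dataprep code. The sample data, the column names, and the `cleanse` function are hypothetical.

```python
import csv
import io

# Hypothetical raw file contents; in practice this would be read from Cloud Storage.
RAW_CSV = """id,name,score
1, Alice ,91
2,Bob,
3,,77
4,Dana,not_a_number
"""


def cleanse(raw_text):
    """Trim whitespace, reject rows with missing fields, and validate that score is an integer."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Cleansing step: strip stray whitespace from every field.
        row = {key: (value or "").strip() for key, value in row.items()}
        # Validation step 1: every field must be present.
        if not all(row.values()):
            rejected.append((row, "missing value"))
            continue
        # Validation step 2: score must parse as an integer.
        try:
            row["score"] = int(row["score"])
        except ValueError:
            rejected.append((row, "score is not an integer"))
            continue
        clean.append(row)
    return clean, rejected


clean_rows, rejected_rows = cleanse(RAW_CSV)
print(clean_rows)
for row, reason in rejected_rows:
    print(reason, row)
```

Each rule here (trimming, missing-value checks, type validation) has to be coded, tested, and maintained. A visual tool such as Dataprep lets users apply comparable steps interactively, which is why it fits the low-code requirement in the question.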