Google Cloud Professional Data Engineer — Question 12
You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?
Answer options
- A. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataproc job.
- B. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud Dataprep job.
- C. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataprep job.
- D. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 0 using a custom script.
Correct answer: B
Explanation
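The correct answer is B. Logistic regression requires numeric, real-valued inputs, and the question states that null values cannot be removed, so each null must be replaced with a real number; 0 meets that requirement. Cloud Dataprep can both find the nulls in a sample of the source data and apply the replacement as a job, so no second tool is needed. Options A and C convert nulls to the string 'none', which is not a real value and cannot serve as a numeric input to the model. Option D produces the right value but relies on Cloud Dataflow plus a custom script, which adds unnecessary complexity when Cloud Dataprep can handle both steps.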
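The two steps in option B (find the nulls, then replace them with 0) can be illustrated with a minimal sketch in plain Python. This is not Cloud Dataprep's interface, which is a visual tool; the column names, sample rows, and helper functions `count_nulls` and `fill_nulls_with_zero` are hypothetical and exist only to show the logic.

```python
from typing import Optional

# Hypothetical row type: each feature is a float or a null (None).
Row = dict[str, Optional[float]]


def count_nulls(rows: list[Row]) -> dict[str, int]:
    """Monitoring step: count null values per column."""
    counts: dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


def fill_nulls_with_zero(rows: list[Row]) -> list[dict[str, float]]:
    """Adjustment step: replace each null with 0.0 so every feature
    stays real-valued and no row is dropped."""
    return [
        {name: 0.0 if value is None else float(value) for name, value in row.items()}
        for row in rows
    ]


# Hypothetical sample of source data with two nulls.
sample: list[Row] = [
    {"age": 34.0, "income": None},
    {"age": None, "income": 52000.0},
    {"age": 41.0, "income": 61000.0},
]

null_counts = count_nulls(sample)
cleaned = fill_nulls_with_zero(sample)

print(null_counts)
print(cleaned)
```

The row count is unchanged after cleaning, and every value is a float, which is what a logistic regression model needs as input. Converting the nulls to the string 'none' instead, as in options A and C, would leave non-numeric values in those columns.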