Google Cloud Professional Data Engineer — Question 285

You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?

Answer options

Correct answer: A

Explanation

Option A is correct because it applies the same preprocessing steps at both model creation and prediction time, which prevents skew between training and prediction. Options B and D incorrectly call for transforming the raw input data at prediction time, and option C does not keep preprocessing consistent between training and prediction. In each of those cases the model can receive inputs that are shaped differently from its training data, which degrades prediction accuracy.
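
The principle behind the explanation can be illustrated with a small sketch in plain Python. It is not BigQuery ML code: the class ScaledLinearModel, the function train, and the sample figures are all hypothetical and exist only for illustration. The sketch assumes a single feature (square footage) that is standardized before fitting. Because the scaling statistics are stored with the model, callers can pass raw values at prediction time and still get the same preprocessing that was used in training. In BigQuery ML, the TRANSFORM clause of CREATE MODEL serves a comparable purpose by saving the preprocessing with the model.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ScaledLinearModel:
    """Hypothetical one-feature model that carries its own preprocessing."""

    feature_mean: float
    feature_std: float
    weight: float
    bias: float

    def transform(self, raw_sqft: float) -> float:
        # Reuse the exact scaling statistics learned at training time.
        return (raw_sqft - self.feature_mean) / self.feature_std

    def predict(self, raw_sqft: float) -> float:
        # Callers pass RAW values; the model applies its stored transform.
        return self.weight * self.transform(raw_sqft) + self.bias


def train(raw_sqft: list[float], prices: list[float]) -> ScaledLinearModel:
    """Fit least squares on the standardized feature (illustrative only)."""
    mu, sigma = mean(raw_sqft), pstdev(raw_sqft)
    scaled = [(x - mu) / sigma for x in raw_sqft]
    price_mean = mean(prices)
    # Slope of the least-squares line for a zero-mean feature.
    weight = sum(z * (y - price_mean) for z, y in zip(scaled, prices)) / sum(
        z * z for z in scaled
    )
    # With a zero-mean feature, the intercept is the mean price.
    return ScaledLinearModel(mu, sigma, weight, price_mean)


# Made-up sample data: price is exactly 200 per square foot.
model = train([1000.0, 1500.0, 2000.0, 2500.0],
              [200000.0, 300000.0, 400000.0, 500000.0])

# Consistent path: raw input, transform applied inside the model.
consistent = model.predict(1800.0)

# Skewed path: the learned weights applied to an unscaled value,
# which is what happens when preprocessing is skipped at prediction time.
skewed = model.weight * 1800.0 + model.bias

print(f"consistent prediction: {consistent:.2f}")
print(f"skewed prediction:     {skewed:.2f}")
```

The consistent prediction lands close to 360000, which matches the 200-per-square-foot pattern in the sample data, while the skewed one is off by several orders of magnitude. Keeping the transform inside the model removes the need for a second, separately maintained preprocessing step that could drift from the one used in training.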