Google Cloud Professional Data Engineer — Question 279

You are working on a linear regression model on BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictable variables. What should you do?

Answer options

Correct answer: B

Explanation

The correct answer is B because one-hot encoding transforms categorical variables into a format that can be provided to ML algorithms to improve predictions. Option A fails to include necessary data, while option C involves additional complexity without addressing the columnar data requirement. Option D introduces an arbitrary numerical representation, which may not effectively capture the predictive relationships needed.