You developed a BigQuery ML linear regressor model by using a training dataset stored in…

Question

You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex AI Pipelines to automate hourly model training, and use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

Accepted Answer

Correct answer: B. B. Use the TRANSFORM clause in the CREATE MODEL statement in the SQL query to calculate the required statistics. — The correct answer is B because using the TRANSFORM clause directly in the CREATE MODEL statement efficiently calculates the required statistics during model training, reducing the need for additional storage and processing steps. Options A and D unnecessarily complicate the process with extra steps for data staging and storage, while option C adds complexity by creating a separate component in the pipeline for statistics calculation, which is unnecessary when the TRANSFORM clause can handle it.

Google Cloud Professional Machine Learning Engineer — Question 281

Answer options

Correct answer: B

Explanation