Google Cloud Professional Data Engineer — Question 254
You are developing a fraud detection model using BigQuery ML. You have a raw transaction dataset and need to create new features such as the average_transaction_amount_last_24_hours and time_since_last_transaction. These features require aggregation and time-window calculations on the existing data. The goal is to ensure that these features are consistently applied during both model training and prediction without manual intervention. You need to prepare these features efficiently for your model. What should you do?
Answer options
- A. Implement a Cloud Run function that triggers on new transactions, calculates the features, and inserts them into a feature store before model serving.
- B. Export the BigQuery data to Cloud Storage, perform feature engineering using a custom Python script in a Dataflow job, and then re-import the engineered features into BigQuery.
- C. Use the TRANSFORM clause within the CREATE MODEL statement, leveraging SQL functions for aggregations and time-based calculations.
- D. Create a separate BigQuery table containing pre-computed features using complex SQL queries and join this table with the raw data during model training and serving.
Correct answer: D
Explanation
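The correct answer is D. Computing the features once with SQL (for example, window functions partitioned by customer and ordered by transaction time) and storing them in a separate BigQuery table means training and prediction read the same precomputed values. The feature logic is therefore defined in one place and applied identically in both phases. Refreshing the table on a schedule, for example with a BigQuery scheduled query, keeps it current without manual steps. Option A adds a separate service and a feature store, which increases operational complexity and can introduce latency. Option B moves the data out of BigQuery and back again, adding export, import, and Dataflow overhead for work that BigQuery SQL can already do. Option C's TRANSFORM clause does reapply its preprocessing automatically at prediction time. However, it is designed for per-row expressions and BigQuery ML preprocessing functions whose statistics are captured during training. It has no access to a customer's earlier transactions when new rows are scored, so history-dependent features such as a rolling 24-hour average or the time since the previous transaction are better precomputed.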
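The logic behind option D can be illustrated outside BigQuery. In BigQuery the two features would typically be written with window functions, for example `AVG(amount) OVER (PARTITION BY customer_id ORDER BY UNIX_SECONDS(ts) RANGE BETWEEN 86400 PRECEDING AND CURRENT ROW)` for the 24-hour average and `LAG(ts)` for the previous transaction time. The result would be saved to a feature table. The following is a minimal Python sketch of the same computation, not a BigQuery API. It assumes each transaction has hypothetical fields `customer_id`, `ts`, and `amount`, and uses a hypothetical function `precompute_features`. It treats the 24-hour window as inclusive of the current transaction. Unlike SQL `RANGE`, it breaks timestamp ties by input order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical transaction record; field names are illustrative assumptions.
@dataclass(frozen=True)
class Transaction:
    customer_id: str
    ts: datetime
    amount: float

WINDOW = timedelta(hours=24)

def precompute_features(transactions):
    """Build one feature row per transaction (hypothetical helper).

    average_transaction_amount_last_24_hours: mean amount of the customer's
    transactions with ts in [current ts - 24h, current ts], current included.
    time_since_last_transaction: seconds since the customer's previous
    transaction, or None for their first one.
    """
    # Group by customer, like PARTITION BY customer_id.
    by_customer = defaultdict(list)
    for t in transactions:
        by_customer[t.customer_id].append(t)

    rows = []
    for customer_id, txns in by_customer.items():
        txns.sort(key=lambda t: t.ts)  # ORDER BY ts within each partition
        window = deque()               # transactions inside the 24h window
        window_sum = 0.0
        prev_ts = None                 # plays the role of LAG(ts)
        for t in txns:
            window.append(t)
            window_sum += t.amount
            # Evict transactions older than 24 hours before the current one.
            # The current transaction is never evicted, so the loop ends.
            while window[0].ts < t.ts - WINDOW:
                window_sum -= window.popleft().amount
            rows.append({
                "customer_id": customer_id,
                "ts": t.ts,
                "amount": t.amount,
                "average_transaction_amount_last_24_hours": window_sum / len(window),
                "time_since_last_transaction": (
                    None if prev_ts is None else (t.ts - prev_ts).total_seconds()
                ),
            })
            prev_ts = t.ts

    rows.sort(key=lambda r: (r["customer_id"], r["ts"]))
    return rows
```

Because both the training query and the prediction query would read rows produced by the same feature computation, the feature definitions cannot drift between the two phases.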