Google Cloud Professional Machine Learning Engineer — Question 100
You work on a data science team at a bank and are creating an ML model to predict loan default risk. You have collected and cleaned hundreds of millions of records of training data in a BigQuery table, and you now want to develop and compare multiple models on this data using TensorFlow and Vertex AI. You want to minimize any bottlenecks during the data ingestion stage while considering scalability. What should you do?
Answer options
- A. Use the BigQuery client library to load data into a dataframe, and use tf.data.Dataset.from_tensor_slices() to read it.
- B. Export data to CSV files in Cloud Storage, and use tf.data.TextLineDataset() to read them.
- C. Convert the data into TFRecords, and use tf.data.TFRecordDataset() to read them.
- D. Use TensorFlow I/O’s BigQuery Reader to directly read the data.
Correct answer: D
Explanation
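The correct answer is D. TensorFlow I/O's BigQuery Reader reads rows directly from the BigQuery table through the BigQuery Storage Read API. It can split the read across multiple parallel streams and returns a tf.data.Dataset, so no intermediate copy of the data is needed and reads can scale out as the table grows. The other options each introduce a bottleneck. Option A loads hundreds of millions of records into a single in-memory dataframe, which does not scale. Options B and C require an extra export or conversion step (to CSV files or TFRecords) before training can begin, which adds delay, duplicates the data in Cloud Storage, and creates another pipeline step to re-run whenever the source table changes.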
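As an illustration of option D, the sketch below builds a tf.data input pipeline with TensorFlow I/O's BigQueryClient, requesting several read streams and interleaving them with parallel_read_rows(). It is a minimal sketch under stated assumptions, not a reference implementation: the project, dataset, table, and column names, and the schema (FLOAT64 features, INT64 label), are hypothetical. It also assumes the tensorflow and tensorflow-io packages are installed and that credentials with BigQuery Storage Read API access are available. The TensorFlow imports are placed inside make_dataset so the small argument-building helper can be used without TensorFlow.

```python
"""Sketch: stream a BigQuery table into tf.data with TensorFlow I/O.

Assumptions (hypothetical, not from the question): the project, dataset,
table, and column names below; feature columns are FLOAT64 and the label
column is INT64. Requires `tensorflow`, `tensorflow-io`, and Google Cloud
credentials with BigQuery Storage Read API access.
"""

# Hypothetical schema for the loan-default training table.
FEATURE_COLUMNS = ["loan_amount", "income", "credit_score", "debt_to_income"]
LABEL_COLUMN = "defaulted"


def read_session_args(project_id, dataset_id, table_id, requested_streams=4):
    """Build keyword arguments for BigQueryClient.read_session (types added later)."""
    return {
        "parent": f"projects/{project_id}",  # project the read session belongs to
        "project_id": project_id,
        "dataset_id": dataset_id,
        "table_id": table_id,
        # Column projection: only these columns are read from BigQuery.
        "selected_fields": FEATURE_COLUMNS + [LABEL_COLUMN],
        # Number of parallel read streams requested; the server may grant fewer.
        "requested_streams": requested_streams,
    }


def make_dataset(project_id, dataset_id, table_id, batch_size=1024, requested_streams=4):
    """Return a batched tf.data.Dataset of (features_dict, label) pairs."""
    # Imported here so read_session_args() works without TensorFlow installed.
    import tensorflow as tf
    from tensorflow_io.bigquery import BigQueryClient

    args = read_session_args(project_id, dataset_id, table_id, requested_streams)
    # One output type per selected field, in the same order as selected_fields.
    output_types = [tf.float64] * len(FEATURE_COLUMNS) + [tf.int64]
    session = BigQueryClient().read_session(output_types=output_types, **args)

    def split_label(row):
        # Each element maps column name -> scalar tensor.
        features = {name: row[name] for name in FEATURE_COLUMNS}
        return features, row[LABEL_COLUMN]

    return (
        session.parallel_read_rows()  # interleave rows from all read streams
        .map(split_label, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(batch_size)
        .prefetch(tf.data.AUTOTUNE)  # overlap data reading with training steps
    )
```

With hypothetical names, a training run could then call model.fit(make_dataset("my-project", "loans", "training_data"), epochs=3). Because every candidate model reads from the same BigQuery table through the same function, comparing models does not require re-exporting or re-converting the data.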