Google Cloud Professional Machine Learning Engineer — Question 15
You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources with low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?
Answer options
- A. Create a tf.data.Dataset.prefetch transformation.
- B. Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensor_slices().
- C. Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().
- D. Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.
Correct answer: D
Explanation
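The correct answer is D. TFRecord is a sequential binary format that the tf.data API reads as a stream (for example with tf.data.TFRecordDataset), so only the records currently being processed need to be in memory, and files stored in Cloud Storage can be read directly during training. Converting images from disparate sources into TFRecords also gives the pipeline a single, uniform input format. Option A is incorrect because prefetch is a transformation applied to an existing dataset: it overlaps data preparation with training to reduce latency, but it does not create a dataset or remove the need to hold the data in memory. Options B and C are incorrect because tf.data.Dataset.from_tensor_slices() and tf.data.Dataset.from_tensors() both require the images to be loaded as in-memory tensors first, which is exactly what the size of the data rules out.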
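To make option D concrete, here is a minimal sketch of the write-then-stream pattern, assuming TensorFlow 2.x. The helper names (serialize_example, write_tfrecords, parse_example, make_dataset), the feature keys "image" and "label", the 32-pixel target size, and the shard file names are all illustrative assumptions, not part of the question. The demo writes to a local temporary directory so it runs anywhere; in a setup like the one described, the file pattern would presumably be a gs:// path in Cloud Storage instead.

```python
import os
import tempfile

import tensorflow as tf

IMAGE_SIZE = 32  # assumed target size, chosen only for this sketch

# Assumed record layout: one PNG-encoded image plus an integer label.
FEATURE_SPEC = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def serialize_example(image_bytes, label):
    """Pack one encoded image and its label into a serialized tf.train.Example."""
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


def write_tfrecords(path, examples):
    """Write (image_bytes, label) pairs to a single TFRecord shard."""
    with tf.io.TFRecordWriter(path) as writer:
        for image_bytes, label in examples:
            writer.write(serialize_example(image_bytes, label))


def parse_example(record):
    """Decode one serialized record into a (float image, label) pair."""
    parsed = tf.io.parse_single_example(record, FEATURE_SPEC)
    image = tf.io.decode_png(parsed["image"], channels=3)
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE]) / 255.0
    return image, parsed["label"]


def make_dataset(file_pattern, batch_size=2):
    """Stream records from all matching shards; nothing is loaded up front."""
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    # Read several shards concurrently, then decode records in parallel.
    records = files.interleave(
        tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE
    )
    examples = records.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    # prefetch overlaps input processing with the training step.
    return examples.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Demo with synthetic 8x8 images written to two local shards.
demo_dir = tempfile.mkdtemp()
blank_png = tf.io.encode_png(tf.zeros([8, 8, 3], dtype=tf.uint8)).numpy()
write_tfrecords(
    os.path.join(demo_dir, "train-0.tfrecord"), [(blank_png, 0), (blank_png, 1)]
)
write_tfrecords(
    os.path.join(demo_dir, "train-1.tfrecord"), [(blank_png, 2), (blank_png, 3)]
)

dataset = make_dataset(os.path.join(demo_dir, "train-*.tfrecord"))
for images, labels in dataset:
    print(images.shape, labels.numpy())
```

Note that prefetch still appears here, but as the last step of a pipeline that already streams from files, which is why option A is insufficient on its own. Splitting the data into several shards is a design choice that lets interleave read files in parallel.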