Google Cloud Professional Machine Learning Engineer — Question 124
You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?
Answer options
- A. Increase the instance memory to 512 GB and increase the batch size.
- B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
- C. Enable early stopping in your Vertex AI Training job.
- D. Use the tf.distribute.Strategy API and run a distributed training job.
Correct answer: B
Explanation
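Option B is correct. A v3-32 TPU slice provides 32 TPU v3 cores built for the large matrix operations that dominate deep learning training. Moving the job from a single NVIDIA P100 GPU to this TPU therefore adds far more accelerator throughput and can substantially reduce training time without changing the model itself.
The other options are less effective:
- A. Adding host RAM does not add accelerator compute, and it does not enlarge the P100's on-device memory, which is what limits the batch size for very large images. Changing the batch size can also affect convergence and model quality.
- C. Early stopping only ends training sooner. It does not make each training step faster, and stopping before the model converges can reduce model performance, which the question rules out.
- D. Distributed training reduces training time only when the job is also provisioned with additional accelerators or worker machines, which this option does not specify. With the current single-P100 configuration it adds no compute.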
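The scale of the workload that motivates a faster accelerator can be checked with simple arithmetic on the numbers in the question. This is a rough sketch that treats "roughly 2 GB" as exactly 2 GB and assumes decimal units (1 PB = 1,000,000 GB); the function name is illustrative only.

```python
# Rough dataset-size estimate from the figures in the question.
# Assumptions: each image is exactly 2 GB, and decimal units (1 PB = 1,000,000 GB).
NUM_IMAGES = 3_000_000
GB_PER_IMAGE = 2

GB_PER_PB = 1_000_000


def total_dataset_pb(num_images: int, gb_per_image: float) -> float:
    """Return the approximate total dataset size in petabytes."""
    total_gb = num_images * gb_per_image
    return total_gb / GB_PER_PB


if __name__ == "__main__":
    size_pb = total_dataset_pb(NUM_IMAGES, GB_PER_IMAGE)
    print(f"Approximate dataset size: {size_pb:.0f} PB")
```

At about 6 PB, every pass over the data pushes an enormous volume through the accelerator, which is why per-step throughput on a single P100 dominates total training time.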