Google Cloud Professional Machine Learning Engineer — Question 161

You work for a rapidly growing social media company. Your team builds TensorFlow recommender models in an on-premises CPU cluster. The data contains billions of historical user events and 100,000 categorical features. You notice that as the data increases, the model training time increases. You plan to move the models to Google Cloud. You want to use the most scalable approach that also minimizes training time. What should you do?

Answer options

Correct answer: A

Explanation

The correct answer is A. TPU VMs with TPUv3 Pod slices are built for large-scale distributed training: a Pod slice connects many TPU cores over a dedicated high-speed interconnect, so training can scale out as the event data grows while training time stays low. Option B is scalable but is unlikely to match the efficiency of TPUs for this workload. Option C is not suited to the complexity of the recommender model. Option D is powerful but does not offer the same scalability and optimization as TPUs in this scenario.
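
To make the scalability argument concrete, here is a rough back-of-the-envelope sketch of how embedding memory grows with the number of categorical features, and how it shrinks per core when the tables are split across accelerator cores. Only the 100,000 feature count comes from the question. The vocabulary size, embedding dimension, core count, and the reading of "features" as one embedding table each are hypothetical assumptions chosen for illustration, not figures from the exam or from Google Cloud documentation.

```python
def embedding_table_bytes(num_features, vocab_size, embedding_dim, bytes_per_value=4):
    """Total bytes for one embedding table per feature (float32 by default)."""
    return num_features * vocab_size * embedding_dim * bytes_per_value


def per_core_bytes(total_bytes, num_cores):
    """Bytes held by each core if the tables are split evenly (ceiling division)."""
    return -(-total_bytes // num_cores)


NUM_FEATURES = 100_000  # from the question
VOCAB_SIZE = 1_000      # hypothetical: distinct values per feature
EMBEDDING_DIM = 16      # hypothetical: embedding vector width
NUM_CORES = 32          # hypothetical: cores in the accelerator slice

total = embedding_table_bytes(NUM_FEATURES, VOCAB_SIZE, EMBEDDING_DIM)
shard = per_core_bytes(total, NUM_CORES)

print(f"total embedding memory: {total / 1e9:.1f} GB")
print(f"per-core share across {NUM_CORES} cores: {shard / 1e6:.1f} MB")
```

Under these assumptions the embedding tables alone run to several gigabytes, which is why spreading them across many cores matters more for this workload than raw compute on a single machine.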