AWS Certified Machine Learning – Specialty — Question 40

A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting model using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to complete. The training needs to be run daily.
The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the training data and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding effort and infrastructure changes.
What should the Machine Learning Specialist do to the training solution to allow it to scale for future demand?

Answer options

Correct answer: B

Explanation

The correct answer is B because implementing a Horovod distributed framework in TensorFlow allows the model training to be parallelized across multiple machines, significantly improving training speed and scalability. Option A does not address the need for scaling with increased data, while options C and D suggest using different approaches that may not align with minimizing coding effort and infrastructure changes as effectively as option B.