AWS Certified Machine Learning – Specialty — Question 326
A data scientist is implementing a deep learning neural network model for an object detection task on images. The data scientist wants to experiment with a large number of parallel hyperparameter tuning jobs to find hyperparameters that optimize compute time.
The data scientist must ensure that jobs that underperform are stopped. The data scientist must allocate computational resources to well-performing hyperparameter configurations. The data scientist is using the hyperparameter tuning job to tune the stochastic gradient descent (SGD) learning rate, momentum, number of epochs, and mini-batch size.
Which technique will meet these requirements with LEAST computational time?
Answer options
- A. Grid search
- B. Random search
- C. Bayesian optimization
- D. Hyperband
Correct answer: D
Explanation
Hyperband uses a variation of successive halving: it starts many configurations on a small training budget (for example, a few epochs), stops the ones that underperform, and reallocates the freed resources to the promising ones. That matches both requirements in the question. Because it relies on intermediate results, it suits iterative algorithms such as SGD-trained neural networks, and it scales well to a large number of parallel jobs. In their basic form, grid search and random search train every configuration to completion, so compute is wasted on poor candidates. Bayesian optimization is sample-efficient, but each new batch of candidates is chosen from the results of earlier jobs, so it gains less from heavy parallelism and does not by itself stop underperforming jobs.
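The early-stopping idea can be illustrated with a minimal sketch of plain successive halving, which corresponds to a single Hyperband bracket (full Hyperband runs several brackets with different starting budgets). Everything here is an assumption for illustration: `toy_score` is a made-up stand-in for validation accuracy, `eta` is the reduction factor, and the epoch count treats each rung as trained from scratch. It is not the SageMaker implementation.

```python
import math


def toy_score(config, epochs):
    """Hypothetical stand-in for validation accuracy after `epochs` epochs.

    Quality peaks at a learning rate of 1e-2 and improves with more epochs.
    """
    quality = 1.0 / (1.0 + abs(math.log10(config["lr"]) + 2.0))
    return quality * (1.0 - 0.5 ** epochs)


def successive_halving(configs, evaluate, min_epochs=1, eta=3):
    """Keep the top 1/eta configurations each rung and give them eta times more epochs.

    Returns (best_config, total_epochs_spent).
    """
    survivors = list(configs)
    epochs = min_epochs
    total_epochs = 0
    while len(survivors) > 1:
        # Score every surviving configuration at the current budget.
        scored = [(evaluate(c, epochs), c) for c in survivors]
        total_epochs += epochs * len(survivors)
        # Sort by score only, so the config dicts are never compared.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]  # stop the rest early
        epochs *= eta  # survivors earn a larger budget
    return survivors[0], total_epochs


# 27 hypothetical learning rates, log-spaced from 1e-4 to 1e-1.
configs = [{"lr": 10 ** (-4 + 3 * i / 26)} for i in range(27)]

if __name__ == "__main__":
    best, spent = successive_halving(configs, toy_score)
    full_cost = len(configs) * 9  # every config trained for 9 epochs
    print(best, spent, full_cost)
```

With 27 configurations and `eta=3`, the rungs are 27 jobs at 1 epoch, 9 jobs at 3 epochs, and 3 jobs at 9 epochs, for 81 epochs in total. Training all 27 configurations for 9 epochs would cost 243 epochs, which is the waste that grid search and random search incur without early stopping.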