AWS Certified Machine Learning – Specialty — Question 198
A geospatial analysis company processes thousands of new satellite images each day to produce vessel detection data for commercial shipping. The company stores the training data in Amazon S3. The training data incrementally increases in size with new images each day.
The company has configured an Amazon SageMaker training job to use a single ml.p2.xlarge instance with File input mode to train the built-in Object Detection algorithm. The training process was successful last month but is now failing because of a lack of storage. Aside from the addition of training data, nothing has changed in the model training process.
A machine learning (ML) specialist needs to change the training configuration to fix the problem. The solution must optimize performance and must minimize the cost of training.
Which solution will meet these requirements?
Answer options
- A. Modify the training configuration to use two ml.p2.xlarge instances.
- B. Modify the training configuration to use Pipe input mode.
- C. Modify the training configuration to use a single ml.p3.2xlarge instance.
- D. Modify the training configuration to use Amazon Elastic File System (Amazon EFS) instead of Amazon S3 to store the input training data.
Correct answer: B
Explanation
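The correct answer is B. In File input mode, SageMaker copies the entire training dataset from Amazon S3 to the training instance's storage volume before training starts, so a dataset that grows every day eventually exceeds the available space. Pipe input mode streams the data from Amazon S3 directly to the algorithm, so the full dataset never has to fit on local storage, and training can start without waiting for a complete download. The job keeps running on the same single ml.p2.xlarge instance, so no instance cost is added. Options A and C increase cost by adding an instance or switching to a more powerful one, and neither addresses the storage problem: the job would still use File mode, so the full dataset would still be downloaded to instance storage. Option D would move the data off the instance volume, but it requires migrating the growing dataset to Amazon EFS, which generally costs more per GB than Amazon S3, so it is less cost-effective than Pipe input mode here.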
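To make the change in option B concrete, the following is a minimal sketch of a request for the SageMaker CreateTrainingJob API with the input mode set to Pipe. The helper name `build_pipe_mode_request`, the job name, bucket paths, role ARN, image URI, volume size, and runtime limit are all hypothetical placeholders, not values from the question. It also assumes the training data has been packed as RecordIO, which is the format the built-in Object Detection algorithm is generally documented to expect for Pipe mode; hyperparameters and the validation channel are omitted for brevity.

```python
def build_pipe_mode_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    """Build keyword arguments for SageMaker's CreateTrainingJob API.

    Hypothetical helper for illustration. The only change relative to the
    failing job described in the question is TrainingInputMode: "Pipe"
    instead of "File".
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # Stream data from S3 instead of copying it to the instance volume.
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                # Assumption: the images are stored as RecordIO files.
                "ContentType": "application/x-recordio",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            # Same single instance as before, so instance cost is unchanged.
            "InstanceType": "ml.p2.xlarge",
            "InstanceCount": 1,
            # Placeholder size; the volume no longer has to hold the dataset.
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }


# All argument values below are placeholders.
request = build_pipe_mode_request(
    job_name="vessel-detection-pipe-mode",
    image_uri="<object-detection-image-uri>",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)

# Submitting the job requires boto3 and AWS credentials:
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

The instance type and count stay exactly as in the original job; only the input mode changes, which is why this option fixes the storage failure without adding instance cost.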