AWS Certified Machine Learning – Specialty — Question 24

A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large, with millions of data points, and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and would exceed the attached 5 GB Amazon EBS volume on the notebook instance.
Which approach allows the Specialist to use all the data to train the model?

Answer options

Correct answer: A

Explanation

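The correct answer is A. The Specialist first verifies the training code locally on the notebook instance with a small subset of the data, then launches a training job that reads the full dataset directly from the S3 bucket using Pipe input mode. Pipe mode streams the data from S3 to the training instance instead of copying it in full beforehand, so nothing has to be moved onto the notebook and the 5 GB EBS volume is never the limit. Options B and D add an unnecessary intermediate step on an Amazon EC2 instance, and option C relies on AWS Glue, which is a data integration (ETL) service and not a tool for training models.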
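
As a rough illustration of the second step (training on the full dataset from S3 in Pipe input mode), the sketch below builds a request in the shape accepted by the boto3 SageMaker `create_training_job` call. The function names, job name, image URI, role ARN, S3 paths, and instance settings are hypothetical placeholders and not part of the original question; treat this as one possible setup under those assumptions, not the only way to do it.

```python
def build_pipe_mode_request(job_name, image_uri, role_arn,
                            train_s3_uri, output_s3_uri,
                            instance_type="ml.m5.xlarge"):
    """Build a CreateTrainingJob request that streams training data from S3.

    All argument values are supplied by the caller; nothing here is taken
    from the exam question itself.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # "Pipe" streams data from S3 to the training container
            # instead of downloading the whole dataset first ("File").
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            # Volume on the training instance (assumed size), separate
            # from the notebook's 5 GB EBS volume.
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def submit(request):
    """Send the request to SageMaker (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("sagemaker")
    return client.create_training_job(**request)
```

The training job runs on its own training instance, so the notebook is only used to prepare and submit the request. Note that the training container must be written to read from a Pipe mode channel; a script that expects ordinary files on disk would need File mode instead.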