AWS Certified Machine Learning – Specialty — Question 232
A company is training machine learning (ML) models on Amazon SageMaker by using 200 TB of data that is stored in Amazon S3 buckets. The training data consists of individual files that are each larger than 200 MB in size. The company needs a data access solution that offers the shortest processing time and the least amount of setup.
Which solution will meet these requirements?
Answer options
- A. Use File mode in SageMaker to copy the dataset from the S3 buckets to the ML instance storage.
- B. Create an Amazon FSx for Lustre file system. Link the file system to the S3 buckets.
- C. Create an Amazon Elastic File System (Amazon EFS) file system. Mount the file system to the training instances.
- D. Use FastFile mode in SageMaker to stream the files on demand from the S3 buckets.
Correct answer: D
Explanation
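D is correct. FastFile mode exposes the S3 objects to the training container through a file system interface and streams each file on demand as it is read, so training can begin without waiting for the full 200 TB to download, and the only setup is choosing the input mode on the training job. It tends to perform best with fewer, larger files, which matches the 200 MB+ files in this scenario. Option A (File mode) copies the entire dataset to instance storage before training starts; at 200 TB that means a long startup delay and a very large storage volume. Option B (FSx for Lustre) can deliver high throughput, but it requires creating a file system, linking it to the S3 buckets, and configuring networking, so it does not meet the "least amount of setup" requirement. Option C (Amazon EFS) also requires creating and mounting a file system, and because EFS has no native link to S3, the data would first have to be copied into it.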
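To make the "least amount of setup" point concrete, the following is a minimal sketch of a training input channel in the shape used by the SageMaker `CreateTrainingJob` request, where switching to FastFile mode is a one-field change. The helper `build_channel`, the channel name, and the bucket URI `s3://example-bucket/train/` are hypothetical names chosen for illustration; the code only builds a dictionary and does not call AWS.

```python
import json

# Input modes accepted on a training channel: "File" copies the data first,
# "FastFile" streams it on demand, "Pipe" streams through a FIFO.
VALID_INPUT_MODES = {"File", "FastFile", "Pipe"}


def build_channel(name, s3_uri, input_mode="FastFile"):
    """Build one InputDataConfig channel entry (hypothetical helper)."""
    if input_mode not in VALID_INPUT_MODES:
        raise ValueError(f"unsupported input mode: {input_mode!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got: {s3_uri!r}")
    return {
        "ChannelName": name,
        # The only setting that differs between options A and D.
        "InputMode": input_mode,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Assumed example values; replace with a real bucket and prefix.
channel = build_channel("training", "s3://example-bucket/train/")
print(json.dumps(channel, indent=2))
```

With the SageMaker Python SDK, the equivalent setting is typically expressed as `TrainingInput(s3_data=..., input_mode="FastFile")`. Either way, no file system has to be created, linked, or mounted, which is what separates option D from options B and C.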