AWS Certified Machine Learning – Specialty — Question 232

A company is training machine learning (ML) models on Amazon SageMaker by using 200 TB of data that is stored in Amazon S3 buckets. The training data consists of individual files that are each larger than 200 MB in size. The company needs a data access solution that offers the shortest processing time and the least amount of setup.

Which solution will meet these requirements?

Answer options

Correct answer: D

Explanation

The correct choice is D, which uses FastFile mode in SageMaker. FastFile mode streams objects from Amazon S3 on demand as the training script reads them, so training can start without first downloading the full 200 TB dataset. It is enabled by setting the input mode on the training job, with no additional infrastructure to provision. Option A involves copying the dataset to local storage before training begins, which is slow at this scale. Option B is efficient once running, but it requires creating and managing an Amazon FSx file system. Option C likewise requires setting up an Amazon EFS file system, which may not be as efficient for this use case as streaming directly from S3.
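
To illustrate how little setup FastFile mode needs, here is a minimal sketch of one input channel in the shape accepted by the InputDataConfig parameter of the SageMaker CreateTrainingJob API. The helper name fastfile_channel, the channel name "training", and the bucket example-training-bucket are hypothetical and chosen only for illustration. The sketch builds the configuration and does not call AWS.

```python
from typing import Any, Dict


def fastfile_channel(s3_uri: str, channel_name: str = "training") -> Dict[str, Any]:
    """Build one InputDataConfig channel that requests FastFile mode.

    Hypothetical helper for illustration: it only assembles the dictionary
    and performs no AWS calls.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # treat every object under the prefix as input
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        # The only setting that differs from the default File mode.
        "InputMode": "FastFile",
    }


if __name__ == "__main__":
    # Assumed bucket and prefix; replace with the real training data location.
    channel = fastfile_channel("s3://example-training-bucket/dataset/")
    print(channel["InputMode"])
```

A channel built this way would be passed in the InputDataConfig list of a training job request. With the SageMaker Python SDK, the equivalent is to set input_mode="FastFile" on a TrainingInput. In both cases the choice of mode is a single field, which is why option D involves the least setup.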