AWS Certified Machine Learning – Specialty — Question 42

A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist is using one of the SageMaker built-in algorithms for the training. The dataset is stored in CSV format and is converted into a numpy.array, which appears to be slowing down the training.
What should the Specialist do to optimize the data for training on SageMaker?

Answer options

Correct answer: C

Explanation

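Converting the dataset to the RecordIO-protobuf format is the best choice because most SageMaker built-in algorithms are optimized for it. With this format the training job can also use Pipe mode, which streams data from Amazon S3 directly to the algorithm instead of first copying the full dataset to the training instance, so training starts sooner and needs less local storage. The other options either do not address the data format or concern aspects of data handling that do not improve training speed.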
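
As a rough illustration of the conversion, the sketch below parses CSV text into float32 arrays and serializes them as RecordIO-protobuf. It assumes the SageMaker Python SDK (v2) is installed and uses its write_numpy_to_dense_tensor helper; the function names split_label_and_features and to_recordio_protobuf, and the choice of the first column as the label, are hypothetical and only for illustration.

```python
import io

import numpy as np


def split_label_and_features(csv_text, label_column=0):
    """Parse CSV text into a 2-D float32 feature array and a 1-D label array.

    Assumption: the file has no header row and every column is numeric.
    """
    data = np.loadtxt(io.StringIO(csv_text), delimiter=",",
                      dtype=np.float32, ndmin=2)
    labels = data[:, label_column]
    features = np.delete(data, label_column, axis=1)
    return features, labels


def to_recordio_protobuf(features, labels):
    """Serialize the arrays into an in-memory RecordIO-protobuf buffer.

    The SDK import is done lazily so the parsing helper above can be used
    without the SageMaker SDK installed.
    """
    from sagemaker.amazon.common import write_numpy_to_dense_tensor

    buf = io.BytesIO()
    write_numpy_to_dense_tensor(buf, features, labels)
    buf.seek(0)  # rewind so the buffer can be read or uploaded from the start
    return buf
```

A typical flow would call split_label_and_features on the CSV contents, pass the result to to_recordio_protobuf, and upload the returned buffer to Amazon S3 (for example with boto3's upload_fileobj) so it can be used as the training channel.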