AWS Certified Machine Learning – Specialty — Question 42
A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist is using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format and is transformed into a numpy.array, which appears to be negatively affecting the speed of the training.
What should the Specialist do to optimize the data for training on SageMaker?
Answer options
- A. Use the SageMaker batch transform feature to transform the training data into a DataFrame.
- B. Use AWS Glue to compress the data into the Apache Parquet format.
- C. Transform the dataset into the RecordIO protobuf format.
- D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data.
Correct answer: C
Explanation
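RecordIO protobuf is the format that most SageMaker built-in algorithms are optimized for. It also lets algorithms that support Pipe mode stream training data directly from Amazon S3 instead of first copying it to the training instance, which shortens startup time and improves throughput. The other options do not change the format the algorithm reads during training. Batch transform (A) runs inference over a dataset and does not prepare training data. Parquet (B) is a columnar storage format, not the optimized input format for the built-in algorithms. Hyperparameter optimization (D) tunes model hyperparameters, not the input data.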
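As a minimal sketch of the conversion in option C, the snippet below serializes a NumPy feature array and its labels into an in-memory RecordIO protobuf buffer. It assumes an installed version of the SageMaker Python SDK that provides `sagemaker.amazon.common.write_numpy_to_dense_tensor`. The helper name `to_recordio_protobuf` is hypothetical and is not part of the question or of the SDK.

```python
import io


def to_recordio_protobuf(features, labels=None):
    """Serialize a 2-D NumPy feature array (and optional 1-D label array)
    into an in-memory RecordIO protobuf buffer.

    Hypothetical helper for illustration; only the SDK call inside is real.
    """
    # Assumption: the SageMaker Python SDK is installed and exposes this
    # function. It is imported here so the dependency is explicit.
    from sagemaker.amazon.common import write_numpy_to_dense_tensor

    buf = io.BytesIO()
    # Writes one protobuf Record per row, framed in RecordIO.
    write_numpy_to_dense_tensor(buf, features, labels)
    buf.seek(0)  # rewind so the caller can read or upload from the start
    return buf
```

One possible use is to load the CSV with NumPy, split the label column from the feature columns, pass both arrays to the helper, and upload the returned buffer to Amazon S3 as the training channel. The column layout and the upload location depend on the dataset and are not specified in the question.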