AWS Certified Machine Learning – Specialty — Question 138

A machine learning specialist stores IoT soil sensor data in an Amazon DynamoDB table and stores weather event data as JSON files in Amazon S3. The dataset in DynamoDB is 10 GB in size and the dataset in Amazon S3 is 5 GB in size. The specialist wants to train a model on this data to help predict soil moisture levels as a function of weather events using Amazon SageMaker.
Which solution will accomplish the necessary transformation to train the Amazon SageMaker model with the LEAST amount of administrative overhead?

Answer options

Correct answer: D

Explanation

Option D is correct because AWS Glue crawlers automate schema discovery for both the DynamoDB table and the JSON files in Amazon S3, and a Glue ETL job can then merge the two datasets and write the result in a format suitable for Amazon SageMaker. Because AWS Glue is serverless, there is no infrastructure to provision or manage. Options A and B involve more complex setups that require additional configuration and management, while option C focuses on streaming data rather than on preparing the datasets needed for model training.
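
To make the "merge" step concrete, here is a minimal local sketch in plain Python of the kind of transformation such an ETL job performs. It is not AWS Glue code and does not call any AWS API. The join key (date), the field names (soil_moisture, rainfall_mm, temperature_c), the sample values, and the choice of headerless CSV with the target in the first column are all assumptions made for illustration; the question does not specify them.

```python
import csv
import io


def merge_for_training(sensor_rows, weather_rows, key="date"):
    """Inner-join sensor readings with weather events on a shared key.

    Field names are hypothetical. Each output row puts the target
    (soil moisture) first, followed by the weather features.
    """
    weather_by_key = {row[key]: row for row in weather_rows}
    merged = []
    for sensor in sensor_rows:
        weather = weather_by_key.get(sensor[key])
        if weather is None:
            continue  # no matching weather event: drop the reading
        merged.append([
            sensor["soil_moisture"],
            weather["rainfall_mm"],
            weather["temperature_c"],
        ])
    return merged


def to_csv(rows):
    """Serialize rows as headerless CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample records standing in for the two data sources.
sensor_rows = [
    {"date": "2023-05-01", "soil_moisture": 0.31},
    {"date": "2023-05-02", "soil_moisture": 0.44},
    {"date": "2023-05-03", "soil_moisture": 0.29},
]
weather_rows = [
    {"date": "2023-05-01", "rainfall_mm": 0.0, "temperature_c": 21.5},
    {"date": "2023-05-02", "rainfall_mm": 12.4, "temperature_c": 18.0},
]

training_csv = to_csv(merge_for_training(sensor_rows, weather_rows))
print(training_csv)
```

In a managed setup, the same join and write would run inside the ETL job against the crawled tables rather than against in-memory lists, which is what keeps the administrative overhead low.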