AWS Certified Machine Learning – Specialty — Question 50
A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS.
Which approach should the Specialist use for training a model using that data?
Answer options
- A. Write a direct connection to the SQL database within the notebook and pull data in.
- B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.
- C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in.
- D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.
Correct answer: B
Explanation
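B is correct because Amazon SageMaker training jobs typically read their input from Amazon S3, so exporting the historical data to S3 and supplying the S3 location is the standard pattern. It also decouples training from the operational database. Option A is incorrect because pulling rows over a direct SQL connection loads the data into the notebook instance: this works for a small sample but does not scale to the full dataset, and it puts query load on the database. Option C is not ideal because DynamoDB is a key-value store built for low-latency item lookups, not for the bulk sequential reads that training requires. Option D is unsuitable because ElastiCache is an in-memory cache, not durable storage for a training dataset.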
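To make "provide the S3 location" concrete, here is a minimal sketch of one input channel as it would appear in the `InputDataConfig` list of a SageMaker `CreateTrainingJob` request. The helper name `build_training_channel`, the bucket `example-bucket`, and the prefix `exports/rds/training/` are hypothetical, chosen only for illustration. The sketch assumes the data has already been exported to S3 as CSV and makes no AWS calls.

```python
import json


def build_training_channel(bucket, prefix, channel_name="train", content_type="text/csv"):
    """Return one InputDataConfig entry that points a training job at data in S3.

    bucket and prefix are placeholders for wherever the exported data lands.
    """
    # Normalize the prefix so the URI never contains a double slash after the bucket.
    s3_uri = f"s3://{bucket}/{prefix.lstrip('/')}"
    return {
        "ChannelName": channel_name,
        "ContentType": content_type,
        "DataSource": {
            "S3DataSource": {
                # "S3Prefix" means: use every object under the given prefix.
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                # Each training instance receives a full copy of the data.
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    channel = build_training_channel("example-bucket", "exports/rds/training/")
    print(json.dumps(channel, indent=2))
```

The training job only ever sees the S3 URI, so the export from the database can be rerun or rescheduled without changing the notebook code.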