A Data Science team is designing a dataset repository where it will store a large amount…

Question

A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data
Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL.
Which storage scheme is MOST adapted to this scenario?

Accepted Answer

Correct answer: A. Store datasets as files in Amazon S3.

Amazon S3 scales automatically and bills only for the storage actually used, so it suits a repository that grows by an arbitrary number of datasets each day. Files in S3 can also be explored with SQL through a query service such as Amazon Athena, without first loading them into a database. Options B and C are less suitable because of their scaling and cost limitations. Option D is better suited to structured data and does not offer the same SQL exploration capabilities.
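As a rough illustration of the SQL requirement, the sketch below shows one way to query files stored in S3 from Python, using Amazon Athena through boto3. Athena is an assumption here, since the question does not name a query service. The database `ml_datasets`, the table `training_images`, and the bucket `example-athena-results` are hypothetical placeholders, and the table is assumed to be already defined over the S3 files.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported here so build_athena_request works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: the database, table, and bucket are placeholders.
request = build_athena_request(
    sql="SELECT * FROM training_images LIMIT 10",
    database="ml_datasets",
    output_location="s3://example-athena-results/queries/",
)
```

Calling `start_query(request)` submits the query asynchronously. The datasets themselves stay in S3 as files, which is what keeps the storage side scalable and inexpensive.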

AWS Certified Machine Learning – Specialty — Question 35

Explanation