AWS Certified Machine Learning – Specialty — Question 197

A data scientist is working on a model to predict a company's required inventory stock levels. All historical data is stored in .csv files in the company's data lake on Amazon S3. The dataset consists of approximately 500 GB of data. The data scientist wants to use SQL to explore the data before training the model. The company wants to minimize costs.

Which option meets these requirements with the LEAST operational overhead?

Answer options

Correct answer: B

Explanation

The correct answer is B. An AWS Glue crawler scans the .csv files in the S3 bucket and creates tables in the AWS Glue Data Catalog, and Amazon Athena then queries those tables with standard SQL. Both services are serverless, so there is no infrastructure to provision or manage, and Athena bills per query based on the amount of data scanned, which keeps costs low for exploratory work. Options A, C, and D require clusters and additional management, which makes them more complex to set up and more expensive to run, so they are a poorer fit for minimizing cost and operational overhead.
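
The workflow described in the explanation can be sketched in Python as follows. This is a minimal illustration, not the exam's reference solution: the function names, crawler name, IAM role ARN, database name, table name, and S3 paths are all hypothetical. The functions take the AWS clients as parameters, on the assumption that the caller creates them with boto3 (`boto3.client("glue")` and `boto3.client("athena")`).

```python
def catalog_csv_data(glue, crawler_name, role_arn, database, s3_path):
    """Create and start a Glue crawler that infers table schemas from the CSV files.

    `glue` is assumed to be a boto3 Glue client. All argument values are
    placeholders chosen by the caller.
    """
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,                      # IAM role the crawler assumes
        DatabaseName=database,              # Data Catalog database for the new tables
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    # The crawler runs asynchronously; tables appear only after it finishes.
    glue.start_crawler(Name=crawler_name)


def start_exploration_query(athena, database, sql, output_location):
    """Submit a SQL query to Athena and return its query execution ID.

    `athena` is assumed to be a boto3 Athena client. Athena writes the
    result files to `output_location`, an S3 path chosen by the caller.
    """
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical usage, once the crawler has completed:
#   import boto3
#   catalog_csv_data(boto3.client("glue"), "inventory-crawler",
#                    "arn:aws:iam::123456789012:role/ExampleGlueRole",
#                    "inventory_db", "s3://example-data-lake/inventory/")
#   start_exploration_query(boto3.client("athena"), "inventory_db",
#                           "SELECT * FROM inventory LIMIT 10",
#                           "s3://example-athena-results/")
```

Nothing in this sketch provisions or manages a cluster: the crawler and the query both run on demand, which is the property the explanation relies on.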