AWS Certified Machine Learning – Specialty — Question 306

A data engineer wants to perform exploratory data analysis (EDA) on a petabyte of data. The data engineer does not want to manage compute resources and wants to pay only for queries that are run. The data engineer must write the analysis by using Python from a Jupyter notebook.

Which solution will meet these requirements?

Answer options

Correct answer: A

Explanation

Amazon Athena for Apache Spark provides a serverless Spark environment in which the data engineer can run interactive Python (PySpark) code from Jupyter-compatible notebooks. There is no infrastructure to provision or manage, and billing is based on the analysis that is run rather than on provisioned clusters. In contrast, Amazon EMR clusters require managing cluster instances, while the Amazon SageMaker and Amazon Redshift integrations do not offer the same combination of a serverless Spark notebook and pay-per-use pricing.
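
To make the "no infrastructure to manage" point concrete, here is a minimal sketch of driving Athena for Apache Spark from Python through the boto3 Athena client instead of the notebook UI. It is an illustration under stated assumptions, not the exam's reference solution: the workgroup name, the S3 path, and the helper names run_spark_code and wait_for_state are hypothetical, and the workgroup is assumed to be already configured for the Apache Spark engine. The submitted code also assumes a predefined spark session variable is available to the calculation.

```python
import time


def wait_for_state(fetch_state, done_states, poll_seconds=2.0, max_polls=300):
    """Poll fetch_state() until it returns one of done_states, then return it."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in done_states:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("gave up waiting for: " + ", ".join(sorted(done_states)))


def run_spark_code(athena, workgroup, code, max_dpus=4, poll_seconds=2.0):
    """Start a Spark session, run one code block, and return its final state.

    `athena` is expected to behave like boto3.client("athena").
    """
    session_id = athena.start_session(
        WorkGroup=workgroup,
        EngineConfiguration={"MaxConcurrentDpus": max_dpus},
    )["SessionId"]
    try:
        # Wait until the session can accept work (or has clearly failed).
        state = wait_for_state(
            lambda: athena.get_session_status(SessionId=session_id)["Status"]["State"],
            {"IDLE", "FAILED", "TERMINATED", "DEGRADED"},
            poll_seconds,
        )
        if state != "IDLE":
            raise RuntimeError(f"session ended in state {state}")

        calc_id = athena.start_calculation_execution(
            SessionId=session_id, CodeBlock=code
        )["CalculationExecutionId"]

        # Wait for the submitted PySpark code to reach a terminal state.
        return wait_for_state(
            lambda: athena.get_calculation_execution_status(
                CalculationExecutionId=calc_id
            )["Status"]["State"],
            {"COMPLETED", "FAILED", "CANCELED"},
            poll_seconds,
        )
    finally:
        # Always end the session so no compute is left running.
        athena.terminate_session(SessionId=session_id)


# Hypothetical EDA snippet; the bucket and prefix are placeholders.
EDA_CODE = """
df = spark.read.parquet("s3://example-bucket/events/")
df.printSchema()
df.describe().show()
"""

# Example call (requires AWS credentials and a Spark-enabled workgroup):
#   import boto3
#   run_spark_code(boto3.client("athena"), "example-spark-workgroup", EDA_CODE)
```

In an Athena notebook itself, only the PySpark lines inside EDA_CODE would be typed into a cell; the session handling shown here is what the service does on the engineer's behalf.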