
Question

A data engineer wants to perform exploratory data analysis (EDA) on a petabyte of data. The data engineer does not want to manage compute resources and wants to pay only for queries that are run. The data engineer must write the analysis by using Python from a Jupyter notebook. Which solution will meet these requirements?

Accepted Answer

Correct answer: A. Use Apache Spark from within Amazon Athena. Amazon Athena for Apache Spark provides a serverless Spark environment in which you write Python (PySpark) interactively in Jupyter notebooks. There is no cluster or other infrastructure to manage, and you pay only for the compute that the code you run actually uses, which satisfies the pay-per-query requirement. In contrast, Amazon EMR requires you to provision and manage cluster instances, and the Amazon SageMaker and Amazon Redshift options do not offer the same serverless, pay-per-use Spark notebook setup.
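To make the serverless model concrete: in Athena for Apache Spark, code runs as calculations inside a session that Athena provisions for you, so there is no cluster to create, size, or shut down. When you run a notebook cell, the Athena notebook editor submits a calculation of this kind. The sketch below drives the same session and calculation lifecycle through the boto3 Athena client (`start_session`, `get_session_status`, `start_calculation_execution`, `get_calculation_execution`, `terminate_session`). It is a minimal illustration under stated assumptions, not a production pattern. The workgroup name, the `MaxConcurrentDpus` value, and the S3 path in the EDA cell are hypothetical placeholders. The client is passed in as a parameter so that the function can be exercised without AWS credentials.

```python
import time

# Calculation states after which polling can stop.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

# Hypothetical EDA cell: `spark` is the session object Athena provides;
# the S3 path is a placeholder, not a real dataset.
EDA_CODE = """
df = spark.read.parquet("s3://example-bucket/events/")
df.printSchema()
df.describe().show()
"""


def wait_for_session_idle(athena, session_id, poll_seconds=2.0):
    """Poll until the Spark session is IDLE and can accept calculations."""
    while True:
        state = athena.get_session_status(SessionId=session_id)["Status"]["State"]
        if state == "IDLE":
            return
        if state in ("FAILED", "TERMINATED", "DEGRADED"):
            raise RuntimeError(f"session {session_id} ended in state {state}")
        time.sleep(poll_seconds)


def run_spark_cell(athena, workgroup, code, poll_seconds=2.0):
    """Start a session, run one calculation, and always terminate the session."""
    session_id = athena.start_session(
        WorkGroup=workgroup,  # assumed to be a Spark-enabled workgroup
        EngineConfiguration={"MaxConcurrentDpus": 20},  # illustrative value
    )["SessionId"]
    try:
        wait_for_session_idle(athena, session_id, poll_seconds)
        calc_id = athena.start_calculation_execution(
            SessionId=session_id, CodeBlock=code
        )["CalculationExecutionId"]
        while True:
            calc = athena.get_calculation_execution(CalculationExecutionId=calc_id)
            state = calc["Status"]["State"]
            if state in TERMINAL_STATES:
                # Result holds S3 URIs for stdout, stderr, and results.
                return state, calc.get("Result", {})
            time.sleep(poll_seconds)
    finally:
        # Terminate explicitly rather than waiting for the idle timeout.
        athena.terminate_session(SessionId=session_id)
```

In real use, you would pass `boto3.client("athena")` as `athena`, for example `run_spark_cell(boto3.client("athena"), "my-spark-workgroup", EDA_CODE)`, where `my-spark-workgroup` is hypothetical. Note that nothing in the sketch sizes or manages instances. The only compute-related setting is a cap on DPUs, which is the point of the serverless option.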

AWS Certified Machine Learning – Specialty — Question 306


Explanation