AWS Certified Data Analytics – Specialty — Question 126
A data analyst is designing a solution to interactively query datasets with SQL using a JDBC connection. Users will join data stored in Amazon S3 in Apache ORC format with data stored in Amazon OpenSearch Service (Amazon Elasticsearch Service) and Amazon Aurora MySQL.
Which solution will provide the MOST up-to-date results?
Answer options
- A. Use AWS Glue jobs to ETL data from Amazon ES and Aurora MySQL to Amazon S3. Query the data with Amazon Athena.
- B. Use AWS DMS to stream data from Amazon ES and Aurora MySQL to Amazon Redshift. Query the data with Amazon Redshift.
- C. Query all the datasets in place with Apache Spark SQL running on an AWS Glue developer endpoint.
- D. Query all the datasets in place with Apache Presto running on Amazon EMR.
Correct answer: D
Explanation
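Option D is correct because Presto on Amazon EMR is a distributed SQL engine that queries each source in place through connectors: the Hive connector reads the ORC files in Amazon S3, and the Elasticsearch and MySQL connectors read Amazon ES and Aurora MySQL directly. A single SQL statement can join across all three sources, clients can connect through the Presto JDBC driver, and every query sees the data as it exists in the sources at execution time. Options A and B copy data into Amazon S3 or Amazon Redshift first, so results are only as current as the last ETL run or replication cycle; in addition, AWS DMS supports Amazon ES as a target, not as a source. Option C also reads the data in place, but an AWS Glue development endpoint is intended for developing and testing ETL scripts, not for serving interactive SQL queries to users over JDBC.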
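To make the "query in place" idea concrete, here is a minimal sketch of what a federated Presto query for this scenario might look like. Everything specific in it is an assumption for illustration: the catalog names (`hive`, `mysql`, `elasticsearch`), the schemas, tables, and columns are hypothetical, and the helper `run_federated_query` simply takes any Python DB-API 2.0 cursor already connected to the Presto coordinator rather than naming a particular client library.

```python
# Sketch only: all catalog, schema, table, and column names are hypothetical.
# Assumes three Presto catalogs have been configured on the EMR cluster:
#   hive          -> ORC files in Amazon S3 (Hive connector)
#   mysql         -> Aurora MySQL (MySQL connector)
#   elasticsearch -> Amazon ES / OpenSearch domain (Elasticsearch connector)

FEDERATED_QUERY = """
SELECT o.order_id,
       o.order_total,
       c.customer_name,
       e.last_event_type
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.customer_id
JOIN elasticsearch.default.events AS e
  ON o.customer_id = e.customer_id
WHERE o.order_date >= DATE '2024-01-01'
"""


def run_federated_query(cursor, sql=FEDERATED_QUERY):
    """Run the query on a DB-API 2.0 cursor and return all rows.

    The cursor is assumed to be connected to the Presto coordinator;
    Presto reads each source at execution time, so no data is copied
    into a staging store before the join.
    """
    cursor.execute(sql)
    return cursor.fetchall()
```

Because each table is addressed as `catalog.schema.table`, the join spans all three stores in one statement, which is why the results reflect the current state of each source rather than the state at the last load.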