AWS Certified Solutions Architect – Associate (SAA-C03) — Question 634
A marketing company receives a large amount of new clickstream data in Amazon S3 from a marketing campaign. The company needs to analyze the clickstream data in Amazon S3 quickly. Then the company needs to determine whether to process the data further in the data pipeline.
Which solution will meet these requirements with the LEAST operational overhead?
Answer options
- A. Create external tables in a Spark catalog. Configure jobs in AWS Glue to query the data.
- B. Configure an AWS Glue crawler to crawl the data. Configure Amazon Athena to query the data.
- C. Create external tables in a Hive metastore. Configure Spark jobs in Amazon EMR to query the data.
- D. Configure an AWS Glue crawler to crawl the data. Configure Amazon Kinesis Data Analytics to use SQL to query the data.
Correct answer: B
Explanation
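Amazon Athena is a serverless, interactive query service that runs standard SQL directly against data in Amazon S3. There are no clusters or servers to provision or manage, so it has the lowest operational overhead of the four options. An AWS Glue crawler infers the schema of the clickstream files and registers the tables in the AWS Glue Data Catalog, which Athena uses as its metastore, so no table definitions have to be written by hand. The other options add work that ad hoc analysis does not need. Options A and C both require external tables to be defined manually and Spark jobs to be written and maintained, and option C also requires an Amazon EMR cluster to be provisioned and operated. Option D does not fit because Amazon Kinesis Data Analytics is designed to run SQL over streaming sources, not over data already at rest in Amazon S3.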
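The workflow in option B can be sketched in a few lines of Python with boto3. This is a minimal illustration under stated assumptions, not a reference implementation: the crawler name `clickstream-crawler`, the database `marketing`, the table `clickstream`, its `page` column, and the results bucket are all hypothetical, and the crawler is assumed to exist already. The helper functions take the clients as arguments so they can be exercised without AWS credentials.

```python
import time


def run_crawler(glue, crawler_name, poll_seconds=5.0):
    """Start a Glue crawler and block until it returns to the READY state."""
    glue.start_crawler(Name=crawler_name)
    while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run one Athena query and return the first page of rows as lists of strings."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena queries are asynchronous, so poll until a terminal state is reached.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    result = athena.get_query_results(QueryExecutionId=query_id)
    # For SELECT queries the first returned row holds the column headers.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


def main():
    # Needs boto3, AWS credentials, and the hypothetical resources named below.
    import boto3

    run_crawler(boto3.client("glue"), "clickstream-crawler")
    rows = run_query(
        boto3.client("athena"),
        "SELECT page, COUNT(*) AS views FROM clickstream "
        "GROUP BY page ORDER BY views DESC LIMIT 10",
        database="marketing",
        output_location="s3://example-athena-results/",
    )
    for row in rows:
        print(row)
```

Calling `main()` would crawl the data once and then print the ten most viewed pages. Nothing in the sketch provisions or manages compute, which is the point of the "least operational overhead" requirement: the only resources to maintain are the crawler definition and an S3 location for query results.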