AWS Certified Data Analytics – Specialty — Question 12
A company is planning to create a data lake in Amazon S3. The company wants to create tiered storage based on access patterns and cost objectives. The solution must include support for JDBC connections from legacy clients, metadata management that allows federation for access control, and batch-based ETL using PySpark and Scala. Operational management should be limited.
Which combination of components can meet these requirements? (Choose three.)
Answer options
- A. AWS Glue Data Catalog for metadata management
- B. Amazon EMR with Apache Spark for ETL
- C. AWS Glue for Scala-based ETL
- D. Amazon EMR with Apache Hive for JDBC clients
- E. Amazon Athena for querying data in Amazon S3 using JDBC drivers
- F. Amazon EMR with Apache Hive, using an Amazon RDS for MySQL-backed metastore
Correct answer: A, C, E
Explanation
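The correct answers are A, C, and E. The AWS Glue Data Catalog is a managed, Hive-compatible metastore shared by AWS Glue and Amazon Athena, and access to its databases and tables can be governed centrally (for example, through AWS Lake Formation). AWS Glue runs serverless batch ETL jobs written in either PySpark or Scala, so both languages are covered with no cluster to manage. Amazon Athena queries data in S3 directly and provides JDBC (and ODBC) drivers, so legacy clients can connect without a persistent query cluster. Options B and D could work functionally, but Amazon EMR requires provisioning, scaling, and patching clusters, which conflicts with the requirement to limit operational management. Option F adds even more overhead: an EMR cluster plus an Amazon RDS instance to operate as the Hive metastore. The tiered-storage requirement is not tied to any option; S3 itself meets it through storage classes and lifecycle rules.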
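None of the six options addresses the tiered-storage requirement. In this design, S3 handles it on its own through storage classes and lifecycle transitions. The sketch below is a minimal illustration, not part of the exam answer. It builds a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The `raw/` prefix, the 30/90-day thresholds, the helper name `tiered_lifecycle_config`, and the bucket name are all hypothetical choices.

```python
import json


def tiered_lifecycle_config(prefix: str, ia_days: int = 30, archive_days: int = 90) -> dict:
    """Hypothetical helper: build an S3 lifecycle configuration that moves
    objects under `prefix` to S3 Standard-IA after `ia_days` and to
    S3 Glacier Flexible Retrieval (API name "GLACIER") after `archive_days`."""
    if ia_days < 30:
        # S3 requires objects to remain in S3 Standard for at least 30 days
        # before a lifecycle transition to Standard-IA.
        raise ValueError("ia_days must be at least 30 for STANDARD_IA")
    if archive_days <= ia_days:
        # Keep the transitions in ascending order: warm tier first, then archive.
        raise ValueError("archive_days must be greater than ia_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = tiered_lifecycle_config("raw/")
    print(json.dumps(config, indent=2))
    # Applying it requires AWS credentials; the bucket name is hypothetical:
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-data-lake", LifecycleConfiguration=config)
```

Because S3 applies lifecycle rules itself, this adds no servers or jobs to operate. That fits the question's "limited operational management" constraint. S3 Intelligent-Tiering is an alternative when access patterns are not known in advance.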