AWS Certified Solutions Architect – Professional — Question 213
A company has several teams, and each team has their own Amazon RDS database that totals 100 TB. The company is building a data query platform for
Business Intelligence Analysts to generate a weekly business report. The new system must run ad-hoc SQL queries.
What is the MOST cost-effective solution?
Answer options
- A. Create a new Amazon Redshift cluster. Create an AWS Glue ETL job to copy data from the RDS databases to the Amazon Redshift cluster. Use Amazon Redshift to run the query.
- B. Create an Amazon EMR cluster with enough core nodes. Run an Apache Spark job to copy data from the RDS databases to a Hadoop Distributed File System (HDFS). Use a local Apache Hive metastore to maintain the table definition. Use Spark SQL to run the query.
- C. Use an AWS Glue ETL job to copy all the RDS databases to a single Amazon Aurora PostgreSQL database. Run SQL queries on the Aurora PostgreSQL database.
- D. Use an AWS Glue crawler to crawl all the databases and create tables in the AWS Glue Data Catalog. Use an AWS Glue ETL job to load data from the RDS databases to Amazon S3, and use Amazon Athena to run the queries.
Correct answer: D
Explanation
Option D is the most cost-effective solution as it leverages AWS Glue and Amazon Athena, which are serverless and charge only for the queries run, minimizing costs. Options A and B involve setting up additional services that incur higher costs, while option C consolidates data into Aurora, which may also be more expensive compared to using S3 and Athena for querying.