AWS Certified Big Data – Specialty — Question 33
An organization needs to design and deploy a large-scale data storage solution that will be highly durable and highly flexible with respect to the type and structure of data being stored. The data to be stored will be sent or generated from a variety of sources and must be persistently available for access and processing by multiple applications.
What is the most cost-effective technique to meet these requirements?
Answer options
- A. Use Amazon Simple Storage Service (S3) as the actual data storage system, coupled with appropriate tools for ingestion/acquisition of data and for subsequent processing and querying.
- B. Deploy a long-running Amazon Elastic MapReduce (EMR) cluster with Amazon Elastic Block Store (EBS) volumes for persistent HDFS storage and appropriate Hadoop ecosystem tools for processing and querying.
- C. Use Amazon Redshift with data replication to Amazon Simple Storage Service (S3) for comprehensive durable data storage, processing, and querying.
- D. Launch an Amazon Relational Database Service (RDS), and use the enterprise grade and capacity of the Amazon Aurora engine for storage, processing, and querying.
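Correct answer: A
Explanation
The correct answer is A. Amazon S3 is an object store designed for 99.999999999% (eleven nines) durability, and it accepts objects of any type or structure without a predefined schema. Data written to S3 stays available independently of any compute resource, so multiple applications and services can ingest into it and read from it concurrently. Because storage is decoupled from compute and billed by the amount stored, it is also the most cost-effective of the four options at large scale. Option B ties the data to a long-running EMR cluster: the cluster must be paid for continuously, and HDFS on EBS volumes costs more per gigabyte than S3 and is lost if the cluster is terminated. Option C relies on Amazon Redshift, a relational data warehouse that expects structured, schema-defined data and is priced on provisioned cluster capacity, so it is neither flexible with respect to data structure nor the cheapest way to hold large volumes of raw data. Option D has the same limitation: Aurora is a relational database engine, well suited to transactional workloads but not to low-cost storage of large, heterogeneous data sets.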
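As a rough illustration of the storage pattern described in option A, the sketch below writes payloads of any type from different sources into one bucket under a partitioned key layout. The `raw/source=.../dt=...` layout and the function and bucket names are hypothetical choices for this example, not something the question prescribes. The `s3_client` argument is assumed to be a boto3 S3 client (`boto3.client("s3")`), whose `put_object` call takes `Bucket`, `Key`, and `Body`.

```python
from datetime import datetime


def build_key(source: str, name: str, when: datetime) -> str:
    """Build a partitioned object key (hypothetical layout for this example)."""
    return f"raw/source={source}/dt={when:%Y-%m-%d}/{name}"


def store_object(s3_client, bucket: str, source: str, name: str,
                 body: bytes, when: datetime) -> str:
    """Store one payload as-is; S3 imposes no schema on the bytes."""
    key = build_key(source, name, when)
    # put_object(Bucket=..., Key=..., Body=...) is the boto3 S3 client call.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

Because the payload is passed through as raw bytes, JSON events, CSV exports, and binary files can all land in the same bucket, and any number of downstream applications can read them by key prefix.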