AWS Certified Solutions Architect – Professional — Question 423
Your department creates regular analytics reports from your company's log files. All log data is collected in Amazon S3 and processed by daily Amazon Elastic MapReduce (EMR) jobs that generate daily PDF reports and aggregated tables in CSV format for an Amazon Redshift data warehouse.
Your CFO requests that you optimize the cost structure for this system.
Which of the following alternatives will lower costs without compromising average performance of the system or data integrity for the raw data?
Answer options
- A. Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot Instances and Reserved Instances for Amazon EMR jobs. Use Reserved Instances for Amazon Redshift.
- B. Use reduced redundancy storage (RRS) for PDF and .csv data in S3. Add Spot Instances to EMR jobs. Use Spot Instances for Amazon Redshift.
- C. Use reduced redundancy storage (RRS) for PDF and .csv data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift.
- D. Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift.
Correct answer: C
Explanation
To preserve the integrity of the raw logs, they must stay on standard Amazon S3 storage rather than reduced redundancy storage (RRS), which rules out options A and D. Reproducible outputs such as the PDF reports and CSV tables can safely use RRS to save costs, because they can be regenerated from the raw logs if an object is lost. Amazon Redshift does not support Spot Instances, which rules out option B; Reserved Instances are the cost-effective choice for a steady-state data warehouse. Adding Spot Instances to the Amazon EMR jobs (for example, as task nodes) lowers processing costs without compromising the average performance of the system.
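The elimination reasoning can be sketched as a small filter over the four options. This is only an illustrative model, not an AWS API: the `Option` class, its field names, and the `is_viable` helper are hypothetical names introduced here, and each option is reduced to the two properties that differ between the choices (every option uses Spot Instances for EMR, so that is not modelled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """Hypothetical model of one answer option (names are illustrative)."""
    label: str
    rrs_scope: str         # "all" = RRS for all data, "derived" = PDF/CSV only
    redshift_pricing: str  # "reserved" or "spot"


# The four options as stated in the question.
OPTIONS = [
    Option("A", "all", "reserved"),
    Option("B", "derived", "spot"),
    Option("C", "derived", "reserved"),
    Option("D", "all", "reserved"),
]


def is_viable(opt: Option) -> bool:
    # Raw logs must stay on standard S3 storage to protect data integrity.
    keeps_raw_logs_on_standard = opt.rrs_scope == "derived"
    # Amazon Redshift does not support Spot Instances.
    redshift_supported = opt.redshift_pricing != "spot"
    return keeps_raw_logs_on_standard and redshift_supported


viable = [opt.label for opt in OPTIONS if is_viable(opt)]
print(viable)  # ['C']
```

Only option C passes both checks, which matches the stated answer.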