A company wants to research user turnover by analyzing the past 3 months of user activiti…

Question

A company wants to research user turnover by analyzing the past 3 months of user activities. With millions of users, 1.5 TB of uncompressed data is generated each day. A 30-node Amazon Redshift cluster with 2.56 TB of solid state drive (SSD) storage for each node is required to meet the query performance goals.
The company wants to run an additional analysis on a year's worth of historical data to examine trends indicating which features are most popular. This analysis will be done once a week.
What is the MOST cost-effective solution?

Accepted Answer

Correct answer: B. Keep the data from the last 90 days in Amazon Redshift. Move data older than 90 days to Amazon S3 and store it in Apache Parquet format partitioned by date. Then use Amazon Redshift Spectrum for the additional analysis.

Option B is the most cost-effective because it keeps the most recent 90 days in Redshift, where query performance matters, and offloads older data to S3, where storage is cheaper. Parquet's columnar layout and the date partitioning reduce how much data Redshift Spectrum has to scan, which suits an analysis that runs only once a week. Option A is costly because it requires a much larger cluster, Option C adds the cost of maintaining an EMR cluster, and Option D also raises costs without addressing efficient storage of the historical data.
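The storage reasoning can be checked with a rough calculation from the figures in the question. This is a minimal sketch, not a sizing method: it assumes "3 months" means 90 days (as in option B) and a 365-day year, and it works on uncompressed volumes only. Redshift compresses data on disk, so these numbers do not translate directly into node counts.

```python
# Back-of-the-envelope volumes from the figures in the question.
DAILY_TB = 1.5        # uncompressed data generated per day (from the question)
NODES = 30            # current cluster size (from the question)
NODE_SSD_TB = 2.56    # SSD storage per node (from the question)

HOT_DAYS = 90         # assumption: "past 3 months" taken as 90 days, as in option B
YEAR_DAYS = 365       # assumption: one year of history taken as 365 days


def raw_volume_tb(days: int, daily_tb: float = DAILY_TB) -> float:
    """Uncompressed volume accumulated over the given number of days."""
    return days * daily_tb


cluster_capacity_tb = NODES * NODE_SSD_TB   # total SSD in the existing cluster
hot_tb = raw_volume_tb(HOT_DAYS)            # data kept in Redshift under option B
year_tb = raw_volume_tb(YEAR_DAYS)          # full year needed for the weekly analysis
cold_tb = year_tb - hot_tb                  # data older than 90 days, moved to S3
growth_factor = year_tb / hot_tb            # how much more data a full year is

print(f"Cluster SSD capacity:      {cluster_capacity_tb:.1f} TB")
print(f"Last 90 days (raw):        {hot_tb:.1f} TB")
print(f"Full year (raw):           {year_tb:.1f} TB")
print(f"Older than 90 days (raw):  {cold_tb:.1f} TB")
print(f"Year vs. 90 days:          {growth_factor:.2f}x")
```

Under these assumptions, a full year is about four times the 90-day volume. Keeping all of it in Redshift would mean scaling the cluster by a similar factor for an analysis that runs once a week, which is why moving the older data to S3 is the cheaper choice.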

AWS Certified Data Analytics – Specialty — Question 57
