AWS Certified Solutions Architect – Associate (SAA-C02) — Question 298
A company hosts more than 300 global websites and applications. The company requires a platform to analyze more than 30 TB of clickstream data each day.
What should a solutions architect do to transmit and process the clickstream data?
Answer options
- A. Design an AWS Data Pipeline to archive the data to an Amazon S3 bucket and run an Amazon EMR cluster with the data to generate analytics.
- B. Create an Auto Scaling group of Amazon EC2 instances to process the data and send it to an Amazon S3 data lake for Amazon Redshift to use for analysis.
- C. Cache the data to Amazon CloudFront. Store the data in an Amazon S3 bucket. When an object is added to the S3 bucket, run an AWS Lambda function to process the data for analysis.
- D. Collect the data from Amazon Kinesis Data Streams. Use Amazon Kinesis Data Firehose to transmit the data to an Amazon S3 data lake. Load the data in Amazon Redshift for analysis.
Correct answer: D
Explanation
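Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose are built for continuous, high-throughput ingestion and delivery of streaming data such as clickstreams. Kinesis Data Streams captures the events from the websites. Kinesis Data Firehose then buffers them and delivers them to the Amazon S3 data lake without any servers to manage. From there, the data can be loaded into Amazon Redshift (for example, with the COPY command) for large-scale analytics. The other options are weaker fits. AWS Data Pipeline (Option A) orchestrates batch data movement on a schedule; it is not a streaming ingestion layer. An EC2 Auto Scaling group (Option B) could be made to work, but the company would have to build, scale, and operate the ingestion tier itself. Option C uses Amazon CloudFront, a content delivery network for serving content, not a service for collecting data. It also relies on an AWS Lambda function triggered for each S3 object, which would run into execution-time and concurrency limits when processing more than 30 TB of clickstream data each day.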
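To give a rough sense of the scale, here is a minimal sketch that converts the question's 30 TB per day into an average ingest rate and a minimum Kinesis Data Streams shard count. It makes several assumptions. Terabytes are decimal. Each shard has the provisioned-mode write limit of 1 MB per second. The function names and the `peak_factor` burst multiplier are hypothetical and exist only for illustration. The sketch also ignores the per-shard limit of 1,000 records per second and uneven partition-key distribution, so any real sizing should start from measured traffic.

```python
import math

# Assumptions (illustrative only): decimal terabytes, and the provisioned-mode
# Kinesis Data Streams write limit of 1 MB per second per shard.
BYTES_PER_TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60
SHARD_WRITE_BYTES_PER_SEC = 10**6


def average_ingest_rate(tb_per_day: float) -> float:
    """Hypothetical helper: average ingest rate in bytes/second for a daily volume in TB."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY


def shards_needed(tb_per_day: float, peak_factor: float = 1.0) -> int:
    """Hypothetical helper: minimum shard count for the average rate scaled by peak_factor.

    peak_factor is an assumed multiplier for bursts above the daily mean;
    the 1,000 records/second per-shard limit is not modeled here.
    """
    rate = average_ingest_rate(tb_per_day) * peak_factor
    return math.ceil(rate / SHARD_WRITE_BYTES_PER_SEC)


if __name__ == "__main__":
    rate = average_ingest_rate(30)
    print(f"Average rate: {rate / 10**6:.1f} MB/s")
    print(f"Shards (average load): {shards_needed(30)}")
    print(f"Shards (3x peak, assumed): {shards_needed(30, peak_factor=3)}")
```

Even at the average rate, this comes to roughly 347 MB/s sustained around the clock. That is the kind of load a managed streaming pipeline handles better than per-object Lambda invocations or a self-managed EC2 ingestion fleet.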