AWS Certified Solutions Architect – Associate (SAA-C03) — Question 301
A company is using a fleet of Amazon EC2 instances to ingest data from on-premises data sources. The data is in JSON format and ingestion rates can be as high as 1 MB/s. When an EC2 instance is rebooted, the data in-flight is lost. The company’s data science team wants to query ingested data in near-real time.
Which solution provides near-real-time data querying that is scalable with minimal data loss?
Answer options
- A. Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
- B. Publish data to Amazon Kinesis Data Firehose with Amazon Redshift as the destination. Use Amazon Redshift to query the data.
- C. Store ingested data in an EC2 instance store. Publish data to Amazon Kinesis Data Firehose with Amazon S3 as the destination. Use Amazon Athena to query the data.
- D. Store ingested data in an Amazon Elastic Block Store (Amazon EBS) volume. Publish data to Amazon ElastiCache for Redis. Subscribe to the Redis channel to query the data.
Correct answer: A
Explanation
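Publishing the data directly to Amazon Kinesis Data Streams moves records off the EC2 instances as soon as they arrive. The stream stores them durably (replicated across Availability Zones, with a default retention of 24 hours), so an instance reboot loses only what has not yet been sent, and capacity scales by adding shards. Amazon Kinesis Data Analytics then runs SQL queries continuously against the stream, which meets the near-real-time requirement. Option B is incorrect because Amazon Kinesis Data Firehose buffers records by size or time interval and then loads them into Amazon Redshift in batches, which adds delivery latency before the data can be queried. Option C has the same buffering delay, Amazon Athena can query only what has already landed in Amazon S3, and an EC2 instance store is temporary storage whose contents are lost if the instance stops, terminates, or its host fails. Option D is incorrect because an Amazon EBS volume adds storage to manage without making ingestion scalable, and Redis pub/sub channels do not persist messages, so a subscriber that is disconnected misses data. ElastiCache for Redis is also not designed for analytical querying by a data science team.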
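
To make the "publish directly to Kinesis Data Streams" step in option A concrete, here is a minimal Python sketch of a producer that batches JSON events and resends only the entries the service rejects. It is an illustration under assumptions, not part of the exam material: the function names `to_entries` and `publish`, the `source_id` partition-key field, and the retry count are hypothetical. The `client` argument is assumed to behave like a boto3 Kinesis client, whose `put_records` call takes `StreamName` and `Records` and returns `FailedRecordCount` plus per-record results.

```python
import json

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


def to_entries(events, key_field="source_id"):
    """Serialize JSON-compatible dicts into PutRecords entries.

    key_field is a hypothetical field used to spread records across shards.
    """
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event.get(key_field, "default")),
        }
        for event in events
    ]


def publish(client, stream_name, events, max_attempts=3):
    """Send events in batches and resend only the rejected entries.

    Returns the entries that still failed after max_attempts, so the
    caller can decide what to do with them instead of dropping them.
    """
    entries = to_entries(events)
    unsent = []
    for start in range(0, len(entries), MAX_BATCH):
        pending = entries[start:start + MAX_BATCH]
        for _ in range(max_attempts):
            resp = client.put_records(StreamName=stream_name, Records=pending)
            if resp.get("FailedRecordCount", 0) == 0:
                pending = []
                break
            # Results come back in request order; failed ones carry ErrorCode.
            # A production producer would also back off before retrying.
            pending = [
                entry
                for entry, result in zip(pending, resp["Records"])
                if "ErrorCode" in result
            ]
        unsent.extend(pending)
    return unsent


# Usage (assumes boto3 is installed and a stream named "ingest" exists):
#   import boto3
#   leftover = publish(boto3.client("kinesis"), "ingest", [{"source_id": 1}])
```

Checking `FailedRecordCount` matters because `put_records` can succeed as a call while individual records are throttled. Ignoring the per-record results would reintroduce the data loss the design is meant to avoid.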