A company is using a fleet of Amazon EC2 instances to ingest data from on-premises data s…

Question

A company is using a fleet of Amazon EC2 instances to ingest data from on-premises data sources. The data is in JSON format and ingestion rates can be as high as 1 MB/s. When an EC2 instance is rebooted, the data in-flight is lost. The company’s data science team wants to query ingested data in near-real time. Which solution provides near-real-time data querying that is scalable with minimal data loss?

Accepted Answer

Correct answer: A. Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data. Publishing the data to Amazon Kinesis Data Streams stores each record durably in the stream as soon as it is written, so an EC2 reboot can lose only the data that the instance has not yet sent. The stream scales by adding shards as ingestion grows, and Amazon Kinesis Data Analytics runs near-real-time SQL queries on the streaming data. Options B and C are incorrect because Amazon Kinesis Data Firehose buffers data before delivery, which adds latency and fits the near-real-time requirement less well than Data Streams. Option D is incorrect because storing data on Amazon EBS volumes adds unnecessary management overhead, and ElastiCache for Redis is not designed for scalable, complex analytical querying by data science teams.
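As a rough illustration of the producer side of option A, the sketch below shows how an ingest instance might send JSON events to a stream with the boto3 `put_records` call and retry only the records that Kinesis reports as failed. The stream name, the `source_id` partition key field, the retry settings, and the per-shard write limit of 1 MB/s used for sizing are assumptions made for this example and do not come from the question.

```python
import json
import math
import time

MAX_RECORDS_PER_PUT = 500        # PutRecords accepts at most 500 records per call
ASSUMED_SHARD_WRITE_MB_S = 1.0   # assumption: provisioned-mode write limit per shard


def shards_needed(peak_mb_per_sec, headroom=2.0):
    """Estimate a shard count for a peak ingest rate, with a safety factor."""
    return max(1, math.ceil(peak_mb_per_sec * headroom / ASSUMED_SHARD_WRITE_MB_S))


def to_entries(events, key_field="source_id"):
    """Turn JSON-serializable dicts into PutRecords entries.

    key_field is a hypothetical field used as the partition key.
    """
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event.get(key_field, "unknown")),
        }
        for event in events
    ]


def publish(events, stream_name, client=None, max_attempts=3, backoff_s=0.2):
    """Send events in batches and return the entries that could not be sent."""
    if client is None:
        import boto3  # imported lazily so the helpers above work without AWS access
        client = boto3.client("kinesis")

    entries = to_entries(events)
    unsent = []
    for start in range(0, len(entries), MAX_RECORDS_PER_PUT):
        pending = entries[start:start + MAX_RECORDS_PER_PUT]
        for attempt in range(max_attempts):
            response = client.put_records(StreamName=stream_name, Records=pending)
            if response["FailedRecordCount"] == 0:
                pending = []
                break
            # Results come back in request order; failed ones carry an ErrorCode.
            pending = [
                entry
                for entry, result in zip(pending, response["Records"])
                if "ErrorCode" in result
            ]
            time.sleep(backoff_s * (2 ** attempt))
        unsent.extend(pending)
    return unsent
```

With these assumed numbers, `shards_needed(1.0)` suggests two shards for the 1 MB/s peak in the question, leaving headroom for bursts. Anything `publish` returns as unsent would still need to be kept by the caller (for example, retried later), since records are only durable once Kinesis has accepted them.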

AWS Certified Solutions Architect – Associate (SAA-C03) — Question 301

Answer options

Correct answer: A

Explanation