AWS Certified Big Data – Specialty — Question 35

An organization is developing a mobile social application and needs to collect logs from all devices on which it is installed. The organization is evaluating Amazon Kinesis Data Streams to push logs and Amazon EMR to process data. They want to store data on HDFS, using the default replication factor to replicate data across the cluster, but they are concerned about the durability of the data. Currently, they are producing 300 GB of raw data daily, with additional spikes during special events. They will need to scale out the Amazon EMR cluster to match the increase in streamed data.
Which solution prevents data loss and matches compute demand?

Answer options

Correct answer: D

Explanation

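Option D is correct because Amazon Kinesis Data Firehose is a fully managed delivery service that scales automatically with incoming throughput and delivers logs in near real time directly into Amazon Elasticsearch Service. This removes the need to size an Amazon EMR cluster and its HDFS storage for spikes, and it keeps the data in a managed service rather than on cluster nodes that are lost when the cluster is terminated. The other options either rely on storage that may not provide the same level of durability or require additional cluster management that complicates scaling during spikes in data volume.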
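
To make the sizing concern in the question concrete, the 300 GB daily figure can be turned into a rough throughput and storage estimate. The sketch below is illustrative only and rests on assumptions not stated in the question: "GB" is read as a decimal gigabyte, the replication factor of 3 is the stock Hadoop default (Amazon EMR picks its own default based on the number of core nodes, so treat it as a parameter), the 1 MB/s figure is the commonly documented per-shard write limit for provisioned Kinesis Data Streams, and the spike multiplier is a hypothetical value.

```python
import math

GB = 10**9                       # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds


def average_ingest_mb_per_s(daily_gb: float) -> float:
    """Average ingest rate in MB/s if the daily volume arrived evenly."""
    return daily_gb * GB / SECONDS_PER_DAY / 10**6


def hdfs_raw_storage_gb(daily_gb: float, replication_factor: int = 3) -> float:
    """Raw disk consumed per day on HDFS once every block is replicated.

    replication_factor=3 is the stock Hadoop default (an assumption here).
    """
    return daily_gb * replication_factor


def min_shards(daily_gb: float,
               spike_multiplier: float = 1.0,
               shard_mb_per_s: float = 1.0) -> int:
    """Minimum shard count to absorb the (optionally spiked) write rate.

    shard_mb_per_s=1.0 is the assumed per-shard write limit;
    spike_multiplier is a hypothetical factor for special events.
    """
    rate = average_ingest_mb_per_s(daily_gb) * spike_multiplier
    return math.ceil(rate / shard_mb_per_s)


if __name__ == "__main__":
    print(f"average ingest: {average_ingest_mb_per_s(300):.2f} MB/s")
    print(f"HDFS raw storage per day: {hdfs_raw_storage_gb(300):.0f} GB")
    print(f"shards at average load: {min_shards(300)}")
    print(f"shards at 3x spike: {min_shards(300, spike_multiplier=3.0)}")
```

Under these assumptions the average load is modest (a few MB/s), but a self-managed stream and cluster still have to be provisioned for the peak rather than the average, which is the operational burden a managed delivery service avoids.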