AWS Certified Big Data – Specialty — Question 16
A media advertising company handles a large number of real-time messages sourced from over 200 websites.
The companys data engineer needs to collect and process records in real time for analysis using Spark
Streaming on Amazon Elastic MapReduce (EMR). The data engineer needs to fulfill a corporate mandate to keep ALL raw messages as they are received as a top priority.
Which Amazon Kinesis configuration meets these requirements?
Answer options
- A. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Pull messages off Firehose with Spark Streaming in parallel to persistence to Amazon S3.
- B. Publish messages to Amazon Kinesis Streams. Pull messages off Streams with Spark Streaming in parallel to AWS Lambda pushing messages from Streams to Firehose backed by Amazon Simple Storage Service (S3).
- C. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Use AWS Lambda to pull messages from Firehose to Streams for processing with Spark Streaming.
- D. Publish messages to Amazon Kinesis Streams, pull messages off with Spark Streaming, and write row data to Amazon Simple Storage Service (S3) before and after processing.
Correct answer: C
Explanation
Option C is correct because it ensures that all raw messages are captured in Kinesis Firehose, which is backed by S3, while AWS Lambda facilitates the transfer of messages to Streams for processing with Spark Streaming. Options A and B do not guarantee that all raw messages are stored as they arrive, and option D, while it processes the messages correctly, does not utilize AWS Lambda for efficient handling of the data flow.