AWS Certified Data Analytics – Specialty — Question 55
A banking company wants to collect large volumes of transactional data using Amazon Kinesis Data Streams for real-time analytics. The company uses PutRecord to send data to Amazon Kinesis, and has observed network outages during certain times of the day. The company wants to obtain exactly-once semantics for the entire processing pipeline.
What should the company do to obtain these characteristics?
Answer options
- A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record.
- B. Rely on the processing semantics of Amazon Kinesis Data Analytics to avoid duplicate processing of events.
- C. Design the data producer so events are not ingested into Kinesis Data Streams multiple times.
- D. Rely on the exactly-once processing semantics of Apache Flink and Apache Spark Streaming included in Amazon EMR.
Correct answer: A
Explanation
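The correct answer is A. With PutRecord, a network outage can cause the producer to time out and retry a call that Kinesis Data Streams had already accepted, so the same record can be written to the stream more than once. Embedding a unique ID in each record lets the application identify and discard those duplicates during processing, which gives exactly-once results for the pipeline as a whole. Options B and D rely on processing frameworks that may not guarantee exactly-once semantics without the proper configuration, and on their own they treat a record that the producer wrote twice as two distinct events. Option C falls short because producer retries after network failures cannot be fully prevented, and it does not address duplicate records already sent to Kinesis.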
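
The deduplication approach in option A can be sketched as follows. This is a minimal illustration, not a production design: the names make_record, Deduplicator, and record_id are hypothetical, the stream is simulated with a plain list rather than real Kinesis calls, and the in-memory set stands in for a durable store that a real consumer would need in order to survive restarts.

```python
import json
import uuid


def make_record(payload: dict) -> bytes:
    """Producer side: attach a unique ID once, before the first send attempt.

    A retry must resend these same bytes so the ID stays identical.
    """
    envelope = {"record_id": str(uuid.uuid4()), "payload": payload}
    return json.dumps(envelope).encode("utf-8")


class Deduplicator:
    """Consumer side: skip any record whose ID has already been handled.

    Assumption: an in-memory set is enough for this sketch. A real
    pipeline would keep the seen IDs in durable storage.
    """

    def __init__(self) -> None:
        self._seen = set()

    def process(self, data: bytes, handler) -> bool:
        record = json.loads(data)
        record_id = record["record_id"]
        if record_id in self._seen:
            return False  # duplicate caused by a producer retry
        handler(record["payload"])
        self._seen.add(record_id)
        return True


# Simulate a retry after a lost acknowledgement: the same bytes
# arrive in the stream twice. The payload values are made up.
record = make_record({"account": "hypothetical-123", "amount": 50})
stream = [record, record]

applied = []
dedup = Deduplicator()
results = [dedup.process(data, applied.append) for data in stream]
```

Here the first copy is handled and the second is skipped, so the transaction is applied once even though it was ingested twice. The key design point is that the ID is generated before the first send attempt and reused on every retry. An ID generated per attempt would make the duplicates indistinguishable from new records.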