AWS Certified Data Engineer – Associate (DEA-C01) — Question 252
A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company’s application uses the PutRecord action to send data to Kinesis Data Streams.
A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.
Which solution will meet this requirement?
Answer options
- A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
- B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) data collection application to avoid duplicate processing of events.
- C. Design the data source so events are not ingested into Kinesis Data Streams multiple times.
- D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.
Correct answer: A
Explanation
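The correct answer is A. Kinesis Data Streams provides at-least-once delivery, so duplicates can enter the pipeline: if a network outage interrupts a PutRecord call after the record was written but before the acknowledgement arrives, the producer retries and the record is stored twice. Embedding a unique ID in each record at the source lets the application identify and discard these duplicates during processing, which yields exactly-once results across the whole pipeline. Option B addresses only duplicate processing inside the Apache Flink application; checkpointing does not remove duplicates that producer retries have already written to the stream. Option C is not achievable in practice, because the producer cannot tell whether a timed-out write succeeded and has to retry. Option D switches technologies without addressing producer retries, so it does not guarantee exactly-once delivery and adds unnecessary complexity.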
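
To make the idea behind option A concrete, here is a minimal Python sketch of deduplication by record ID. It does not call Kinesis; the stream is simulated with a list, and the names make_record, DedupingConsumer, and record_id are hypothetical, chosen only for illustration. A real pipeline would likely keep the seen IDs in a durable store rather than in memory.

```python
import uuid


def make_record(payload):
    """Attach a unique ID once, at the source, before any send attempt."""
    return {"record_id": str(uuid.uuid4()), "payload": payload}


class DedupingConsumer:
    """Processes each record_id at most once (in-memory store, assumption)."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, record):
        if record["record_id"] in self.seen_ids:
            return False  # duplicate caused by a producer retry; skip it
        self.seen_ids.add(record["record_id"])
        self.processed.append(record["payload"])
        return True


# Simulate a retry: the same record reaches the stream twice because the
# first acknowledgement was lost during a network outage.
record = make_record({"account": "123", "amount": 50})
stream = [record, record, make_record({"account": "456", "amount": 75})]

consumer = DedupingConsumer()
results = [consumer.handle(r) for r in stream]
```

The key design point is that the ID is generated once, before the first send attempt, so every retry carries the same ID and the consumer can recognize it as a duplicate.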