A company wants to collect and process events data from different departments in near-rea…

Question

A company wants to collect and process events data from different departments in near-real time. Before storing the data in Amazon S3, the company needs to clean the data by standardizing the format of the address and timestamp columns. The data varies in size based on the overall load at each particular point in time. A single data record can be 100 KB-10 MB.
How should a data analytics specialist design the solution for data ingestion?

Accepted Answer

Correct answer: C. Use Amazon Managed Streaming for Apache Kafka. Configure a topic for the raw data. Use a Kafka producer to write data to the topic. Create an application on Amazon EC2 that reads data from the topic by using the Apache Kafka consumer API, cleanses the data, and writes to Amazon S3.

C is correct because Amazon MSK can handle the varying record sizes in this workload and gives the company a place to cleanse the data before it reaches Amazon S3. Kafka's message-size limit is about 1 MB by default but is configurable (for example, message.max.bytes on the broker and max.request.size on the producer), so a topic can be sized for records of up to 10 MB. The consumer application on Amazon EC2 can then standardize the address and timestamp columns in near-real time before writing to Amazon S3. Options A and B are less suited to this workload, and option D does not provide the necessary data cleansing capabilities at scale.
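The cleansing step that the consumer application performs can be sketched roughly as follows. This is a minimal illustration, not the exam's reference implementation: it assumes each record is a JSON object with `address` and `timestamp` fields, the input timestamp formats listed in the code are hypothetical, and naive timestamps are assumed to be UTC. The Kafka consumer loop and the upload to Amazon S3 are deliberately left out.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical input formats; the real department feeds may use others.
_TIMESTAMP_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %H:%M",
)


def standardize_timestamp(raw: str) -> str:
    """Return the timestamp as ISO 8601 in UTC (naive values assumed UTC)."""
    text = raw.strip()
    for fmt in _TIMESTAMP_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unrecognized timestamp format: {raw!r}")


def standardize_address(raw: str) -> str:
    """Collapse runs of whitespace and upper-case the address."""
    return re.sub(r"\s+", " ", raw).strip().upper()


def clean_record(payload: bytes) -> bytes:
    """Clean one JSON-encoded event before it is written to storage."""
    event = json.loads(payload)
    event["address"] = standardize_address(event["address"])
    event["timestamp"] = standardize_timestamp(event["timestamp"])
    return json.dumps(event, sort_keys=True).encode("utf-8")
```

In a consumer application, each message value read from the topic would pass through `clean_record` before being batched and uploaded. Records that raise `ValueError` could be routed to a separate error location rather than dropped, though that choice depends on the company's requirements.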

AWS Certified Data Analytics – Specialty — Question 86
