AWS Certified Machine Learning – Specialty — Question 221

A company is building a pipeline that periodically retrains its machine learning (ML) models by using new streaming data from devices. The company's data engineering team wants to build a data ingestion system that has high throughput, durable storage, and scalability. The company can tolerate up to 5 minutes of latency for data ingestion. The company needs a solution that can apply basic data transformation during the ingestion process.

Which solution will meet these requirements with the MOST operational efficiency?

Answer options

Correct answer: A

Explanation

Option A is the most operationally efficient choice: it uses Amazon Kinesis to ingest the streaming data and AWS Lambda to transform records in flight, which meets the high-throughput and scalability requirements. Options B and C route data through Amazon S3 first, and the extra steps add latency and complexity, which works against both the 5-minute ingestion latency budget and the goal of operational efficiency. Option D would work, but it does not use the Kinesis data stream mechanism as effectively as option A.
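
To make the "Lambda for on-the-fly data transformation" idea concrete, here is a minimal sketch of a transformation function. It assumes the Lambda is invoked as a Kinesis Data Firehose transformation function, where each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` status, and re-encoded `data`. The question text does not say which Kinesis service option A uses, so that wiring is an assumption. The `device_id` and `temperature` fields and the `transform` helper are hypothetical and only illustrate a "basic" transformation.

```python
import base64
import json


def transform(payload: dict) -> dict:
    """Hypothetical basic transformation: normalize one device reading."""
    return {
        "device_id": str(payload.get("device_id", "unknown")),
        "temperature_c": round(float(payload["temperature"]), 2),
    }


def lambda_handler(event, context):
    """Transform each record in the batch and report a per-record status."""
    output = []
    for record in event["records"]:
        try:
            # Incoming data is base64-encoded; this sketch assumes JSON payloads.
            payload = json.loads(base64.b64decode(record["data"]))
            # Newline-delimit the output so records stay separable downstream.
            transformed = json.dumps(transform(payload)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
            })
        except (KeyError, ValueError, TypeError, AttributeError):
            # Malformed input: flag the record instead of failing the whole batch.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning a per-record status rather than raising an exception means one malformed device message does not block the rest of the batch, which suits a high-throughput ingestion path.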