AWS Certified Machine Learning – Specialty — Question 352

A company is building a predictive maintenance system using real-time data from devices on remote sites. There is no AWS Direct Connect connection or VPN connection between the sites and the company's VPC. The data needs to be ingested in real time from the devices into Amazon S3.

Transformation is needed to convert the raw data into clean .csv data to be fed into the machine learning (ML) model. The transformation needs to happen during the ingestion process. When transformation fails, the records need to be stored in a specific location in Amazon S3 for human review. The raw data before transformation also needs to be stored in Amazon S3.

How should an ML specialist architect the solution to meet these requirements with the LEAST effort?

Answer options

Correct answer: A

Explanation

Amazon Data Firehose supports inline data transformation through an AWS Lambda function and can back up the raw source records to an S3 bucket before transformation. Records that fail transformation are delivered to a separate error prefix in Amazon S3, where they can be reviewed by a person. Because Firehose is reached through a public HTTPS endpoint, the devices can send data over the internet without AWS Direct Connect or a VPN. Together these features meet every requirement through configuration rather than custom infrastructure. Alternatives built on Amazon MSK, ECS workers, or Kinesis Data Streams require additional components to be provisioned and operated, so they involve more effort.
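To make the transformation step concrete, here is a minimal sketch of a Lambda function that Firehose could invoke. It assumes the devices send JSON objects; the field names (device_id, timestamp, temperature) are hypothetical and do not come from the question. The sketch relies on the Firehose transformation contract: each incoming record carries a recordId and base64-encoded data, and each returned record must carry the same recordId, a result of Ok, Dropped, or ProcessingFailed, and base64-encoded data.

```python
import base64
import csv
import io
import json

# Hypothetical column order for the clean CSV; not specified in the question.
FIELDS = ["device_id", "timestamp", "temperature"]


def to_csv_line(payload):
    """Render one JSON object as a single newline-terminated CSV row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([payload[f] for f in FIELDS])
    return buf.getvalue()


def lambda_handler(event, context):
    """Transform each Firehose record from JSON to CSV.

    Records that cannot be parsed are returned as ProcessingFailed with
    their original data, so Firehose can deliver them to the error location.
    """
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            line = to_csv_line(payload)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, KeyError, TypeError):
            # Malformed base64/JSON, a missing field, or a non-object payload.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

The function only marks records as failed; where failed records and raw backups land in Amazon S3 is decided by the Firehose stream's own settings (source record backup and the error output prefix), not by this code.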