AWS Certified Machine Learning – Specialty — Question 199
A company has a podcast platform that has thousands of users. The company has implemented an anomaly detection algorithm to detect low podcast engagement based on a 10-minute running window of user events such as listening, pausing, and exiting the podcast. A machine learning (ML) specialist is designing the data ingestion of these events with the knowledge that the event payload needs some small transformations before inference.
How should the ML specialist design the data ingestion to meet these requirements with the LEAST operational overhead?
Answer options
- A. Ingest event data by using a GraphQL API in AWS AppSync. Store the data in an Amazon DynamoDB table. Use DynamoDB Streams to call an AWS Lambda function to transform the most recent 10 minutes of data before inference.
- B. Ingest event data by using Amazon Kinesis Data Streams. Store the data in Amazon S3 by using Amazon Kinesis Data Firehose. Use AWS Glue to transform the most recent 10 minutes of data before inference.
- C. Ingest event data by using Amazon Kinesis Data Streams. Use an Amazon Kinesis Data Analytics for Apache Flink application to transform the most recent 10 minutes of data before inference.
- D. Ingest event data by using Amazon Managed Streaming for Apache Kafka (Amazon MSK). Use an AWS Lambda function to transform the most recent 10 minutes of data before inference.
Correct answer: C
Explanation
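The correct answer is C. Amazon Kinesis Data Streams ingests the events, and an Amazon Kinesis Data Analytics for Apache Flink application is a managed stream processor that can apply the small payload transformations and maintain the 10-minute running window as events arrive, with no intermediate data store to operate. Option A adds AWS AppSync and Amazon DynamoDB to the path, and the Lambda function would need custom logic to gather the most recent 10 minutes of data on each invocation. Option B lands the data in Amazon S3 through Kinesis Data Firehose and transforms it with AWS Glue, a buffered, batch-oriented path that adds latency and is a poor fit for a continuously updated window. Option D typically involves more cluster configuration with Amazon MSK than with Kinesis Data Streams and, as in option A, leaves the windowing logic to custom Lambda code.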
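To make the "small transformation plus 10-minute running window" idea concrete, here is a minimal sketch in plain Python. It is not Flink code and does not use any AWS API; it only illustrates the kind of state a stream processor keeps on your behalf. The payload field names (userId, action, ts), the Event type, and the RunningWindow class are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_SECONDS = 10 * 60  # the 10-minute running window from the question


@dataclass
class Event:
    user_id: str
    action: str       # e.g. "listen", "pause", "exit"
    timestamp: float  # epoch seconds


def transform(raw: dict) -> Event:
    """Small payload transformation: normalize field names, types, and case.

    The raw keys "userId", "action", and "ts" are assumed for illustration.
    """
    return Event(
        user_id=str(raw["userId"]),
        action=str(raw["action"]).strip().lower(),
        timestamp=float(raw["ts"]),
    )


class RunningWindow:
    """Keeps only the events from the most recent `size_seconds`.

    Assumes events are added in non-decreasing timestamp order.
    """

    def __init__(self, size_seconds: float = WINDOW_SECONDS):
        self.size_seconds = size_seconds
        self.events = deque()

    def add(self, event: Event) -> None:
        self.events.append(event)
        # Evict everything that is a full window or more older than the newest event.
        cutoff = event.timestamp - self.size_seconds
        while self.events and self.events[0].timestamp <= cutoff:
            self.events.popleft()

    def action_counts(self) -> dict:
        """Per-action counts over the current window, usable as model features."""
        counts = {}
        for event in self.events:
            counts[event.action] = counts.get(event.action, 0) + 1
        return counts
```

In a managed Flink application, the eviction and state handling shown here are provided by the framework's windowing operators, which is a large part of why option C carries less operational overhead than hand-rolled window logic in a Lambda function.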