AWS Certified Solutions Architect – Professional — Question 792
A company is developing a gene reporting device that will collect genomic information to assist researchers with collecting large samples of data from a diverse population. The device will push 8 KB of genomic data every second to a data platform that will need to process and analyze the data and provide information back to researchers. The data platform must meet the following requirements:
- Provide near-real-time analytics of the inbound genomic data
- Ensure the data is flexible, parallel, and durable
- Deliver results of processing to a data warehouse
Which strategy should a solutions architect use to meet these requirements?
Answer options
- A. Use Amazon Kinesis Data Firehose to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon RDS instance.
- B. Use Amazon Kinesis Data Streams to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon Redshift cluster using Amazon EMR.
- C. Use Amazon S3 to collect the inbound device data, analyze the data from Amazon SQS with Kinesis, and save the results to an Amazon Redshift cluster.
- D. Use an Amazon API Gateway to put requests into an Amazon SQS queue, analyze the data with an AWS Lambda function, and save the results to an Amazon Redshift cluster using Amazon EMR.
Correct answer: B
Explanation
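Amazon Kinesis Data Streams fits this workload. Producers write records to shards, each shard accepts up to 1 MB or 1,000 records per second, records are replicated across Availability Zones and retained for 24 hours by default, and adding shards lets consumers process the stream in parallel. A device sending 8 KB per second sits well within a single shard's write limit. Kinesis clients (for example, applications built with the Kinesis Client Library) can analyze the records in near real time, and Amazon EMR can load the results into Amazon Redshift, which is a data warehouse.
The other options each miss a requirement. Option A pairs Kinesis Data Firehose with Kinesis clients, which read from Data Streams rather than from Firehose, and it targets Amazon RDS, a relational database rather than a data warehouse. Option C ingests into Amazon S3, an object store that is not a near-real-time streaming source, and Kinesis does not analyze data from Amazon SQS. Option D relies on Amazon SQS, a message queue rather than a replayable stream, so it does not provide the parallel, durable stream processing that Kinesis Data Streams does.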
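
As a rough check on the throughput side of this answer, the sketch below estimates how many Kinesis Data Streams shards a fleet of such devices would need. It assumes the documented per-shard write limits of 1 MB per second and 1,000 records per second, and it treats 1 MB as 1,000 KB to stay conservative. The function name and the fleet sizes are hypothetical and not part of the question.

```python
# Assumed per-shard write limits for Kinesis Data Streams (provisioned mode).
SHARD_WRITE_KB_PER_SEC = 1000      # 1 MB/s, counted conservatively as 1,000 KB
SHARD_WRITE_RECORDS_PER_SEC = 1000

# From the question: each device pushes one 8 KB record every second.
RECORD_KB = 8
RECORDS_PER_DEVICE_PER_SEC = 1


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def shards_needed(devices: int) -> int:
    """Estimate the minimum shard count for a fleet of devices (hypothetical helper)."""
    total_kb = devices * RECORD_KB * RECORDS_PER_DEVICE_PER_SEC
    total_records = devices * RECORDS_PER_DEVICE_PER_SEC
    by_bandwidth = ceil_div(total_kb, SHARD_WRITE_KB_PER_SEC)
    by_record_rate = ceil_div(total_records, SHARD_WRITE_RECORDS_PER_SEC)
    # The stricter of the two limits decides; a stream has at least one shard.
    return max(1, by_bandwidth, by_record_rate)


if __name__ == "__main__":
    for fleet in (1, 125, 126, 10_000):
        print(f"{fleet} devices: {shards_needed(fleet)} shard(s)")
```

Under these assumptions the bandwidth limit, not the record-rate limit, is what forces additional shards, since a shard fills up at 125 devices by size but would take 1,000 devices to fill by record count.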