AWS Certified Solutions Architect – Professional (SAP-C02) — Question 270
A company is developing a gene reporting device that will collect genomic information to help researchers gather large samples of data from a diverse population. The device will push 8 KB of genomic data every second to a data platform that will need to process and analyze the data and provide information back to researchers. The data platform must meet the following requirements:
• Provide near-real-time analytics of the inbound genomic data
• Ensure the data is flexible, parallel, and durable
• Deliver results of processing to a data warehouse
Which strategy should a solutions architect use to meet these requirements?
Answer options
- A. Use Amazon Kinesis Data Firehose to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon RDS instance.
- B. Use Amazon Kinesis Data Streams to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon Redshift cluster using Amazon EMR.
- C. Use Amazon S3 to collect the inbound device data, analyze the data from Amazon SQS with Kinesis, and save the results to an Amazon Redshift cluster.
- D. Use an Amazon API Gateway to put requests into an Amazon SQS queue, analyze the data with an AWS Lambda function, and save the results to an Amazon Redshift cluster using Amazon EMR.
Correct answer: B
Explanation
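Amazon Kinesis Data Streams is built to ingest rapid, continuous streams of small records, and it covers each stated requirement. Records are synchronously replicated across three Availability Zones and retained for 24 hours by default (extendable up to 365 days), which provides durability and lets consumers replay data. A stream is divided into shards that are written and read in parallel, and several independent consumer applications can read the same stream, which provides flexibility. Consumers built with the Kinesis Client Library (KCL) can analyze the records in near real time, and Amazon EMR can then load the processed results into Amazon Redshift, which satisfies the data warehouse requirement.

The other options fall short. In A, Kinesis Data Firehose delivers data to configured destinations rather than exposing it to Kinesis client consumers, and Amazon RDS is not a data warehouse. In C, Amazon S3 is object storage rather than a streaming ingestion service, and routing data from S3 through SQS into Kinesis does not form a coherent real-time pipeline. In D, an SQS queue is designed for each message to be processed by one consumer and then deleted, so it does not offer the parallel, replayable, multi-consumer reads that Kinesis Data Streams provides.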
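As a rough illustration of why shards make this design parallel, the Python sketch below estimates how many shards a fleet of these devices would need and shows how a partition key is routed to a shard. It assumes provisioned capacity mode with per-shard write limits of 1 MB/s or 1,000 records/s, treats 1 KB as 1,000 bytes, assumes one 8 KB record per device per second, and assumes the shards evenly split the 128-bit hash key space that Kinesis maps MD5-hashed partition keys into. The functions `shards_needed` and `shard_for_key` are hypothetical helpers, not part of any AWS SDK.

```python
import hashlib
import math

# Per-shard write limits in provisioned mode: 1 MB/s or 1,000 records/s,
# whichever is reached first. Treating 1 MB as 10^6 bytes is an assumption.
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1_000


def shards_needed(devices: int, record_bytes: int = 8_000, records_per_sec: float = 1.0) -> int:
    """Hypothetical helper: minimum shard count that keeps the fleet's
    aggregate writes under both per-shard limits."""
    total_bytes = devices * record_bytes * records_per_sec
    total_records = devices * records_per_sec
    by_bytes = math.ceil(total_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_records = math.ceil(total_records / SHARD_WRITE_RECORDS_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_bytes, by_records)


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Hypothetical helper: map a partition key to a shard index by hashing it
    with MD5 into a 128-bit integer, assuming each shard owns an equal slice
    of the hash key space."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2**128 // shard_count
    # Guard against rounding when the key space does not divide evenly.
    return min(hash_key // range_size, shard_count - 1)


if __name__ == "__main__":
    print(shards_needed(1000))
    print(shard_for_key("device-42", shards_needed(1000)))
```

For example, `shards_needed(1000)` returns 8: 1,000 devices at 8 KB/s each produce about 8 MB/s, so the byte limit, not the record-count limit, drives the shard count. Using each device's ID as the partition key keeps that device's records in order on a single shard while spreading the fleet across all shards for parallel processing.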