AWS Certified Data Engineer – Associate (DEA-C01) — Question 204
A company wants to ingest streaming data into an Amazon Redshift data warehouse from an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster. A data engineer needs to develop a solution that provides low data access time and that optimizes storage costs.
Which solution will meet these requirements with the LEAST operational overhead?
Answer options
- A. Create an external schema that maps to the MSK cluster. Create a materialized view that references the external schema to consume the streaming data from the MSK topic.
- B. Develop an AWS Glue streaming extract, transform, and load (ETL) job to process the incoming data from Amazon MSK. Load the data into Amazon S3. Use Amazon Redshift Spectrum to read the data from Amazon S3.
- C. Create an external schema that maps to the streaming data source. Create a new Amazon Redshift table that references the external schema.
- D. Create an Amazon S3 bucket. Ingest the data from Amazon MSK. Create an event-driven AWS Lambda function to load the data from the S3 bucket to a new Amazon Redshift table.
Correct answer: A
Explanation
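Option A is correct because it uses Amazon Redshift streaming ingestion: an external schema maps to the MSK cluster, and a materialized view defined on that schema consumes the topic directly into Redshift. Queries read the materialized view from Redshift storage, which keeps data access time low, and the data is not staged in Amazon S3 first, which avoids a second copy and its storage cost. No additional service has to be built or operated, so operational overhead is minimal. Option B adds an AWS Glue streaming job to maintain and stages the data in Amazon S3, where it is queried through Redshift Spectrum rather than from Redshift storage. Option D also stages the data in Amazon S3, needs a separate mechanism to move data from Amazon MSK to the bucket, and adds a Lambda function to run the loads. Option C fails because Redshift streaming ingestion consumes a topic through a materialized view, not a regular table, so a new table that references the external schema would not ingest the stream.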
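As a rough illustration of option A, the Python sketch below assembles the two SQL statements that Redshift streaming ingestion from Amazon MSK typically involves: `CREATE EXTERNAL SCHEMA ... FROM MSK` and `CREATE MATERIALIZED VIEW ... AUTO REFRESH YES`. The function name, schema, view, and topic names, and both ARNs are hypothetical placeholders, and the sketch assumes IAM authentication and JSON-encoded messages. It only builds the statement text; running the statements against a cluster is left out.

```python
def build_streaming_ingestion_sql(schema, view, topic, iam_role_arn, cluster_arn):
    """Return the two SQL statements for Redshift streaming ingestion from MSK.

    All argument values are supplied by the caller; nothing here is taken
    from the exam question itself.
    """
    # Step 1: external schema that maps to the MSK cluster.
    # AUTHENTICATION iam is an assumption; the cluster may use another mode.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM MSK\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"AUTHENTICATION iam\n"
        f"CLUSTER_ARN '{cluster_arn}';"
    )
    # Step 2: materialized view that consumes the topic.
    # AUTO REFRESH YES lets Redshift pull new records without a scheduled job.
    # JSON_PARSE assumes the message values are JSON documents.
    create_view = (
        f"CREATE MATERIALIZED VIEW {view} AUTO REFRESH YES AS\n"
        f"SELECT kafka_partition, kafka_offset, kafka_timestamp,\n"
        f"       JSON_PARSE(kafka_value) AS payload\n"
        f'FROM {schema}."{topic}"\n'
        f"WHERE CAN_JSON_PARSE(kafka_value);"
    )
    return [create_schema, create_view]


if __name__ == "__main__":
    # Hypothetical names and ARNs, for illustration only.
    statements = build_streaming_ingestion_sql(
        schema="msk_schema",
        view="orders_stream_mv",
        topic="orders",
        iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-msk-role",
        cluster_arn="arn:aws:kafka:us-east-1:123456789012:cluster/example/abc",
    )
    for statement in statements:
        print(statement)
        print()
```

Once both statements have run, queries select from the materialized view like any other Redshift relation, which is why no Glue job, S3 bucket, or Lambda function is needed in this design.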