AWS Certified Big Data – Specialty — Question 47
A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift.
What is the most efficient architecture strategy for this purpose?
Answer options
- A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.
- B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.
- C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.
- D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.
Correct answer: A
Explanation
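Option A is the most efficient because it gives each job to the service built for it. Amazon EMR transforms the large volume of unstructured data in parallel across a cluster and writes structured CSV files, and the Redshift COPY command then bulk-loads those files in parallel, which is the load path Redshift is optimized for. Option B spends Redshift storage and compute on parsing raw text with string functions, work that a columnar data warehouse is poorly suited to. Option C transforms each file in AWS Lambda, whose execution time and memory limits make it a poor fit for large files, and then inserts rows instead of bulk-loading them. Option D adds a third-party ETL tool and still uses Lambda to INSERT the data, which is far slower than COPY at this volume.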
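
The transform-then-COPY flow in option A can be sketched roughly as follows. This is a minimal, pure-Python stand-in for the transform step that would run on EMR (for example as a Spark job), not a definitive implementation. The raw log format, the regular expression, the helper names `to_csv` and `copy_statement`, and the table, bucket, and IAM role names are all hypothetical; only the general shape of the Redshift `COPY ... FROM ... IAM_ROLE ... FORMAT AS CSV` statement comes from Redshift itself.

```python
import csv
import io
import re

# Hypothetical raw format: free-text machine log lines such as
# "2024-01-05 12:00:01 machine=M7 temp=81.5C status=OK"
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"machine=(?P<machine>\S+)\s+"
    r"temp=(?P<temp>\d+(?:\.\d+)?)C\s+"
    r"status=(?P<status>\S+)"
)


def to_csv(raw_lines):
    """Parse raw lines into CSV text; lines that do not match are skipped."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in raw_lines:
        m = LINE_RE.search(line)
        if m:
            writer.writerow(
                [m["ts"], m["machine"], float(m["temp"]), m["status"]]
            )
    return buf.getvalue()


def copy_statement(table, s3_prefix, iam_role):
    """Build a Redshift COPY statement for CSV files under an S3 prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV;"
    )


if __name__ == "__main__":
    sample = [
        "2024-01-05 12:00:01 machine=M7 temp=81.5C status=OK",
        "operator note: belt replaced",  # unparseable, so it is skipped
    ]
    print(to_csv(sample), end="")
    # All identifiers below are placeholders for illustration only.
    print(
        copy_statement(
            "analysis.fact_readings",
            "s3://example-bucket/clean/",
            "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

In a real pipeline the CSV output would be written to Amazon S3 as multiple files so that COPY can load them in parallel; the single in-memory string here only keeps the sketch self-contained.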