AWS Certified Data Engineer – Associate (DEA-C01) — Question 111
A company plans to use Amazon Kinesis Data Firehose to store data in Amazon S3. The source data consists of 2 MB .csv files. The company must convert the .csv files to JSON format. The company must store the files in Apache Parquet format.
Which solution will meet these requirements with the LEAST development effort?
Answer options
- A. Use Kinesis Data Firehose to convert the .csv files to JSON. Use an AWS Lambda function to store the files in Parquet format.
- B. Use Kinesis Data Firehose to convert the .csv files to JSON and to store the files in Parquet format.
- C. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON and stores the files in Parquet format.
- D. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON. Use Kinesis Data Firehose to store the files in Parquet format.
Correct answer: D
Explanation
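Option D is correct because it uses each component for what it already does, which keeps custom code to a minimum. Kinesis Data Firehose cannot parse .csv input on its own, so a data-transformation AWS Lambda function converts each record to JSON. Firehose's built-in record format conversion then converts the JSON to Apache Parquet, using a schema defined in an AWS Glue Data Catalog table, before delivering the files to Amazon S3. Option A is incorrect because Firehose has no native .csv-to-JSON conversion, and writing Parquet from a Lambda function duplicates a capability that Firehose already provides. Option B is incorrect for the same first reason: Firehose can convert JSON to Parquet natively, but it cannot convert .csv to JSON without a Lambda function. Option C could be made to work, but it moves the Parquet encoding into custom Lambda code, which takes more development effort than enabling Firehose's built-in conversion.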
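The Lambda half of option D can be sketched as follows. This is a minimal illustration, not a production function: it assumes each Firehose record carries headerless .csv text, and the column names in `COLUMNS` are hypothetical because the question does not describe the schema. The event and response shapes (`records`, `recordId`, `result`, base64-encoded `data`) follow the Firehose data-transformation contract. The Parquet step needs no code here, since it would be enabled in the delivery stream's record format conversion settings.

```python
import base64
import csv
import io
import json

# Hypothetical column names; the source question does not give the CSV schema.
COLUMNS = ["id", "name", "value"]


def lambda_handler(event, context):
    """Convert each Firehose record from CSV text to newline-delimited JSON."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers the record payload base64-encoded.
            text = base64.b64decode(record["data"]).decode("utf-8")
            rows = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
            # Emit one JSON object per CSV row, one object per line.
            payload = "".join(json.dumps(row) + "\n" for row in rows)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
            })
        except ValueError:
            # Covers invalid base64 and invalid UTF-8; return the record
            # unchanged and marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every input `recordId` is returned exactly once, which is what Firehose expects from a transformation function. All values come out as strings in this sketch, so the types in the Glue table schema used for the Parquet conversion would need to be compatible with that, or the function would need to cast the values first.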