A data engineer must ingest a source of structured data that is in .csv format into an Am…

Question

A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3 data lake. The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on one or two columns of the dataset. The data analysts rarely query the entire file.
Which solution will meet these requirements MOST cost-effectively?

Accepted Answer

Correct answer: D. Create an AWS Glue extract, transform, and load (ETL) job to read from the .csv structured data source. Configure the job to write the data into the data lake in Apache Parquet format.

Parquet is a columnar format, so Athena can read only the one or two columns a query references instead of scanning all 15. Because Athena bills by the amount of data scanned, this lowers the cost of each query, and Parquet's compression also reduces storage costs. Options A and B do not use a columnar format, so queries on specific columns cost more and run less efficiently. Option C uses a format suitable for data processing but does not offer the same cost advantage as Parquet for this use case.
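The cost argument can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: the price per TB scanned (5 USD) is an assumed figure that varies by Region, the 1 TB dataset size is hypothetical, all 15 columns are treated as equally wide, and compression is ignored. The function name `scan_cost_usd` is made up for this example.

```python
# Rough Athena scan-cost comparison: row-based CSV vs. columnar Parquet.
# All figures are illustrative assumptions, not values from the question,
# except the column count.

PRICE_PER_TB_USD = 5.00  # assumed price per TB scanned; check your Region's pricing
TOTAL_COLUMNS = 15       # from the question


def scan_cost_usd(dataset_tb: float, columns_read: int, columnar: bool) -> float:
    """Estimate the cost of one query under a simplified model.

    Assumes every column is the same size and ignores compression,
    partitioning, and per-query minimum charges.
    """
    if not 1 <= columns_read <= TOTAL_COLUMNS:
        raise ValueError("columns_read must be between 1 and TOTAL_COLUMNS")
    # A row-based file must be scanned in full; a columnar file lets the
    # engine read only the columns the query references.
    fraction = columns_read / TOTAL_COLUMNS if columnar else 1.0
    return dataset_tb * fraction * PRICE_PER_TB_USD


# Hypothetical 1 TB dataset, query touching 2 of the 15 columns.
csv_cost = scan_cost_usd(1.0, columns_read=2, columnar=False)
parquet_cost = scan_cost_usd(1.0, columns_read=2, columnar=True)

print(f"CSV:     ${csv_cost:.2f} per query")
print(f"Parquet: ${parquet_cost:.2f} per query")
```

Under these assumptions the Parquet query scans 2/15 of the data, so it costs roughly 13% of the CSV query. Real savings are usually larger because Parquet files are also compressed, but the exact figure depends on the data.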

AWS Certified Data Engineer – Associate (DEA-C01) — Question 26

Answer options

Correct answer: D

Explanation