AWS Certified Data Engineer – Associate (DEA-C01) — Question 245

A company stores Apache Parquet files in an Amazon S3 data lake. The data lake receives thousands of files from multiple sources every hour. The files range in size from 50 KB to 100 KB.

The company is evaluating the implementation of Apache Iceberg tables for the data lake. The company is using AWS Glue Data Catalog as part of the evaluation. The company needs a solution to optimize query performance in Iceberg. The solution must ensure that Iceberg table performance does not degrade when more files are added over time.

Which solution will meet these requirements?

Answer options

Correct answer: C

Explanation

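Option C is correct. Files of 50 KB to 100 KB are far smaller than the sizes at which Parquet scans are efficient, and every additional data file adds metadata that Iceberg must track and that query engines must process during planning and reads. Compaction rewrites many small files into fewer, larger ones, so configuring automatic, threshold-based compaction for the Iceberg table keeps the file count in check as new data arrives.

Option A compacts only once a day. With thousands of files arriving every hour, small files accumulate between runs and query performance degrades in the meantime. Option B compacts so frequently that it adds unnecessary overhead. Option D focuses on partitioning, which does not reduce the number of small files and therefore leaves the compaction requirement unaddressed.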
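
To make the effect of threshold-based compaction concrete, the sketch below models the arithmetic only. It is not Iceberg's or AWS Glue's actual compaction algorithm. The 128 MiB target size, the 100-file trigger threshold, and the figure of 2,000 files per hour at 75 KB each are illustrative assumptions, and the function names `needs_compaction` and `plan_compaction` are hypothetical.

```python
from typing import List

# Illustrative assumptions, not values taken from the question or from any service default.
TARGET_FILE_BYTES = 128 * 1024 * 1024  # assumed target size for compacted files
MIN_INPUT_FILES = 100                  # hypothetical threshold that triggers compaction


def needs_compaction(file_sizes: List[int],
                     target_bytes: int = TARGET_FILE_BYTES,
                     min_input_files: int = MIN_INPUT_FILES) -> bool:
    """Return True once enough undersized files have accumulated."""
    small_files = [size for size in file_sizes if size < target_bytes]
    return len(small_files) >= min_input_files


def plan_compaction(file_sizes: List[int],
                    target_bytes: int = TARGET_FILE_BYTES) -> List[List[int]]:
    """Greedily group input files so that each group stays within the target size.

    Each returned group stands for one rewritten output file.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Close the current group if adding this file would exceed the target.
        if current and current_bytes + size > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


# One hour of ingest under the assumptions above: 2,000 files of 75 KB each.
hourly_files = [75 * 1024] * 2000
plan = plan_compaction(hourly_files)

print("compaction triggered:", needs_compaction(hourly_files))
print("input files:", len(hourly_files), "output files:", len(plan))
```

Under these assumptions, one hour of ingest collapses from 2,000 data files to 2. Because the trigger depends on how many small files have accumulated rather than on a clock, the work scales with the ingest rate, which is the property that distinguishes threshold-based compaction from a fixed daily or hourly schedule.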