AWS Certified Data Engineer – Associate (DEA-C01) — Question 211

A company wants to build a dimension table in an Amazon S3 bucket. The bucket contains historical data that includes 10 million records. The historical data is 1 TB in size.

A data engineer needs a solution that applies daily changes to up to 10,000 records in the base table.

Which solution will meet this requirement with the LOWEST runtime?

Answer options

Correct answer: D

Explanation

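The correct answer is D. Apache Hudi, which is available natively on Amazon EMR, adds record-level upserts to data stored in Amazon S3. Hudi maintains an index that maps each record key to the file group that holds it, so a daily batch of up to 10,000 changed records (about 0.1% of the 10 million rows) is routed only to the affected file groups. With a copy-on-write table, only the files that contain changed records are rewritten. With a merge-on-read table, the changes are appended to delta log files and merged into the base files later by compaction. In both cases the daily work scales with the changed records and the files that hold them, not with the full 1 TB table. The other options rely on plain Spark or Pandas, which have no record-level update mechanism for files in S3. Applying the changes that way generally means reading and rewriting far more of the dataset. Pandas also runs on a single machine, which makes processing 1 TB impractical.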
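The index-plus-delta-log idea behind a merge-on-read upsert can be illustrated with a small in-memory model. This is a hypothetical toy written for illustration only. ToyMorTable and its methods are not part of Hudi's API, compaction is omitted, and the data is scaled down to 10,000 records so the example runs quickly. The point it demonstrates is that an upsert writes only the changed rows and never rewrites the base files. Reads merge the latest logged version on top of the base row.

```python
from collections import defaultdict


class ToyMorTable:
    """Hypothetical toy model of a merge-on-read table (not Hudi's API)."""

    def __init__(self, records, records_per_file):
        # Base files: file_id -> {record_key: row}; upserts never rewrite these.
        self.base_files = {}
        # Index: record_key -> file_id, used to route each change to its file group.
        self.index = {}
        # Delta logs: file_id -> list of (record_key, row) appended by upserts.
        self.logs = defaultdict(list)
        keys = sorted(records)
        for start in range(0, len(keys), records_per_file):
            file_id = start // records_per_file
            chunk = keys[start:start + records_per_file]
            self.base_files[file_id] = {k: records[k] for k in chunk}
            for k in chunk:
                self.index[k] = file_id
        self._next_file_id = len(self.base_files)

    def upsert(self, changes):
        """Log updates per file group, place inserts in a new group; return rows written."""
        new_keys = []
        for key, row in changes.items():
            file_id = self.index.get(key)
            if file_id is None:
                new_keys.append(key)  # unseen key: treat it as an insert
            else:
                self.logs[file_id].append((key, row))  # update: append to that group's log
        if new_keys:
            # In this toy model, inserts go to a fresh file group.
            file_id = self._next_file_id
            self._next_file_id += 1
            self.base_files[file_id] = {k: changes[k] for k in new_keys}
            for k in new_keys:
                self.index[k] = file_id
        return len(changes)

    def read(self, key):
        """Merge on read: the most recent log entry for a key wins over the base row."""
        file_id = self.index[key]
        for logged_key, row in reversed(self.logs[file_id]):
            if logged_key == key:
                return row
        return self.base_files[file_id][key]


# Usage: 10,000 base records in 10 file groups, then 100 updates and 1 insert.
base = {f"cust-{i:05d}": {"tier": "silver"} for i in range(10_000)}
table = ToyMorTable(base, records_per_file=1_000)
changes = {f"cust-{i:05d}": {"tier": "gold"} for i in range(0, 10_000, 100)}
changes["cust-99999"] = {"tier": "bronze"}
written = table.upsert(changes)
print(written)                     # 101
print(table.read("cust-00100"))    # {'tier': 'gold'}
print(table.read("cust-00101"))    # {'tier': 'silver'}
```

A copy-on-write table would instead produce new versions of the base files that contain updated keys. In this model that would be all ten groups, because the updates are spread evenly. This is the trade-off between cheaper writes (merge-on-read) and cheaper reads (copy-on-write).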