AWS Certified Data Engineer – Associate (DEA-C01) — Question 15

A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes.
A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day and ingests the changed data into the data lake.
Which solution will capture the changed data MOST cost-effectively?

Answer options

Correct answer: C

Explanation

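Using an open source data lake format to merge the data source with the S3 data lake is the most cost-effective solution. Table formats such as Apache Hudi, Apache Iceberg, and Delta Lake support merge (upsert) operations directly on data stored in Amazon S3, so each daily snapshot can be merged into the existing table: new records are inserted and changed records are updated in place. No extra database or replication service has to be provisioned. The other options add services such as AWS DMS, which bring more infrastructure to set up and more running cost than this scenario needs, particularly when some files are tens of terabytes.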
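
As a rough illustration of the merge (upsert) semantics described above, here is a minimal pure-Python sketch. It is not the API of any particular table format. The function name `merge_snapshot`, the key field `id`, and the sample records are all hypothetical, and a real data lake format would run the equivalent merge as a distributed job over files in S3 rather than over an in-memory dictionary.

```python
import json


def merge_snapshot(table, snapshot, key="id"):
    """Upsert snapshot records into `table` (a dict keyed by record key).

    Returns two lists: keys that were inserted and keys that were updated.
    Records that are identical to the stored copy are left untouched.
    """
    inserted, updated = [], []
    for record in snapshot:
        k = record[key]
        if k not in table:
            table[k] = record      # new record: insert
            inserted.append(k)
        elif table[k] != record:
            table[k] = record      # changed record: update
            updated.append(k)
    return inserted, updated


# Hypothetical current state of the data lake table.
table = {
    1: {"id": 1, "status": "new"},
    2: {"id": 2, "status": "shipped"},
}

# Hypothetical full daily snapshot, delivered as JSON.
snapshot = json.loads(
    '[{"id": 1, "status": "new"},'
    ' {"id": 2, "status": "delivered"},'
    ' {"id": 3, "status": "new"}]'
)

inserted, updated = merge_snapshot(table, snapshot)
print("inserted:", inserted)
print("updated:", updated)
```

In this sketch only record 2 is rewritten and only record 3 is added; record 1 is unchanged and is skipped. The changed data is derived from the full snapshot during the merge itself, with no separate CDC service involved.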