AWS Certified Data Engineer – Associate (DEA-C01) — Question 110

A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file.

The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table. The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process.

Which solution will meet these requirements?

Answer options

Correct answer: D

Explanation

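The correct answer is D. A manifest file lists every data file across the source-specific S3 locations, so a single COPY command can load all of them in one operation. Amazon Redshift then divides the work among the slices of the multi-node cluster and loads the files in parallel, instead of processing one file location per COPY command. This shortens ingestion time without adding a new service or extra cost. Options A and C do not address the parallel loading requirement effectively. Option B introduces an additional service (Amazon Aurora), which conflicts with the cost constraint and adds unnecessary complexity.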
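
As an illustration of the manifest approach, the following minimal sketch builds a manifest document and the matching COPY statement. The bucket names, object keys, table name, and IAM role ARN are hypothetical placeholders that do not come from the question, and the helper functions are illustrative rather than part of any AWS SDK. The sketch only produces the text of the manifest and the SQL. Uploading the manifest to S3 and running the statement against the cluster are left out, as are any data format options (for example CSV or JSON) that the real files would need.

```python
import json


def build_manifest(file_urls, mandatory=True):
    """Return a Redshift COPY manifest (JSON text) listing every file to load."""
    entries = [{"url": url, "mandatory": mandatory} for url in file_urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, iam_role_arn):
    """Return one COPY statement that loads all files named in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Hypothetical file locations, one prefix per data source.
data_files = [
    "s3://example-data-lake/source-a/part-0001.csv",
    "s3://example-data-lake/source-b/part-0001.csv",
    "s3://example-data-lake/source-c/part-0001.csv",
]

manifest_json = build_manifest(data_files)
copy_sql = build_copy_statement(
    table="sales_staging",  # hypothetical table name
    manifest_url="s3://example-data-lake/manifests/load.manifest",  # hypothetical
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",  # hypothetical
)

print(manifest_json)
print(copy_sql)
```

Because the manifest names files from every source location, one COPY statement replaces the separate per-location commands, and Redshift can spread the listed files across its slices in a single load.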