AWS Certified Data Engineer – Associate (DEA-C01) — Question 117
A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.
The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.
Which solution will meet these requirements?
Answer options
- A. Use multiple COPY commands to load the data into the Redshift cluster.
- B. Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.
- C. Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.
- D. Use a single COPY command to load the data into the Redshift cluster.
Correct answer: D
Explanation
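The correct answer is D. A single COPY command that points to all of the files (for example, files in Amazon S3 under a common key prefix or listed in a manifest file) lets Amazon Redshift load the files in parallel, spreading the work across every slice on every compute node. This is the load pattern AWS recommends for maximum throughput. AWS also recommends splitting the data into files of roughly equal size, with the file count a multiple of the number of slices in the cluster, so that all slices stay busy. Option A is less efficient because when multiple concurrent COPY commands load the same table, Amazon Redshift is forced to perform a serialized load, which is much slower. Option C is slower still: AWS documentation notes that populating a table with individual INSERT statements can be prohibitively slow compared with COPY, and Redshift provides no way to route an INSERT to a particular node. Option B first stages the files in HDFS on a separate Hadoop cluster. That adds infrastructure, cost, and an extra copy of the data without increasing load throughput compared with a direct COPY.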
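To make option D concrete, the Python sketch below builds a Redshift COPY manifest that lists every file, plus one COPY statement that loads them all. It only generates text and does not connect to a cluster. The manifest layout (`entries`, `url`, `mandatory`) and the COPY options `IAM_ROLE`, `MANIFEST`, and `FORMAT AS CSV` are Redshift features. The bucket, keys, table name, role ARN, and helper function names are hypothetical. The `idle_slices_in_last_round` helper is a deliberately simplified model of the "file count should be a multiple of the slice count" guidance. It assumes equally sized files with each slice loading one file at a time.

```python
import json


def build_manifest(bucket: str, keys: list[str]) -> str:
    """Return a Redshift COPY manifest (JSON) that lists every file to load."""
    entries = [{"url": f"s3://{bucket}/{key}", "mandatory": True} for key in keys]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table: str, manifest_url: str, iam_role_arn: str) -> str:
    """Return ONE COPY statement that loads all files named in the manifest.

    Redshift spreads the listed files across slices and loads them in parallel.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST\n"
        "FORMAT AS CSV;"
    )


def idle_slices_in_last_round(num_files: int, num_slices: int) -> int:
    """Simplified model (assumption): equal-size files, one file per slice at a time.

    Returns how many slices sit idle during the final round of file assignments.
    Zero means the file count is a multiple of the slice count.
    """
    remainder = num_files % num_slices
    return 0 if remainder == 0 else num_slices - remainder


if __name__ == "__main__":
    # Hypothetical bucket, keys, table, and role used only for illustration.
    keys = [f"sales/part-{i:04d}.csv" for i in range(256)]
    print(build_manifest("example-bucket", keys[:2]))
    print(
        build_copy_statement(
            "sales_fact",
            "s3://example-bucket/manifests/sales.manifest",
            "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
        )
    )
    print(idle_slices_in_last_round(len(keys), 16))
```

The key design point is that there is exactly one COPY per target table. Adding more files to the manifest increases parallelism inside that single command, whereas issuing more COPY commands against the same table would serialize the load.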