AWS Certified Data Analytics – Specialty — Question 16
A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company's fact table.
How should the company meet these requirements?
Answer options
- A. Use multiple COPY commands to load the data into the Amazon Redshift cluster.
- B. Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster.
- C. Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node.
- D. Use a single COPY command to load the data into the Amazon Redshift cluster.
Correct answer: D
Explanation
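The correct answer is D. A single COPY command lets Amazon Redshift split the work across the slices of the cluster and load from multiple files in parallel, which gives the best throughput and the most even use of cluster resources. Option A is wrong because running multiple concurrent COPY commands against the same table forces Amazon Redshift to perform a serialized load, which is slower than one COPY over all the files. Option C is wrong because Amazon Redshift has no LOAD command (COPY is its bulk-load command), and parallelism is managed by the cluster rather than by issuing one command per node. Option B adds unnecessary complexity: COPY can read the files directly from Amazon S3, so staging them in HDFS first only adds an extra hop.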
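To make option D concrete, here is a minimal Python sketch of how hundreds of files could be handed to one COPY command through a manifest. It only builds the manifest JSON and the SQL text; it does not connect to a cluster. The bucket, file names, table name `sales_fact`, IAM role ARN, and the choice of CSV format are all hypothetical placeholders, not details from the question.

```python
import json


def build_manifest(file_urls):
    """Return a COPY manifest (JSON string) that lists every file to load."""
    entries = [{"url": url, "mandatory": True} for url in file_urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, iam_role_arn):
    """Return ONE COPY statement that loads all files named in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"  # assumption: the input files are CSV
        "MANIFEST;"
    )


# Hypothetical inputs: 300 files under an example bucket.
files = [f"s3://example-bucket/sales/part-{i:04d}.csv" for i in range(300)]
manifest = build_manifest(files)
statement = build_copy_statement(
    "sales_fact",
    "s3://example-bucket/sales/load.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(statement)
```

The manifest would be uploaded to the `FROM` location before the statement is run. The point of the sketch is that the number of files grows while the number of COPY commands stays at one, leaving the parallel distribution of work to Amazon Redshift.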