SnowPro Advanced: Data Engineer — Question 107
A table, TABLE SALES, is reloaded daily, and is used by many users to support multiple data reporting tools. Users report poor performance when querying the table.
This is the clustering information for TABLE SALES:
{
"cluster_by_keys" : "LINEAR(SALES_DT)",
"total_partition_count" : 10798,
"total_constant_partition_count" : 0,
"average_overlaps" : 10797.0,
"average_depth" : 10798.0,
"partition_depth__histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0,
"16384" : 10798
},
"clustering_errors" : [ ]
}
What step will improve the performance of the micro-partitions and OPTIMIZE query performance?
Answer options
- A. Incorporate additional pruning options when the table is being accessed by reporting queries.
- B. Identify and purge duplicate data in the table.
- C. Use the ORDER BY SALES_DT parameter when inserting the data into the table.
- D. Perform a one-time insert overwrite with the ORDER BY SALES_DT parameter.
Correct answer: D
Explanation
The correct answer is D because performing a one-time insert overwrite with the ORDER BY SALES_DT will reorganize the data in a way that optimizes the layout of micro-partitions, leading to better query performance. The other options, while potentially helpful, do not directly address the restructuring of data within the micro-partitions to improve efficiency.