Databricks Certified Machine Learning Professional — Question 8

A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that queries against the table are running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely scattered throughout each of the data files.
Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?

Answer options

Correct answer: A

Explanation

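Z-Ordering speeds up the query because it colocates related records in the same set of files and can take values from multiple columns into account. Once matching rows are clustered into fewer files, Delta Lake's data-skipping statistics let the query skip the files that cannot contain matches. Bin-packing only compacts small files into larger ones and does not reorder rows by value, and the team has already tuned file sizes. Data skipping is applied automatically from per-file statistics, but on its own it helps little when matching rows are scattered across every file. Writing the data as a Parquet file does not change how rows are laid out, since Delta tables already store their data as Parquet.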
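
The effect of colocating records on several columns can be illustrated with a small, self-contained sketch. It is a toy model, not Delta Lake's implementation: it assumes two integer columns, a fixed number of rows per file, and a simple bit-interleaving (Morton) key, and the names `z_value` and `files_touched` are hypothetical helpers introduced here. On Databricks, the clustering itself is requested with the `OPTIMIZE ... ZORDER BY (...)` command, for example on a hypothetical table `predictions` with hypothetical columns `customer_id` and `prediction_date`.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]


def z_value(x: int, y: int, bits: int = 3) -> int:
    """Interleave the low `bits` bits of x and y into one sort key (Morton code)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


def files_touched(rows: List[Row], rows_per_file: int,
                  predicate: Callable[[Row], bool]) -> int:
    """Count the files that hold at least one row matching the predicate."""
    files = [rows[i:i + rows_per_file]
             for i in range(0, len(rows), rows_per_file)]
    return sum(1 for f in files if any(predicate(r) for r in f))


# Toy table: every (x, y) pair on an 8 x 8 grid, split into files of 4 rows.
rows = [(x, y) for x in range(8) for y in range(8)]

# Layout 1: ordinary sort on x, then y (good for x filters only).
sorted_by_x = sorted(rows)

# Layout 2: sort on the interleaved key, which clusters on x and y together.
z_ordered = sorted(rows, key=lambda r: z_value(r[0], r[1]))


def y_is_small(row: Row) -> bool:
    """Example filter on the second column only."""
    return row[1] < 2


linear_files = files_touched(sorted_by_x, 4, y_is_small)
zorder_files = files_touched(z_ordered, 4, y_is_small)
print(f"files touched, sorted by x: {linear_files} of 16")
print(f"files touched, Z-ordered:   {zorder_files} of 16")
```

In this toy setup the filter on `y` touches 8 of the 16 files under the single-column sort and 4 of 16 under the Z-order sort, while a filter on `x` stays selective in both layouts. The files that are not touched are the ones that per-file min/max statistics would allow a query to skip.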