Databricks Certified Generative AI Engineer Associate — Question 24

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document section. They would like to store the chunks in a Vector Search index. The dataframe currently has two columns: (i) the original document file name and (ii) an array of text chunks for each document.
What is the most performant way to store this dataframe?

Answer options

Correct answer: B

Explanation

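Option B is the best choice because flattening the dataframe to one chunk per row makes each chunk its own record, so each chunk can be embedded, indexed, and retrieved individually. A Vector Search index is built over a source table in which every row is one searchable item identified by a primary key, and an array column holding many chunks per row does not fit that layout. The other options either do not adequately prepare the data for indexing (C) or suggest less efficient storage methods (A and D) that could hinder performance.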
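
The flattening step can be sketched in plain Python as follows. This is only an illustration of the reshaping, not the exam's reference solution: the column names `file_name`, `chunks`, `chunk_id`, and `chunk_text`, the helper `flatten_chunks`, and the sample data are all assumptions made for the example. On a Spark dataframe the same reshaping would typically be done with `explode` (or `posexplode`, which also returns the position) from `pyspark.sql.functions`.

```python
from typing import Dict, List


def flatten_chunks(rows: List[Dict]) -> List[Dict]:
    """Turn one-row-per-document into one-row-per-chunk.

    Each output row gets a unique chunk_id (hypothetical naming scheme:
    file name plus chunk position) that could serve as a primary key.
    """
    flat = []
    for row in rows:
        for position, chunk in enumerate(row["chunks"]):
            flat.append(
                {
                    "chunk_id": f'{row["file_name"]}-{position}',
                    "file_name": row["file_name"],
                    "chunk_text": chunk,
                }
            )
    return flat


# Hypothetical input: one row per document, chunks stored as an array.
documents = [
    {"file_name": "guide.pdf", "chunks": ["Intro text", "Setup text"]},
    {"file_name": "faq.pdf", "chunks": ["Billing answer"]},
]

flat_rows = flatten_chunks(documents)
for flat_row in flat_rows:
    print(flat_row)
```

Keeping the file name on every flattened row preserves the link back to the source document, and the per-row identifier gives the index a stable key for each chunk.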