A Generative AI Engineer has successfully ingested unstructured documents and chunked the…

Question

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document section. They would like to store the chunks in a Vector Search index. The dataframe currently has two columns: (i) the original document file name and (ii) an array of text chunks for each document.
What is the most performant way to store this dataframe?

Accepted Answer

Correct answer: B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a Delta table.

Option B is the best choice because a Vector Search index embeds and retrieves one row at a time, so each chunk needs its own row, and the unique identifier gives the index a primary key for tracking each chunk. Saving to a Delta table provides the source table that the index is built from. The other options either do not adequately prepare the data for indexing (C) or suggest less efficient storage methods (A and D) that could hinder performance.
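The reshaping described in option B can be sketched as follows. This is a minimal illustration, not the exam's reference solution: the column names `file_name` and `chunks`, the output columns, the hash-based `chunk_id` scheme, and the `table_name` argument are all assumptions made for the example. `flatten_rows` shows the target row layout in plain Python, and `flatten_spark` shows one way the same result might be produced in PySpark with `posexplode` before writing to a Delta table.

```python
import hashlib


def make_chunk_id(file_name: str, chunk_index: int) -> str:
    """Deterministic ID from file name and chunk position (an assumed scheme)."""
    return hashlib.sha256(f"{file_name}::{chunk_index}".encode("utf-8")).hexdigest()


def flatten_rows(docs):
    """Plain-Python reference: (file_name, [chunks]) pairs -> one dict per chunk."""
    flat = []
    for file_name, chunks in docs:
        for chunk_index, chunk_text in enumerate(chunks):
            flat.append(
                {
                    "chunk_id": make_chunk_id(file_name, chunk_index),
                    "file_name": file_name,
                    "chunk_index": chunk_index,
                    "chunk_text": chunk_text,
                }
            )
    return flat


def flatten_spark(df, table_name):
    """PySpark sketch of the same reshaping, then a write to a Delta table.

    Assumes `df` has a string column `file_name` and an array<string> column
    `chunks`; both names are hypothetical.
    """
    # Imported lazily so the plain-Python helpers above work without Spark.
    from pyspark.sql import functions as F

    flat = (
        # posexplode emits one row per array element plus its 0-based position.
        df.select(
            "file_name",
            F.posexplode("chunks").alias("chunk_index", "chunk_text"),
        ).withColumn(
            "chunk_id",
            # Same hashing scheme as make_chunk_id: sha256 of "file::index".
            F.sha2(
                F.concat_ws(
                    "::",
                    F.col("file_name"),
                    F.col("chunk_index").cast("string"),
                ),
                256,
            ),
        )
    )
    flat.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return flat
```

For example, `flatten_rows([("a.pdf", ["intro", "methods"]), ("b.pdf", ["summary"])])` returns three rows, each with its own `chunk_id`. A deterministic ID is used here so that re-running the job yields the same keys for unchanged chunks; an auto-generated ID would satisfy the uniqueness requirement as well.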

Databricks Certified Generative AI Engineer Associate — Question 24

Explanation