Databricks Certified Generative AI Engineer Associate — Question 46

A Generative AI Engineer has written scalable PySpark code to ingest unstructured PDF documents and chunk them in preparation for storing in a Databricks Vector Search index. Their DataFrame currently has two columns: the original filename as a string and an array of text chunks from that document.

What set of steps should the Generative AI Engineer perform to store the chunks in a ready-to-ingest manner for Databricks Vector Search?

Answer options

Correct answer: B

Explanation

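Answer B is correct because its steps make each chunk uniquely identifiable and make the output table compatible with Delta Lake's Change Data Feed. A Vector Search Delta Sync index is built from a Delta table that has a primary key column and Change Data Feed enabled, so the data must satisfy both requirements before ingestion. Options A and C do not adequately prepare the data for ingestion, while D, although close, does not mention enabling Change Data Feed, which is what allows changes in the output Delta table to be tracked and synced to the index.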